Stanford CS149, Fall 2022
PARALLEL COMPUTING
This page contains lecture slides and recommended readings for the Fall 2022 offering of CS149.
(Challenges of parallelizing code, motivations for parallel chips, processor basics)
(Forms of parallelism: multi-core, SIMD, and multi-threading)
(Finishing up multi-threading and the latency vs. bandwidth distinction; ISPC programming, abstraction vs. implementation)
(Ways of thinking about parallel programs, thought process of parallelizing a program in data parallel and shared address space models)
(Achieving good work distribution while minimizing overhead, scheduling Cilk programs with work stealing)
(Message passing, async vs. blocking sends/receives, pipelining, increasing arithmetic intensity, avoiding contention)
(CUDA programming abstractions, and how they are implemented on modern GPUs)
(Data-parallel operations like map, reduce, scan, prefix sum, groupByKey)
(Producer-consumer locality, RDD abstraction, Spark implementation and scheduling)
(Efficiently scheduling DNN layers, mapping convolutions to matrix multiplication, transformers, layer fusion)
(Definition of memory coherence, invalidation-based coherence using MSI and MESI, false sharing)
(Relaxed consistency models and their motivation, acquire/release semantics)
(Fine-grained synchronization via locks, basics of lock-free programming: single-reader/writer queues, lock-free stacks, the ABA problem, hazard pointers)
(Performance/productivity motivations for DSLs, case studies on several DSLs)
(Motivation for transactions, design space of transactional memory implementations)
(Finishing up transactional memory, focusing on STM and HTM implementations)
(Energy-efficient computing, motivation for heterogeneous processing, fixed-function processing, FPGAs, mobile SoCs)
(How DRAM works, suggestions for post-CS149 topics)