Stanford CS149, Fall 2025
PARALLEL COMPUTING
This page contains lecture slides and recommended readings for the Fall 2025 offering of CS149.
We cannot distribute lecture videos to the public this year, but videos from a prior offering of the course (2023) are available on Stanford's YouTube channel.
(Challenges of parallelizing code, motivations for parallel chips, processor basics)
(Forms of parallelism: multi-core, SIMD, and multi-threading)
(Finishing up multi-threading and latency vs. bandwidth; ISPC programming, abstraction vs. implementation)
(Process of parallelizing a program in data parallel and shared address space models)
(Achieving good work distribution while minimizing overhead, scheduling Cilk programs with work stealing)
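As a taste of the work-stealing topic above, here is a minimal single-threaded Python simulation of Cilk-style deque scheduling: each worker pops from the bottom of its own deque and, when idle, steals from the top of a victim's. The worker count, round-robin loop, and task setup are illustrative assumptions, not the course's implementation.

```python
import random
from collections import deque

def run_work_stealing(num_workers, tasks):
    # One deque per worker; all tasks start on worker 0's deque,
    # as in a fork-heavy program whose main thread spawns everything.
    deques = [deque() for _ in range(num_workers)]
    for t in tasks:
        deques[0].append(t)

    completed = []
    steals = 0
    while any(deques):
        for wid in range(num_workers):
            if deques[wid]:
                # Local work: pop from the bottom (LIFO, cache-friendly).
                completed.append(deques[wid].pop())
            else:
                # Idle: steal from the top (FIFO) of a random victim.
                victims = [v for v in range(num_workers) if v != wid and deques[v]]
                if victims:
                    victim = random.choice(victims)
                    deques[wid].append(deques[victim].popleft())
                    steals += 1
    return completed, steals

done, steals = run_work_stealing(4, list(range(100)))
```

Popping locally from the bottom keeps recently created (cache-hot) tasks local, while stealing from the top tends to grab large, old subtrees, which keeps steal overhead low.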
(Message passing, async vs. blocking sends/receives, pipelining, increasing arithmetic intensity, avoiding contention)
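The "increasing arithmetic intensity" idea above can be made concrete with a back-of-the-envelope calculation for N x N matrix multiply (flops per byte moved from memory, assuming 4-byte floats). The "blocked" figure assumes each tile fits in cache so A and B are each streamed N/B times; these are modeling assumptions for illustration, not measurements of any machine.

```python
def matmul_intensity(n, block=None, bytes_per_elem=4):
    # 2*n^3 flops: n^3 multiply-adds.
    flops = 2 * n**3
    if block is None:
        # Naive inner loop: every multiply-add reads fresh operands
        # from memory (no reuse), so 2 elements move per flop pair.
        bytes_moved = 2 * n**3 * bytes_per_elem
    else:
        # Blocked: with B x B tiles resident in cache, A and B are
        # each read from memory n/block times, plus C written once.
        bytes_moved = (2 * n**3 // block + n**2) * bytes_per_elem
    return flops / bytes_moved

naive = matmul_intensity(1024)             # 0.25 flops/byte
blocked = matmul_intensity(1024, block=64) # roughly 16x higher
```

The point of the exercise: blocking raises intensity by roughly a factor of the block size, which is what moves a kernel from bandwidth-bound toward compute-bound on a roofline plot.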
(CUDA programming abstractions, and how they are implemented on modern GPUs)
(Data-parallel operations like map, reduce, scan/prefix sum, and groupByKey)
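As a sketch of the scan primitive listed above, here is a Blelchoch-style work-efficient exclusive scan written sequentially; on a parallel machine each inner loop level runs its iterations in parallel. The power-of-two restriction is a simplifying assumption of this sketch.

```python
def exclusive_scan(xs, op=lambda a, b: a + b, identity=0):
    n = len(xs)
    assert n and (n & (n - 1)) == 0, "sketch assumes power-of-two length"
    a = list(xs)
    # Up-sweep (reduce) phase: build partial sums up a binary tree.
    d = 1
    while d < n:
        for i in range(0, n, 2 * d):          # parallel across i
            a[i + 2 * d - 1] = op(a[i + d - 1], a[i + 2 * d - 1])
        d *= 2
    # Down-sweep phase: push prefixes back down the tree.
    a[n - 1] = identity
    d = n // 2
    while d >= 1:
        for i in range(0, n, 2 * d):          # parallel across i
            t = a[i + d - 1]
            a[i + d - 1] = a[i + 2 * d - 1]
            a[i + 2 * d - 1] = op(t, a[i + 2 * d - 1])
        d //= 2
    return a

exclusive_scan([1, 2, 3, 4])   # [0, 1, 3, 6]
```

Both phases together do O(n) work in O(log n) parallel steps, which is why scan is a practical building block for stream compaction, groupByKey, and the like.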
(Efficiently scheduling DNN layers, mapping convolutions to matrix multiplication, transformers, layer fusion)
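The "convolution as matrix multiplication" idea above can be sketched in pure Python for a 1-D convolution: im2col unrolls each receptive field into a row, and the convolution becomes one matrix-vector product with the filter weights. Function names here are my own, not the course's.

```python
def im2col_1d(signal, k):
    # One row per output element; row i holds signal[i : i + k].
    return [signal[i:i + k] for i in range(len(signal) - k + 1)]

def conv1d_as_matmul(signal, weights):
    rows = im2col_1d(signal, len(weights))
    # Matrix-vector product: each output element is the dot product
    # of one unrolled row with the shared filter weights.
    return [sum(r * w for r, w in zip(row, weights)) for row in rows]

out = conv1d_as_matmul([1, 2, 3, 4, 5], [1, 0, -1])   # [-2, -2, -2]
```

The im2col matrix duplicates overlapping inputs, trading extra memory for the ability to reuse a highly tuned GEMM kernel, which is the trade-off the lecture examines.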
(Energy-efficient computing, motivation for and design of hardware accelerators. Case study on DNN accelerator design.)
(Modern trends and programming systems for creating specialized hardware)
(How modern AI applications are served at datacenter scale)
(Domain-specific programming abstractions for writing high-performance code and automatic program optimization, with a focus on optimization driven by AI agents)
(Invalidation-based coherence using MSI and MESI, false sharing)
(Fine-grained synchronization via locks, motivation for relaxed memory consistency, and its implications for programmers)
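One way to see why relaxed consistency matters is the classic store-buffer litmus test, replayed here under sequential consistency by enumerating every interleaving that preserves each thread's program order. Under SC the outcome r1 == r2 == 0 never occurs; on hardware with store buffers (e.g., x86-TSO) it can, which motivates fences. This enumeration is my own sketch.

```python
from itertools import combinations

def litmus_outcomes():
    # Thread 1: x = 1; r1 = y        Thread 2: y = 1; r2 = x
    outcomes = set()
    # Choose which 2 of the 4 execution slots thread 1 occupies;
    # program order within each thread is always preserved.
    for t1_slots in combinations(range(4), 2):
        mem = {"x": 0, "y": 0}
        regs = {}
        t1 = [("store", "x", 1), ("load", "y", "r1")]
        t2 = [("store", "y", 1), ("load", "x", "r2")]
        i1 = i2 = 0
        for slot in range(4):
            if slot in t1_slots:
                op = t1[i1]; i1 += 1
            else:
                op = t2[i2]; i2 += 1
            if op[0] == "store":
                mem[op[1]] = op[2]
            else:
                regs[op[2]] = mem[op[1]]
        outcomes.add((regs["r1"], regs["r2"]))
    return outcomes

outs = litmus_outcomes()   # (0, 0) never appears under SC
```

A relaxed machine can produce (0, 0) because each store sits in a private store buffer while the subsequent load reads stale memory; that extra outcome is exactly what the programmer-facing memory model must describe.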
(Fine-grained synchronization via locks, basics of lock-free programming: single-reader/writer queues, lock-free stacks, the ABA problem)
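The ABA problem mentioned above can be replayed single-threaded on a Treiber-style stack, with compare-and-swap simulated as a plain method and the dangerous interleaving scripted by hand. This is an illustration of the hazard, not a real concurrent data structure.

```python
class Node:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

class Stack:
    def __init__(self):
        self.top = None

    def cas_top(self, expected, new):
        # Simulated CAS: succeeds iff top is still the expected node.
        if self.top is expected:
            self.top = new
            return True
        return False

    def push(self, node):
        while True:
            node.next = self.top
            if self.cas_top(node.next, node):
                return

    def pop(self):
        while True:
            old = self.top
            if old is None:
                return None
            if self.cas_top(old, old.next):
                return old

# Build stack A -> B -> C.
s = Stack()
a, b, c = Node("A"), Node("B"), Node("C")
s.push(c); s.push(b); s.push(a)

# Thread 1 begins a pop: reads top (A) and A.next (B), then is paused.
t1_old, t1_next = s.top, s.top.next

# Thread 2 runs to completion: pops A, pops B, then pushes A back.
s.pop(); s.pop()
b.next = None          # simulate B's memory being freed and reused
s.push(a)              # stack is now A -> C

# Thread 1 resumes: its CAS succeeds because top is A again (ABA!),
# installing the already-freed node B and losing both A and C.
ok = s.cas_top(t1_old, t1_next)
```

The CAS only checks that the pointer value matches; it cannot tell that A was popped and re-pushed in between, which is why real lock-free designs add version counters or hazard pointers.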
(Motivation for transactions, design space of transactional memory implementations, STM and HTM basics)
(Suggestions for post-CS149 topics. AMA with the course staff.)