Stanford CS149, Fall 2025
PARALLEL COMPUTING

From smartphones, to multi-core CPUs, to GPUs, to AI accelerators, to the world's largest supercomputers and websites, parallel processing is ubiquitous in modern computing. The goal of this course is to provide a deep understanding of the fundamental principles and engineering trade-offs involved in designing modern parallel computing systems, as well as to teach the parallel programming techniques necessary to use these machines effectively. Because writing good parallel programs requires an understanding of key machine performance characteristics, this course will cover both parallel hardware and software design.

Basic Info
Time: Tues/Thurs 10:30-11:50am
Location: NVIDIA Auditorium
Instructors: Kayvon Fatahalian and Kunle Olukotun
See the course info page for more info on policies and logistics.
Fall 2025 Schedule
Sep 23
Challenges of parallelizing code, motivations for parallel chips, processor basics
Sep 25
Forms of parallelism: multi-core, SIMD, and multi-threading
Sep 30
Finish up multi-threading and latency vs. bandwidth. ISPC programming, abstraction vs. implementation
Oct 02
Process of parallelizing a program in data parallel and shared address space models
Oct 07
Achieving good work distribution while minimizing overhead, scheduling Cilk programs with work stealing
Oct 09
Message passing, async vs. blocking sends/receives, pipelining, increasing arithmetic intensity, avoiding contention
Oct 14
CUDA programming abstractions, and how they are implemented on modern GPUs
Oct 16
Data-parallel operations like map, reduce, scan, prefix sum, groupByKey
Oct 21
Efficiently scheduling DNN layers, mapping convs to matrix-multiplication, transformers, layer fusion
Oct 23
Energy-efficient computing, motivation for and design of hardware accelerators. Case study on DNN accelerator design.
Oct 28
Modern trends and programming systems for creating specialized hardware
Oct 30
How modern AI applications are served at datacenter scale
Nov 04
Democracy Day (no class)
Attend Stanford's many events!
Nov 06
Domain-specific programming abstractions for writing high-performance code, automatic program optimization, with a focus on optimization driven by AI agents
Nov 11
Invalidation-based coherence using MSI and MESI, false sharing
Nov 13
Fine-grained synchronization via locks, motivation for relaxed consistency, implications for programmers.
Nov 18
Midterm Exam (no class)
This will be an evening exam, so there's no class
Nov 20
Fine-grained synchronization via locks, basics of lock-free programming: single-reader/writer queues, lock-free stacks, the ABA problem
Dec 02
Motivation for transactions, design space of transactional memory implementations, STM and HTM basics
Dec 04
Suggestions for post-CS149 topics. AMA with the course staff.
Dec 11
Final Exam
Held from 3:30-6:30pm
Lecture Videos

We cannot distribute lecture videos to the public this year, but videos from a prior version of the course (2023) are available on Stanford's YouTube channel.

Programming Assignments
Oct 6 Assignment 1: Analyzing Parallel Program Performance on a Quad-Core CPU
Oct 16 Assignment 2: Scheduling Task Graphs on a Multi-Core CPU
Oct 30 Assignment 3: A Circle Renderer in CUDA
Nov 13 Assignment 4: Fused Conv+MaxPool on the Trainium2 Accelerator
Dec 4 Assignment 5: Make the World's Fastest CUDA Kernels
Written Assignments
Oct 9 Written Assignment 1
Oct 21 Written Assignment 2
Nov 6 Written Assignment 3
Dec 3 Written Assignment 4