From smartphones to multi-core CPUs, GPUs, AI accelerators, and the world's largest supercomputers and websites, parallel processing is ubiquitous in modern computing. The goal of this course is to provide a deep understanding of the fundamental principles and engineering trade-offs involved in designing modern parallel computing systems, and to teach the parallel programming techniques needed to use these machines effectively. Because writing good parallel programs requires an understanding of key machine performance characteristics, this course covers both parallel hardware and software design.
| Sep 23 | Challenges of parallelizing code, motivations for parallel chips, processor basics |
| Sep 25 | Forms of parallelism: multi-core, SIMD, and multi-threading |
| Sep 30 | Finish up multi-threading and latency vs. bandwidth. ISPC programming, abstraction vs. implementation |
| Oct 02 | Process of parallelizing a program in data-parallel and shared address space models |
| Oct 07 | Achieving good work distribution while minimizing overhead, scheduling Cilk programs with work stealing |
| Oct 09 | Message passing, async vs. blocking sends/receives, pipelining, increasing arithmetic intensity, avoiding contention |
| Oct 14 | CUDA programming abstractions and how they are implemented on modern GPUs |
| Oct 16 | Data-parallel operations like map, reduce, scan, prefix sum, groupByKey |
| Oct 21 | Efficiently scheduling DNN layers, mapping convolutions to matrix multiplication, transformers, layer fusion |
| Oct 23 | Energy-efficient computing, motivation for and design of hardware accelerators. Case study on DNN accelerator design |
| Oct 28 | Modern trends and programming systems for creating specialized hardware |
| Oct 30 | How modern AI applications are served at datacenter scale |
| Nov 04 | Democracy Day (no class). Attend Stanford's many events! |
| Nov 06 | Domain-specific programming abstractions for writing high-performance code, automatic program optimization, with a focus on optimization driven by AI agents |
| Nov 11 | Invalidation-based coherence using MSI and MESI, false sharing |
| Nov 13 | Fine-grained synchronization via locks, motivation for relaxed consistency, implications for programmers |
| Nov 18 | Midterm Exam (no class). The exam is held in the evening, so there is no lecture |
| Nov 20 | Fine-grained synchronization via locks, basics of lock-free programming: single-reader/writer queues, lock-free stacks, the ABA problem |
| Dec 02 | Motivation for transactions, design space of transactional memory implementations, STM and HTM basics |
| Dec 04 | Suggestions for post-CS149 topics. AMA with the course staff |
| Dec 11 | Final Exam. Held from 3:30 to 6:30 pm |
We cannot distribute lecture videos to the public this year, but videos from a prior version of the course (2023) are available on Stanford's YouTube channel.
| Oct 9 | Written Assignment 1 |
| Oct 21 | Written Assignment 2 |
| Nov 6 | Written Assignment 3 |
| Dec 3 | Written Assignment 4 |