Stanford CS149, Fall 2024
PARALLEL COMPUTING
From smartphones to multi-core CPUs and GPUs to the world's largest supercomputers and websites, parallel processing is ubiquitous in modern computing. The goal of this course is to provide a deep understanding of the fundamental principles and engineering trade-offs involved in designing modern parallel computing systems, and to teach the parallel programming techniques needed to use these machines effectively. Because writing good parallel programs requires an understanding of key machine performance characteristics, this course covers both parallel hardware and software design.
Basic Info
Time: Tues/Thurs 10:30-11:50am
Location: NVIDIA Auditorium
Instructors: Kayvon Fatahalian and Kunle Olukotun
See the course info page for more info on policies and logistics.
Fall 2024 Schedule
Sep 24 | Challenges of parallelizing code, motivations for parallel chips, processor basics |
Sep 26 | Forms of parallelism: multi-core, SIMD, and multi-threading |
Oct 01 | Finish up multi-threading and latency vs. bandwidth. ISPC programming, abstraction vs. implementation |
Oct 03 | Structuring parallel programs. Process of parallelizing a program in data parallel and shared address space models |
Oct 08 | Achieving good work distribution while minimizing overhead, scheduling Cilk programs with work stealing |
Oct 10 | Message passing, async vs. blocking sends/receives, pipelining, increasing arithmetic intensity, avoiding contention |
Oct 15 | CUDA programming abstractions, and how they are implemented on modern GPUs |
Oct 17 | Data-parallel operations like map, reduce, scan, prefix sum, groupByKey |
Oct 22 | Producer-consumer locality, RDD abstraction, Spark implementation and scheduling |
Oct 24 | Efficiently scheduling DNN layers, mapping convs to matrix-multiplication, transformers, layer fusion |
Oct 29 | Energy-efficient computing, motivation for and design of hardware accelerators |
Oct 31 | Modern trends and programming systems for creating specialized hardware |
Nov 05 | Democracy Day (no class): Go vote if you can! |
Nov 07 | Programming hardware, motivation for and definition of memory coherence |
Nov 12 | Cache Coherence + Memory Consistency: Invalidation-based coherence using MSI and MESI, false sharing, relaxed consistency models, acquire/release semantics |
Nov 14 | Midterm Exam: The midterm is an evening exam |
Nov 19 | Fine-Grained Synchronization and Lock-Free Programming: Fine-grained synchronization via locks, basics of lock-free programming: single-reader/writer queues, lock-free stacks, the ABA problem |
Nov 21 | Domain-Specific Programming Systems: Performance/productivity motivations for DSLs, case studies on several DSLs |
Dec 03 | Transactional Memory: Motivation for transactions, design space of transactional memory implementations, STM and HTM basics |
Dec 05 | Topic TBD + Course Wrap Up: Suggestions for post-cs149 topics |
Dec 12 | Final Exam: Held at 3:30pm. Location TBD |
Programming Assignments
Oct 7 | Assignment 1: Analyzing Parallel Program Performance on a Quad-Core CPU |
Oct 24 | Assignment 2: Scheduling Task Graphs on a Multi-Core CPU |
Nov 8 | Assignment 3: A Simple Renderer in CUDA |
Dec 3 | Assignment 4: Optimizing DNN Performance on DNN Accelerator Hardware |
Written Assignments
Oct 8 | Written Assignment 1 |
Oct 15 | Written Assignment 2 |
Oct 29 | Written Assignment 3 |
Nov 13 | Written Assignment 4 |
Nov 21 | Written Assignment 5 |
Dec 5 | Written Assignment 6 |