Stanford CS149, Fall 2022
PARALLEL COMPUTING
From smartphones, to multi-core CPUs and GPUs, to the world's largest supercomputers and websites, parallel processing is ubiquitous in modern computing. The goal of this course is to provide a deep understanding of the fundamental principles and engineering trade-offs involved in designing modern parallel computing systems, and to teach the parallel programming techniques needed to use these machines effectively. Because writing good parallel programs requires an understanding of key machine performance characteristics, this course covers both parallel hardware and software design.
Basic Info
Time: Tues/Thurs 10:30-11:50am
Location: NVIDIA Auditorium
Instructors: Kayvon Fatahalian and Kunle Olukotun
See the course info page for details on policies and logistics.
Fall 2022 Schedule
Sep 27 | Challenges of parallelizing code, motivations for parallel chips, processor basics
Sep 29 | Forms of parallelism: multicore, SIMD, and multi-threading; understanding latency and bandwidth
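As a taste of the multi-threading topic, here is a minimal C++17 sketch (illustrative only, not course material) that splits a sum across the machine's hardware threads:

    // Spreading independent work across hardware threads with std::thread.
    #include <algorithm>
    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main() {
        const unsigned n_threads = std::max(1u, std::thread::hardware_concurrency());
        const size_t n = 1 << 20;
        std::vector<float> data(n, 1.0f);
        std::vector<double> partial(n_threads, 0.0);

        std::vector<std::thread> workers;
        for (unsigned t = 0; t < n_threads; ++t) {
            workers.emplace_back([&, t] {
                // Each thread sums its own contiguous chunk of the input.
                size_t begin = t * n / n_threads;
                size_t end = (t + 1) * n / n_threads;
                partial[t] = std::accumulate(data.begin() + begin, data.begin() + end, 0.0);
            });
        }
        for (auto& w : workers) w.join();

        double total = std::accumulate(partial.begin(), partial.end(), 0.0);
        std::cout << "sum = " << total << "\n";  // expect 1048576
    }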
Oct 04 | Wrap-up of multi-threading and latency vs. bandwidth; ISPC programming; abstraction vs. implementation
Oct 06 | Ways of thinking about parallel programs and their corresponding hardware implementations; the thought process of parallelizing a program in the data-parallel and shared address space models
Oct 11 | Achieving good work distribution while minimizing overhead, scheduling Cilk programs with work stealing
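Cilk itself is covered in lecture; as a rough analogy only, the fork-join structure that a work-stealing scheduler executes can be sketched in standard C++ with std::async (note that std::async does not actually perform work stealing):

    // Fork-join divide-and-conquer sum, analogous to Cilk's spawn/sync.
    #include <future>
    #include <iostream>
    #include <numeric>
    #include <vector>

    long long psum(const std::vector<int>& v, size_t lo, size_t hi) {
        if (hi - lo < 100000)  // serial cutoff to bound task overhead
            return std::accumulate(v.begin() + lo, v.begin() + hi, 0LL);
        size_t mid = lo + (hi - lo) / 2;
        auto left = std::async(std::launch::async, psum, std::cref(v), lo, mid);  // "spawn"
        long long right = psum(v, mid, hi);
        return left.get() + right;  // "sync"
    }

    int main() {
        std::vector<int> v(1 << 22, 1);
        std::cout << psum(v, 0, v.size()) << "\n";  // 4194304
    }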
Oct 13 | Message passing, async vs. blocking sends/receives, pipelining, increasing arithmetic intensity, avoiding contention
Oct 18 | CUDA programming abstractions and how they are implemented on modern GPUs
Oct 20 | Data-parallel operations like map, reduce, scan, prefix sum, and groupByKey
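For a concrete feel for these primitives, here is a small illustrative sketch using the C++17 standard library; both calls also have overloads taking an execution policy such as std::execution::par for parallel execution:

    // Data-parallel primitives from <numeric> in standard C++17.
    #include <functional>
    #include <iostream>
    #include <numeric>
    #include <vector>

    int main() {
        std::vector<int> x = {3, 1, 4, 1, 5, 9};

        // map + reduce fused into one call: sum of squares
        long long sum_sq = std::transform_reduce(
            x.begin(), x.end(), 0LL,
            std::plus<>{},                             // reduce
            [](int v) { return (long long)v * v; });   // map

        // inclusive scan (prefix sum): {3, 4, 8, 9, 14, 23}
        std::vector<int> prefix(x.size());
        std::inclusive_scan(x.begin(), x.end(), prefix.begin());

        std::cout << "sum of squares = " << sum_sq << "\n";  // 133
        for (int p : prefix) std::cout << p << " ";
        std::cout << "\n";
    }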
Oct 25 | Producer-consumer locality, the RDD abstraction, Spark implementation and scheduling
Oct 27 | Definition of memory coherence, invalidation-based coherence using MSI and MESI, false sharing
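A common remedy for false sharing is to pad per-thread data out to cache-line size; a brief illustrative C++ sketch (assuming the typical 64-byte line):

    // Avoiding false sharing: per-thread counters padded so each lands
    // on its own cache line.
    #include <atomic>
    #include <iostream>
    #include <thread>
    #include <vector>

    struct alignas(64) PaddedCounter {  // one cache line per counter
        std::atomic<long> value{0};
    };

    int main() {
        const int n_threads = 4;
        std::vector<PaddedCounter> counters(n_threads);

        std::vector<std::thread> workers;
        for (int t = 0; t < n_threads; ++t)
            workers.emplace_back([&counters, t] {
                // Without alignas(64), neighboring counters could share a
                // cache line, and these increments would ping-pong the line
                // between cores under an invalidation-based protocol.
                for (int i = 0; i < 1'000'000; ++i)
                    counters[t].value.fetch_add(1, std::memory_order_relaxed);
            });
        for (auto& w : workers) w.join();

        long total = 0;
        for (auto& c : counters) total += c.value.load();
        std::cout << total << "\n";  // 4000000
    }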
Nov 01 | Consistency vs. coherence, relaxed consistency models and their motivation, acquire/release semantics
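The classic message-passing idiom for acquire/release semantics, sketched with C++ atomics (illustrative only):

    // The release store to `ready` makes the earlier write to `payload`
    // visible to the thread whose acquire load observes ready == true.
    #include <atomic>
    #include <iostream>
    #include <thread>

    int payload = 0;
    std::atomic<bool> ready{false};

    void producer() {
        payload = 42;                                  // plain write
        ready.store(true, std::memory_order_release);  // publish
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) {}  // spin until published
        std::cout << payload << "\n";  // guaranteed to print 42
    }

    int main() {
        std::thread a(producer), b(consumer);
        a.join();
        b.join();
    }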
Nov 03 | Implementation of locks, fine-grained synchronization via locks, basics of lock-free programming: single-reader/writer queues, lock-free stacks, the ABA problem, hazard pointers
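As a preview, a Treiber-style lock-free stack push sketched in C++ with compare-and-swap; pop is deliberately omitted, since a naive version runs into the ABA problem covered in lecture (hence hazard pointers). Nodes are leaked here for brevity.

    #include <atomic>
    #include <iostream>

    struct Node {
        int value;
        Node* next;
    };

    std::atomic<Node*> top{nullptr};

    void push(int v) {
        Node* n = new Node{v, top.load(std::memory_order_relaxed)};
        // Retry until we swing `top` from the node we last saw to `n`.
        // On failure, compare_exchange_weak reloads the current top
        // into n->next before the next attempt.
        while (!top.compare_exchange_weak(n->next, n,
                                          std::memory_order_release,
                                          std::memory_order_relaxed)) {
        }
    }

    int main() {
        push(1);
        push(2);
        for (Node* p = top.load(); p; p = p->next)
            std::cout << p->value << " ";  // prints: 2 1
        std::cout << "\n";
    }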
Nov 08 | Democracy Day (no class); take time to volunteer, educate yourself, and take action!
Nov 10 | Motivation for transactions, design space of transactional memory implementations
Nov 15 | Midterm (no class); the midterm will be held in the evening, and the class period may be used for review
Nov 17 | Finishing transactional memory, focusing on STM and HTM implementations
Nov 29 | Energy-efficient computing, motivation for heterogeneous processing, fixed-function processing, FPGAs, mobile SoCs; performance/productivity motivations for DSLs, with a case study of the Halide image processing DSL
Dec 01 | Domain-specific frameworks for graph processing, streaming graph processing, graph compression, DRAM basics
Dec 06 | Programming reconfigurable hardware like FPGAs and CGRAs
Dec 08 | Efficiently scheduling DNN layers, mapping to matrix multiplication, layer fusion, DNN accelerators (e.g., GPU Tensor Cores, TPU)
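As a preview of the matrix-multiplication mapping, here is a tiny illustrative C++ sketch of a fully-connected layer expressed as a matmul, the computation pattern that accelerators like Tensor Cores and the TPU's systolic array are built to execute:

    // C[m][n] = sum_k A[m][k] * B[k][n]: a dense layer as matrix multiply,
    // where A holds a batch of activations and B holds the layer weights.
    #include <iostream>
    #include <vector>

    int main() {
        const int M = 2, K = 3, N = 2;  // batch x in_features x out_features
        std::vector<float> A = {1, 2, 3, 4, 5, 6};  // activations, M x K
        std::vector<float> B = {1, 0, 0, 1, 1, 1};  // weights, K x N
        std::vector<float> C(M * N, 0.0f);

        for (int m = 0; m < M; ++m)
            for (int k = 0; k < K; ++k)  // k in the middle improves locality on B
                for (int n = 0; n < N; ++n)
                    C[m * N + n] += A[m * K + k] * B[k * N + n];

        for (int m = 0; m < M; ++m) {
            for (int n = 0; n < N; ++n) std::cout << C[m * N + n] << " ";
            std::cout << "\n";  // prints: 4 5 / 10 11
        }
    }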
Programming Assignments
Written Assignments
Oct 14 | Written Assignment 1
Oct 28 | Written Assignment 2
Nov 04 | Written Assignment 3
Nov 11 | Written Assignment 4
Dec 05 | Written Assignment 5