Stanford CS149, Fall 2024

PARALLEL COMPUTING

From smartphones, to multi-core CPUs and GPUs, to the world's largest supercomputers and websites, parallel processing is ubiquitous in modern computing. The goal of this course is to provide a deep understanding of the fundamental principles and engineering trade-offs involved in designing modern parallel computing systems, and to teach the parallel programming techniques needed to use these machines effectively. Because writing good parallel programs requires an understanding of key machine performance characteristics, this course covers both parallel hardware and software design.

Basic Info

Time: Tues/Thurs 10:30-11:50am

Location: NVIDIA Auditorium

Instructors: Kayvon Fatahalian and Kunle Olukotun

See the course info page for more info on policies and logistics.

Fall 2024 Schedule

Sep 24	Why Parallelism? Why Efficiency?	Challenges of parallelizing code, motivations for parallel chips, processor basics
Sep 26	A Modern Multi-Core Processor	Forms of parallelism: multi-core, SIMD, and multi-threading
Oct 01	Multi-Core Architecture Part II + ISPC Programming Abstractions	Finish up multi-threading and latency vs. bandwidth. ISPC programming, abstraction vs. implementation
Oct 03	Parallel Programming Basics	Structuring parallel programs. Process of parallelizing a program in data parallel and shared address space models
Oct 08	Performance Optimization I: Work Distribution and Scheduling	Achieving good work distribution while minimizing overhead, scheduling Cilk programs with work stealing
Oct 10	Performance Optimization II: Locality, Communication, and Contention	Message passing, async vs. blocking sends/receives, pipelining, increasing arithmetic intensity, avoiding contention
Oct 15	GPU Architecture and CUDA Programming	CUDA programming abstractions, and how they are implemented on modern GPUs
Oct 17	Data-Parallel Thinking	Data-parallel operations like map, reduce, scan, prefix sum, groupByKey
Oct 22	Distributed Data-Parallel Computing Using Spark	Producer-consumer locality, RDD abstraction, Spark implementation and scheduling
Oct 24	Efficiently Evaluating DNNs (Software Solutions)	Efficiently scheduling DNN layers, mapping convs to matrix-multiplication, transformers, layer fusion
Oct 29	Hardware Specialization	Energy-efficient computing, motivation for and design of hardware accelerators
Oct 31	Programming Specialized Hardware	Modern trends and programming systems for creating specialized hardware
Nov 05	Democracy Day (no class)	Go vote if you can!
Nov 07	Programming Specialized Hardware II + Cache Coherence	Programming hardware, motivation for and definition of memory coherence
Nov 12	Cache Coherence	Invalidation-based coherence using MSI and MESI, false sharing
Nov 14	Midterm Exam	The midterm is an evening exam
Nov 19	Lock Implementations, Fine-Grained Synchronization and Lock-Free Programming	Fine-grained synchronization via locks, basics of lock-free programming: single-reader/writer queues, lock-free stacks, the ABA problem
Nov 21	Relaxed Consistency + Domain-Specific Programming Systems	Motivation for relaxed consistency, implications for programmers. Performance/productivity motivations for DSLs, case studies on several DSLs
Dec 03	Transactional Memory	Motivation for transactions, design space of transactional memory implementations, STM and HTM basics
Dec 05	Transactions II + Ask Me Anything with Kayvon and Kunle	Suggestions for post-CS149 topics. AMA with the course staff.
Dec 12	Final Exam	Held at 3:30pm. Location TBD

Programming Assignments

Oct 7	Assignment 1: Analyzing Parallel Program Performance on a Quad-Core CPU
Oct 24	Assignment 2: Scheduling Task Graphs on a Multi-Core CPU
Nov 8	Assignment 3: A Simple Renderer in CUDA
Dec 5	Assignment 4: Optimizing DNN Performance on DNN Accelerator Hardware
Dec 6	Assignment 5: (EC) Big Graph Processing in OpenMP

Written Assignments

Oct 8	Written Assignment 1
Oct 15	Written Assignment 2
Oct 29	Written Assignment 3
Nov 13	Written Assignment 4
Dec 5	Written Assignment 5
Dec 6	Written Assignment 6 (EC)