Stanford CS348K, Spring 2020

Visual computing tasks such as computational imaging, image/video understanding, and real-time 3D graphics are key responsibilities of modern computer systems ranging from sensor-rich smartphones to autonomous robots to large datacenters. These workloads demand exceptional system efficiency, and this course examines the key ideas, techniques, and challenges associated with the design of parallel, heterogeneous systems that accelerate visual computing applications. This course is intended for systems students interested in architecting efficient graphics, image processing, and computer vision platforms (both new hardware architectures and domain-optimized programming frameworks for these platforms) and for graphics, vision, and machine learning students who wish to understand throughput computing principles in order to design new algorithms that map efficiently to these machines.

Basic Info
Tues/Thurs 3:00-4:30pm (virtual class only)
Instructor: Kayvon Fatahalian
See the course info page for more info on course policies and logistics.
Spring 2020 Schedule
Apr 7
How superscalar, multi-core, SIMD, and hardware multi-threading are used in CPUs/GPUs, understanding latency and bandwidth constraints
Apr 9
Algorithms for taking raw sensor pixels to an RGB image: demosaicing, sharpening, correcting lens aberrations, multi-shot alignment/merging, image filtering
Apr 14
Multi-scale processing with Gaussian and Laplacian pyramids, HDR/local tone mapping, portrait mode in the Pixel 2 camera
Apr 16
Balancing locality, parallelism, and work, fusion and tiling, design of the Halide domain-specific language, automatically scheduling image processing pipelines
Apr 21
Popular DNN trunks and topologies, design of MobileNet, challenges of direct implementation, where the compute lies in modern networks, DNN pruning, neural architecture search
Apr 23
GPUs, Google TPU, special instructions for DNN evaluation, choice of precision in arithmetic, recent ISCA/MICRO papers on DNN acceleration, flexibility vs efficiency trade-offs
Apr 28
Footprint challenges of training, model vs. data parallelism, asynchronous vs. synchronous training debate, parameter server designs, key systems optimizations for parallel training
Apr 30
May 5
If the most important step of ML is acquiring training data, why don't we have systems for it?
May 7
Exploiting temporal coherence in video, specialization to scene and camera viewpoint
May 12
H.264 video representation/encoding, parallel encoding, motivations for ASIC acceleration, emerging opportunities for compression when machines, not humans, will observe most images
May 14
Parallel video encoding, video ingest at Facebook, discussion of ethics of continuous capture
May 19
3D graphics pipeline as a machine architecture (abstraction), pipeline semantics/functionality, contrasting graphics pipeline architecture with compute-mode GPU architecture
May 21
3D graphics pipeline as a machine architecture (abstraction), Molnar sorting taxonomy, dataflow scheduling under data amplification, tiled rendering for bandwidth-efficiency, deferred shading as a scheduling decision
May 26
RenderMan Shading Language and Cg: contrasting two different levels of abstraction for shading languages, Slang
May 28
DXR ray tracing APIs, hardware acceleration of ray tracing
Jun 2
Class Discussion
Jun 4
How might we architect renderers differently to support the needs of training, rather than game engines?
Jun 9
Project Presentations
Apr 23: Burst Mode HDR Camera RAW Processing
May 5: Optimizing a Conv Layer in Halide (Making Students Appreciate cuBLAS)
Jun 9: Self-Selected Term Project