Stanford CS348K, Fall 2018

Visual computing tasks such as computational imaging, image/video understanding, and real-time 3D graphics are key responsibilities of modern computer systems ranging from sensor-rich smartphones and autonomous robots to large datacenters. These workloads demand exceptional system efficiency, and this course examines the key ideas, techniques, and challenges associated with the design of parallel, heterogeneous systems that accelerate visual computing applications. This course is intended for systems students interested in architecting efficient graphics, image processing, and computer vision platforms (both new hardware architectures and domain-optimized programming frameworks for these platforms) and for graphics, vision, and machine learning students who wish to understand throughput computing principles in order to design new algorithms that map efficiently to these machines.

Basic Info
Tues/Thurs 1:30-2:50pm
Room 60-109
Instructor: Kayvon Fatahalian
See the course info page for more info on course policies, logistics, and how to prepare for the course.
Fall 2018 Schedule
Sep 25
How superscalar, multi-core, SIMD, and hardware multi-threading are used in CPUs and GPUs; understanding latency and bandwidth constraints
Sep 27
Algorithms for taking raw sensor pixels to an RGB image: demosaicing, sharpening, correcting lens aberrations, multi-shot alignment/merging, image filtering
Oct 2
Multi-scale processing with Gaussian and Laplacian pyramids, HDR/local tone mapping, portrait mode in the Pixel 2 camera
Oct 4
Balancing locality, parallelism, and work; fusion and tiling; design of the Halide domain-specific language; automatically scheduling image processing pipelines
Oct 9
Benefits of fixed-function processing, comparing GPUs, DSPs, Image Signal Processors, and FPGAs for image processing, domain-specific languages for hardware synthesis such as Darkroom/Rigel, compiling Halide to hardware
Oct 11
JPG compression, H.264 video representation/encoding, parallel encoding, motivations for ASIC acceleration, emerging opportunities for compression when machines, not humans, will observe most images
Oct 16
Light field representation, light-field cameras, computational challenges of synthesizing video streams for VR output, Google's Jump VR pipeline
Oct 18
Popular DNN trunks and topologies, design of MobileNet, challenges of direct implementation, where the compute lies in modern networks, DNN pruning, neural architecture search
Oct 23
Footprint challenges of training, model vs. data parallelism, asynchronous vs. synchronous training debate, parameter server designs, key optimizations for parallel training
Oct 25
GPUs, Google TPU, special instructions for DNN evaluation, choice of precision in arithmetic, recent ISCA/MICRO papers on DNN acceleration
Oct 30
Neural module networks, discussion on value of modularity vs. end-to-end learning
Nov 1
Exploiting temporal coherence in video, pipelined networks, specialization to scene and camera viewpoint, sharing computations across applications and users
Nov 6
Facebook SVE/Lumos, Scanner, processing as a service
Nov 8
3D graphics pipeline as a machine architecture (abstraction), pipeline semantics/functionality, contrasting graphics pipeline architecture with compute-mode GPU architecture
Nov 13
Texture sampling basics, hardware texture compression, depth-and-color buffer compression, motivations for hardware multi-threading for latency hiding in modern GPUs
Nov 15
Molnar sorting taxonomy, dataflow scheduling under data amplification, tiled rendering for bandwidth-efficiency, deferred shading as a scheduling decision
Nov 27
Guest Lecture: Bill Mark (Google)
Topic: specialized hardware for deep learning and computational photography at Google
Nov 29
Domain-Specific Languages for Shading (with Tim Foley, NVIDIA)
RenderMan Shading Language and Cg: contrasting two different levels of abstraction for shading languages, Slang
Dec 4
Mapping shaders to GPUs, design of a platform for ML computations, rendering concerns of VR
Dec 6
How deep learning and hardware specialization stand to make real-time ray tracing feasible
Assignments and Projects
Optional Assignment 0: Analyzing Parallel Program Performance on a Quad-Core CPU
All CS348K students are encouraged to attempt this assignment during or before the first week of the course to check their background in parallel systems.
Oct 22
Assignment 1: Burst Mode HDR Camera RAW Processing for the kPhone 348
Optional Assignment 2: Implementing a Separable Conv Layer in Halide
Dec 11
Final Project Guidelines: students will complete a substantial term project on a course-relevant topic of their choosing.