GPUs might be suboptimal for DNN evaluation because a DNN is sequentially layered: each layer depends on the output of the previous one. For instance, if a DNN has 30 layers, the GPU's massive parallelism can only exploit the parallelism within each layer; the 30 layers themselves must be evaluated one after another.
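A minimal CUDA sketch of what I mean (the layer width, kernel name, and zero-initialized placeholder weights are all made up for illustration): each kernel launch computes one dense layer with one thread per output neuron, but the launches all go to the same stream, so layer l+1 cannot begin until layer l finishes.

```cuda
// Hypothetical sketch: a 30-layer MLP forward pass as 30 kernel launches.
#include <cuda_runtime.h>
#include <cstdio>

#define DIM    1024  // made-up layer width
#define LAYERS 30    // the 30-layer example from above

// One dense layer with ReLU: one thread per output neuron, so the
// parallelism *within* a layer is DIM-wide.
__global__ void dense_relu(const float *in, const float *w, float *out, int n) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= n) return;
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)        // dot product: input with column j of w
        acc += in[i] * w[i * n + j];
    out[j] = acc > 0.0f ? acc : 0.0f;  // ReLU
}

int main() {
    float *buf[2], *w;
    cudaMalloc(&buf[0], DIM * sizeof(float));
    cudaMalloc(&buf[1], DIM * sizeof(float));
    cudaMalloc(&w, DIM * DIM * sizeof(float));
    cudaMemset(buf[0], 0, DIM * sizeof(float));   // placeholder input
    cudaMemset(w, 0, DIM * DIM * sizeof(float));  // placeholder weights

    dim3 block(256), grid((DIM + 255) / 256);
    for (int l = 0; l < LAYERS; ++l) {
        // All launches go to the default stream, so they execute in order:
        // layer l+1 cannot start before layer l finishes. The GPU's massive
        // parallelism is only usable inside each individual launch.
        dense_relu<<<grid, block>>>(buf[l % 2], w, buf[(l + 1) % 2], DIM);
    }
    cudaDeviceSynchronize();
    printf("ran %d strictly sequential layer kernels\n", LAYERS);
    return 0;
}
```

With a width of 1024, each launch exposes only about a thousand independent dot products, which may not be enough to keep a large GPU busy, while the chain of 30 launches itself stays strictly serial.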
I think GPUs are friendly to graphics workloads because graphics pipelines generally have a few stages, each with a massive amount of parallelism to exploit. On the other hand, the Volta, Turing, and Ampere generations of NVIDIA GPUs (Pascal predates them) also have Tensor Cores built into the hardware to accelerate the matrix multiplications at the heart of DNNs. But that is probably because NVIDIA's consumer-grade products target the intersection of graphics, compute, and DNN workloads for games, visual design, and scientific visualization.
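For a sense of what that Tensor Core path looks like in code, here is a minimal sketch using the public nvcuda::wmma API (the file layout, kernel name, and zeroed placeholder tiles are my own; it needs a Volta-or-newer GPU, e.g. nvcc -arch=sm_70):

```cuda
// Hypothetical sketch of the Tensor Core path via the public nvcuda::wmma API.
#include <mma.h>
#include <cuda_fp16.h>
#include <cstdio>
using namespace nvcuda;

// One warp multiplies a 16x16 half-precision tile pair and accumulates in
// float: the fixed-shape matrix multiply that Tensor Cores specialize in.
__global__ void tile_mma(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;
    wmma::fill_fragment(fc, 0.0f);
    wmma::load_matrix_sync(fa, a, 16);   // 16 = leading dimension of the tile
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);      // one Tensor Core op: C += A * B
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}

int main() {
    half *a, *b;
    float *c;
    cudaMalloc(&a, 256 * sizeof(half));
    cudaMalloc(&b, 256 * sizeof(half));
    cudaMalloc(&c, 256 * sizeof(float));
    cudaMemset(a, 0, 256 * sizeof(half));  // placeholder tiles
    cudaMemset(b, 0, 256 * sizeof(half));
    tile_mma<<<1, 32>>>(a, b, c);          // exactly one warp drives the op
    cudaDeviceSynchronize();
    printf("one 16x16x16 tile multiply done\n");
    return 0;
}
```

The fixed 16x16x16 tile shape is the point: the unit does one thing, dense tile multiply-accumulate, instead of running arbitrary threads.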
tonycai
I think another reason GPUs may be suboptimal for many DNN algorithms is that DNNs can mostly be reduced to matrix multiplication, while GPUs support much more general computation. All the hardware that provides that generality (instruction fetch and decode, flexible control flow, large register files) is overhead when the workload is just matrix multiplication.