Tensor cores implement 4x4 matrix multiplication through specialized hardware. This is done through a circuit that is designed to do 4x4 or these days 8x4 matrix multiplication on NVIDIA chips. It can be thought of as a small spatial architecture.
amohamdy
How common are these complex instructions in specialized hardware?
Tensor cores implement 4x4 matrix multiplication through specialized hardware. This is done through a circuit that is designed to do 4x4 or these days 8x4 matrix multiplication on NVIDIA chips. It can be thought of as a small spatial architecture.