In lecture, Professor Kayvon said there are methods that can speed up the convergence of gradient descent. Which ones are generally considered the best, and which ones are used most in practice?


There's also the conjugate gradient method and various momentum-based methods, which can outperform plain gradient descent depending on the smoothness and conditioning of the objective.
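
To make the momentum point concrete, here is a minimal sketch comparing plain gradient descent with heavy-ball momentum. Everything in it is an assumption chosen for illustration and not from lecture: the toy objective f(x) = 0.5 * (x0^2 + 100 * x1^2), the step size 0.01, the momentum coefficient 0.9, and the function names.

```python
# Toy ill-conditioned quadratic (an assumed example, not from lecture):
# f(x) = 0.5 * (1 * x[0]**2 + 100 * x[1]**2), curvatures 1 and 100.

def f(x):
    return 0.5 * (1.0 * x[0] ** 2 + 100.0 * x[1] ** 2)

def grad(x):
    # Analytic gradient of f.
    return [1.0 * x[0], 100.0 * x[1]]

def gradient_descent(x0, lr, steps):
    # Plain gradient descent: x <- x - lr * grad(x).
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

def momentum_descent(x0, lr, beta, steps):
    # Heavy-ball momentum: keep a velocity v that accumulates past gradients.
    # v <- beta * v - lr * grad(x);  x <- x + v
    x = list(x0)
    v = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        v = [beta * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x

if __name__ == "__main__":
    start = [1.0, 1.0]
    x_gd = gradient_descent(start, lr=0.01, steps=200)
    x_mom = momentum_descent(start, lr=0.01, beta=0.9, steps=200)
    print("gradient descent loss:", f(x_gd))
    print("momentum loss:        ", f(x_mom))
```

With the same step size and number of iterations, the momentum run should end at a noticeably lower loss on this problem, because the step size has to stay small for the steep direction and plain gradient descent then crawls along the shallow one. This only shows the effect on one hand-picked quadratic; it is not a claim about which method is best in general.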