Are other optimization algorithms ever used instead, like coordinate descent or Newton's method (also known as Newton-Raphson)? Each has its own advantages and disadvantages, so I'm wondering if GD is generally best.
szzheng
For those interested, here's a paper on an adaptive step size technique, the Adam optimizer: https://arxiv.org/pdf/1412.6980.pdf
An advantage of adapting the step size is that when the gradient has been consistently small, the effective step is scaled up, so gradient descent can converge more quickly (and conversely, where gradients are large, the steps shrink).
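For concreteness, here is a minimal sketch of the Adam-style update the paper describes, assuming common default hyperparameters; the function name `adam_step` and the toy quadratic objective below are my own for illustration, not code from the paper.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update step (illustrative sketch).

    m and v are running estimates of the first and second moments of the
    gradient. Dividing by sqrt(v) makes the effective step larger along
    coordinates where gradients have been small, and smaller where they
    have been large.
    """
    m = beta1 * m + (1 - beta1) * grad           # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad**2        # second-moment estimate
    m_hat = m / (1 - beta1**t)                   # bias correction for early steps
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-coordinate adaptive step
    return w, m, v

# Toy usage: minimize f(w) = ||w||^2, so grad f(w) = 2w
w = np.array([5.0, -3.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 1001):
    grad = 2 * w
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
print(w)  # close to the minimizer [0, 0]
```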