Since we are just setting the step size to a small value without any step-size control, does this mean we may need a lot of iterations? Otherwise we may not reach the minimum.
christina
There are many optimizers that can speed up the optimization compared to plain gradient descent, such as second-order methods and adaptive first-order methods; one example is Adam, which is often used for training DNNs.
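To make the trade-off concrete, here is a minimal sketch (plain Python, with illustrative function names, step sizes, and tolerance values chosen just for this example, not taken from the course materials). It counts how many fixed-step gradient descent iterations are needed on the simple quadratic f(x) = x^2: since each update shrinks x by a factor of (1 - 2*step_size), the iteration count grows roughly like 1/step_size, which is exactly the cost that adaptive or second-order methods try to soften.

```python
# Minimal illustration: fixed-step gradient descent on f(x) = x^2,
# whose gradient is f'(x) = 2x.  All constants below are illustrative.

def gradient_descent_iterations(step_size, x0=5.0, tol=1e-6, max_iters=1_000_000):
    """Return the number of fixed-step gradient descent iterations
    needed to bring |x| below `tol` for f(x) = x^2."""
    x = x0
    for k in range(max_iters):
        if abs(x) < tol:
            return k
        grad = 2.0 * x               # gradient of f(x) = x^2
        x = x - step_size * grad     # fixed step, no line search / step control
    return max_iters

if __name__ == "__main__":
    # Each update multiplies x by (1 - 2*step_size), so shrinking the step
    # size by a factor of 10 increases the iteration count by roughly 10x.
    for eta in (0.1, 0.01, 0.001):
        iters = gradient_descent_iterations(step_size=eta)
        print(f"step size {eta:>6}: {iters} iterations to reach |x| < 1e-6")
```

Running this shows the iteration count climbing from tens to thousands as the step size shrinks, which is the behavior the question above is worried about.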