In recent years, bi-level optimization has raised much interest in the machine learning community, in particular for hyper-parameters optimization and implicit deep learning. This type of problems is often tackled using first-order that requires the computation of the gradient whose expression can be obtained using the implicit function theorem. The computation of this gradient requires the computation of matrix-vector products involving the inverse of a large matrix which is computationally demanding.

In our work, we propose a novel strategy coined SHINE to tackle this computational bottleneck when the inner problem can be solved with a quasi-Newton algorithm. The main idea is to use the quasi-Newton matrices estimated from the resolution of the inner problem to efficiently approximate the inverse matrix in the direction needed for the gradient computation. We prove that under some restrictive conditions, this strategy gives a consistent estimate of the true gradient.

In addition, by modifying the quasi-Newton updates, we provide theoretical guarantees that our method asymptotically estimates the true implicit gradient under weaker hypothesis.

We empirically study this approach in many settings, ranging from hyperparameter optimization to large Multiscale Deep Equilibrium models applied to CIFAR and ImageNet. We show that it reduces the computational cost of the backward pass by up to two orders of magnitude. All this is achieved while retaining the excellent performance of the original models in hyperparameter optimization and on CIFAR, and giving encouraging and competitive results on ImageNet.