Algorithms, parameters, and turtles
With a few weeks of progress into the program, it's high time to explain the project in more depth. The project title has already given away a few key details – AI, linear algebra, and advanced algorithms – but how does it all fit together in the end, and how does it connect to the turtles in the title? Read on to find out.
Assume that you are working on an advanced problem – perhaps the strength properties of a mechanical component, which is in itself part of an even larger problem – and that, through some amount of simplification and parameterization, you have turned this into the problem of solving a large system of linear equations. If this system is sufficiently large (our starting toy example is a sparse matrix on the order of 500k by 500k, or 2.5e11 entries in total), even supercomputers have difficulties solving it in reasonable time. For this reason, a number of algorithms have been developed that first approximate the solution with something known as a pseudoinverse or preconditioner, all in the name of speeding up the solving step. We are specifically going to cover a Monte Carlo and a stochastic gradient descent algorithm, but as the algorithms themselves are not the focus of the project, the specifics are left as an exercise.
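The overall recipe can be sketched in a few lines. This is a minimal illustration using SciPy, not the project's actual code (the toy matrix, tolerances, and the choice of incomplete LU as preconditioner are all assumptions made for the sketch): build a sparse system, form an approximate factorization, and hand it to an iterative solver as a preconditioner.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000  # tiny compared to the 500k-by-500k systems discussed above
# A diffusion-like tridiagonal matrix as a stand-in for a real sparse system.
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# The preconditioner: an incomplete LU factorization, i.e. a cheap
# approximation of A's inverse action.
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
M = spla.LinearOperator((n, n), matvec=ilu.solve)

# GMRES converges in far fewer iterations with M than without it.
x, info = spla.gmres(A, b, M=M)
```

The point of the sketch is the division of labor: the factorization step produces a rough approximation, and the iterative solver only has to correct it.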
The important detail is that the performance of these algorithms depends entirely on the choice of a couple of parameters, creating yet another layer of complexity. Our job is now to approximately find the best parameter choice for the approximate solution to an approximate system. This is where we approach the famous mythological image of the world resting on the back of a turtle – a turtle which, so as not to collapse, must rest on another turtle, which of course must rest on another turtle, and so on… It's turtles (algorithms) all the way down, as the saying goes.
Bayesian sampling based parameter search
We must stop this infinite chain of parameter search algorithms, and we do so with what is essentially one final, thorough parameter search over all possible matrix configurations, finding the function that maps a set of matrix features to the optimal parameters. This is close to simply running a large batch of experiments to generate a lookup table, but since each evaluation takes so much time, doing this exhaustively is not possible, calling for a more sophisticated approach.
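To see why the exhaustive route fails, a back-of-the-envelope calculation helps. The grid sizes and per-solve time below are illustrative assumptions, not measured numbers from the project:

```python
# Illustrative numbers for an exhaustive lookup table (assumptions only):
n_feature_bins = 10 ** 3    # e.g. three matrix features, 10 bins each
n_param_combos = 20 ** 2    # e.g. two solver parameters, 20 values each
minutes_per_solve = 5       # one large sparse solve per grid cell

total_minutes = n_feature_bins * n_param_combos * minutes_per_solve
years = total_minutes / (60 * 24 * 365)
print(f"exhaustive table: {total_minutes:,} solver-minutes ≈ {years:.1f} years")
```

Even these modest grids already add up to years of compute, which is why sampling has to be spent where it matters.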
The solution is to build a model of the aforementioned function at the same time as you are doing the sampling. In each iterative step you find the preliminary optimal parameters, sample that configuration, and then refine the model specifically in that area by updating it with the new data point – a technique called Bayesian sampling. The algorithm can also be seen as a form of reinforcement learning, since it learns how to respond with an action (a set of parameters) to an environment (matrix features), although I would argue reinforcement learning is more typically used in a more iterative process than the one studied here.
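As a rough illustration, here is a stripped-down version of that loop. Everything in it – the single 1-D parameter, the noisy stand-in objective, and the hand-rolled Gaussian-process surrogate – is a made-up simplification for the sketch; the real setup works over matrix features and several solver parameters, and the choice of statistical model is still open.

```python
import numpy as np

rng = np.random.default_rng(0)

def solve_time(p):
    # Stand-in for "run the solver with parameter p and time it".
    return 2.0 * (p - 0.3) ** 2 + 0.01 * rng.standard_normal()

def gp_posterior(X, y, Xq, length=0.2, noise=1e-3):
    # Gaussian-process regression with an RBF kernel: the statistical
    # model that gets refined after every new sample.
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(Xq, X)
    mean = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.sqrt(np.maximum(var, 0.0))

# Start from a few random samples, then alternate: fit the model, pick the
# most promising parameter, evaluate it, and refit with the new data point.
X = rng.uniform(0, 1, 3)
y = np.array([solve_time(p) for p in X])
grid = np.linspace(0, 1, 200)
for _ in range(10):
    mean, std = gp_posterior(X, y, grid)
    p_next = grid[np.argmin(mean - std)]  # lower confidence bound
    X = np.append(X, p_next)
    y = np.append(y, solve_time(p_next))

best = X[np.argmin(y)]
```

The "mean minus standard deviation" acquisition rule is one simple way to balance exploiting the current best region against exploring uncertain ones; fancier strategies exist, which is exactly the sampling-strategy question raised below.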
This solution, however, introduces a set of sub-problems in what is hopefully the last complexity layer of the problem: we need to choose suitable matrix features that are closely correlated with the parameters, a suitable statistical model for the regression, and a sampling strategy based on that statistical model for deciding on the next points to study. Finally, we also need to do all of this efficiently enough to train the model on a sufficient number of data points. These questions will all be answered in the third blog post, so stay tuned!