Created: October 25, 2021
Modified: October 25, 2021
kitchen sink deep learning
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
- I just coined the phrase 'kitchen sink' deep learning for a vague idea that comes to me occasionally. Roughly: rather than using a uniform architecture like transformers, we should throw all kinds of crazy components into our models. Models should be able to easily use trig functions, Fourier transforms, LP solvers, convex optimizers, ODE solvers, and basically any clever algorithm or hard-won bit of scientific insight we have sitting around. We know that these are powerful, flexible computation shapes that would otherwise require a ton of parameters and data to learn in a general-purpose way. (A rough sketch of one such block follows after this list.)
- Where would this be useful? Would these components help in language models? Individual components, like Fourier transforms, are certainly useful in speech recognition and vision systems.
- One caveat: it's silly to pile too many of these components into a single architecture, because every one of them has to be evaluated (and differentiated through) on each training step to get gradients, so the compute cost grows with every component you add.
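For concreteness, here is a minimal sketch of what one such block might look like, assuming PyTorch. The `KitchenSinkBlock` name, the particular components (a sine nonlinearity and a Fourier transform), and the layer sizes are my own illustrative assumptions, not anything prescribed by the note.

```python
# A hypothetical "kitchen sink" block: the input is routed through several
# hard-coded, differentiable computation shapes, and a learned linear layer
# decides how to mix them. (Sketch only; names and sizes are assumptions.)
import torch
import torch.nn as nn


class KitchenSinkBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.pre = nn.Linear(dim, dim)       # shared learned projection
        self.mix = nn.Linear(3 * dim, dim)   # learned mixer over all components

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.pre(x)
        trig = torch.sin(h)                           # trig-function component
        spectral = torch.fft.fft(h, dim=-1).real      # Fourier-transform component
        features = torch.cat([h, trig, spectral], dim=-1)
        return x + self.mix(features)                 # residual path


# Usage: drop the block into an otherwise ordinary stack.
block = KitchenSinkBlock(dim=64)
out = block(torch.randn(8, 64))
print(out.shape)  # torch.Size([8, 64])
```

Because everything sits behind a residual connection and a learned mixer, the model can cheaply downweight any component that turns out not to help, which is one way to keep the "throw it all in" idea trainable.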