Modified:
gradient descent
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.Links to this note
Lyapunov function
Used in analyzing the stability of an equilibrium of a dynamical system. A Lyapunov function is a scalar-valued function of the state space…
gradient clipping
Why do we clip gradients in deep learning? When is it important and what is the right way to do it? It seems like the standard recipe used…
diffusion model
Diffusion models for image generation were independently invented at least twice: in a discrete-time variational inference framework…
fixed point
We say that is a fixed point of an update rule if . Update rules can often (though not necessarily) be seen as defining an…
deep deterministic policy gradient
Deep deterministic policy gradient (DDPG) is an interesting RL algorithm with a somewhat misleading name. Although its name indicates that…
generative flow network
Many objects can be generated by a sequence of actions. For example: Generating language by adding one word at a time Generating a molecule…
fast weights
On an evolutionary timescale, it's useful to evolve structures that can learn quickly. The nervous system is an evolved organ system for…
ill-conditioned
Multiple senses: An 'ill-conditioned matrix' has a large ratio between its largest and smallest eigenvalue (more generally, see what is a…
natural gradient
We don't typically think of it this way, but you can derive a [ gradient descent ] step as finding the point that minimizes a linearized…