Doing [ math ] seems like a really promising area for AI. And by 'math' I mean math research (not arithmetic, which computers are already…
Modified: April 26, 2022.
See https://emtiyaz.github.io/papers/learning_from_bayes.pdf Suppose we have a learning problem For some choice of exponential-family…
Modified: July 18, 2021.
The Bayesian approach to statistics is to 'just use probability theory'. You write down a joint probability distribution over observed and…
Modified: April 08, 2023.
A model of [ option ] prices that assumes: The existence of a risk-free asset paying some interest rate, for example, US Treasury bonds…
Modified: October 26, 2021.
Related to [ natural gradient ] and the [ Fisher information ] matrix. Let's say we have a parametric model of some data. The Cramer-Rao…
Modified: July 05, 2022.
Aka computational trinitarianism . Churchill, My Early Life : I have noticed in my life deep resemblances between many different kinds of…
Modified: April 26, 2022.
A pointwise maximum of [ convex ] functions Specifically, we require that is convex in for every . is itself convex in , and when…
Modified: August 13, 2022.
Any reasonable 'adapted' and 'integrable' [ stochastic process ] can be written as the sum of a [ martingale ] and a [ predictable process…
Modified: August 27, 2022.
One-particle system Let be the [ Lagrangian ] for a system with time-varying position and velocity , with forces defined by a potential…
Modified: July 14, 2022.
Given a [ diffusion process ] specified by the [ stochastic differential equation ] the [ Fokker-Planck ] equation aka Kolmogorov forward…
Modified: August 28, 2022.
Weight-Space View Recall standard linear regression. We suppose and where , where can be augmented with an implicit 1 term to allow a…
Modified: March 16, 2022.
This is the technical formulation that makes it meaningful to write [ stochastic differential equation ]s 'driven by' a Weiner process…
Modified: August 30, 2022.
A Itô process is a [ stochastic process ] satisfying a [ stochastic differential equation ] of the form where is Brownian motion. This…
Modified: August 27, 2022.
The partial derivatives of a multivariate function form its Jacobian matrix The convention here (matching Wikipedia, and I believe also…
Modified: August 13, 2022.
For any [ convex ] function and probability distribution , Jensen's inequality states that The special case of a distribution over two…
Modified: June 25, 2022.
Given a [ constrained optimization ] problem over a [ convex ] function , we consider the [ Lagrangian ] function introducing variables…
Modified: July 07, 2022.
The Kraft inequality in information theory states (roughly?) that, for any probability distribution , there is a prefix code C under which…
Modified: April 12, 2022.
We're given a [ constrained optimization ] problem Note that the standard formulation of Lagrange multipliers handles only equality…
Modified: April 29, 2023.
The Laplacian or Laplace operator computes the [ vector divergence ] of the gradient of a function , given by the sum of 'unmixed…
Modified: June 08, 2024.
References: Jess Riedel on the Legendre transform in physics Stack Overflow discussion Prof. V. Balakrishnan on Hamiltonian dynamics…
Modified: July 15, 2022.
A [ stochastic process ] in which the past is independent of the future, conditioned on the current value. Striking point made by https…
Modified: August 29, 2022.
References: https://photonlines.substack.com/p/an-intuitive-guide-to-maxwells-equations Electric and magnetic fields : the electromagnetic…
Modified: June 08, 2024.
Used in analyzing the stability of an equilibrium of a dynamical system. A Lyapunov function is a scalar-valued function of the state space…
Modified: January 04, 2024.
Notes from Charles Margossian's talk on pharmacometrics models. Types of ODEs: Linear: can be solved by [ matrix exponential ]ials nonlinear…
Modified: August 19, 2021.
See [ generative vs discriminative modeling ], [ actor-critic ]
Modified: May 23, 2021.
I'm trying to build my understanding. These are fragments of intuitions. Bayesian inference starts with a prior P and a likelihood. Given…
Modified: January 15, 2021.
Notes from working through Kevin Buzzard's Natural number game (imperial.ac.uk) using the Lean theorem prover. We know from the [ Curry…
Modified: June 09, 2021.
The -Wasserstein distance between probability distributions is defined as where the infimum is over all joint distributions having…
Modified: August 04, 2022.
Fundamentally an algorithm is any computational procedure: something that takes in data and spits out some function of that data. Computer…
Modified: November 29, 2022.
Modified: December 01, 2023.
This is my stab at explaining automatic differentiation, specifically backprop and applications to neural nets. A few dimensions to think…
Modified: August 23, 2022.
There are two major 'chain rules' relevant to machine learning: the chain rule of probability theory and the chain rule from calculus…
Modified: August 13, 2022.
Multivariate Completion of Squares A useful trick: if is a symmetric, nonsingular matrix, then This is easy to see just by expanding out…
Modified: March 16, 2022.
Suppose we want to optimize an objective under some equality and/or inequality constraints, Some general classes of approach we can use are…
Modified: July 07, 2022.
A contraction mapping on a metric space is a function such that for all and for some , called the [ Lipschitz ] constant of the map…
Modified: August 13, 2022.
See also: https://www2.sonycsl.co.jp/person/nielsen/Note-LegendreTransformation.pdf Jess Riedel on the Legendre transform in physics looks…
Modified: June 25, 2022.
A convex function satisfies the property that a line between any two points on its graph is on or above the graph: for any . It is…
Modified: July 09, 2022.
References: http://www0.cs.ucl.ac.uk/staff/C.Archambeau/SDE_web/figs_files/ca07_RgIto_text.pdf https://www.ma.imperial.ac.uk/~pavl/lec_diff…
Modified: August 29, 2022.
TODO: flesh out theory, understand ADMM (e.g., https://www.cis.upenn.edu/~cis515/ws-book-IIb.pdf )
Modified: July 06, 2022.
https://twitter.com/getjonwithit/status/1832842063768867176 Duality" is a deep concept in mathematics, but an intuitive way to think about…
Modified: September 09, 2024.
Measures uncertainty, disorder, or randomness. The (Shannon) entropy of a probability distribution is: The quantity inside the…
Modified: April 15, 2022.
Exponential Families, Conjugacy, Convexity, and Variational Inference Any parameterized family of probability densities that can be written…
Modified: May 21, 2022.
A filtration is defined by monotonically increasing subsets of a [ probability space ]; that is, subsets such that we have for all…
Modified: August 27, 2022.
We say that is a fixed point of an update rule if . Update rules can often (though not necessarily) be seen as defining an…
Modified: August 13, 2022.
"What I cannot create, I do not understand". Related to: [ computational complexity ]: provers vs verifiers. [ P != NP ] [ production vs…
Modified: April 27, 2020.
Modified: March 16, 2022.
Multiple senses: An 'ill-conditioned matrix' has a large ratio between its largest and smallest eigenvalue (more generally, see what is a…
Modified: December 28, 2022.
Importance sampling allows us to compute expectations under a distribution using samples from a different distribution , by weighting the…
Modified: July 05, 2022.
The Leibniz calculus notation using infinitestimal quantities like or is simultaneously Very sensible and intuitive, but also Constantly…
Modified: September 06, 2022.
multiple senses: in machine learning: positive definite (Mercer) kernels in linear algebra: kernel (nullspace) of a linear map in CS systems…
Modified: February 25, 2022.
In principle we can apply [ automatic differentiation ] through any composition of differentiable operations. This lets us get gradients of…
Modified: July 21, 2022.
A martingale is any [ stochastic process ] that stays the same in expectation. Formally, is a martingale if This condition is related to…
Modified: August 27, 2022.
Reviewing this 3blue1brown video: https://www.youtube.com/watch?v=O85OWBJ2ayo The matrix exponential is written as E to the power of a…
Modified: November 20, 2023.
The Woodbury-Morrison-Sherman matrix inversion lemma, is sometimes useful just for algebraic simplifications. In cases where and are…
Modified: March 16, 2022.
A function is measurable with respect to [ sigma-algebra ]s on its domain and on its range if the pre-image of any event is…
Modified: August 27, 2022.
Considering a bilevel optimization problem (or saddle point problem) on the two-argument function , in general it holds that That is, the…
Modified: July 07, 2022.
We say that a random vector is multivariate Gaussian with mean and covariance matrix if it can be written where is a vector if i.i.d…
Modified: March 16, 2022.
A negligible function is a function such that, for any positive integer there exists an integer such that for all , i.e., that…
Modified: October 23, 2022.
If is a [ martingale ] and is a [ stopping time ], then any of the following conditions implies that : The stopping time is bounded…
Modified: August 29, 2022.
A [ stochastic process ] is predictable if its value at time is fully determined by information available at time . Any fully…
Modified: August 27, 2022.
A probability space consists of: A set of outcomes aka possible worlds; these represent all the ways the world might be. This is the…
Modified: August 27, 2022.
Introduced by Geoff Hinton (1999): Products of Experts . Each expert produces a probability distribution. These are combined by…
Modified: May 15, 2021.
Proximal methods in optimization The proximal operator of a [ convex ] function is defined as the minimizer of plus a distance penalty…
Modified: July 07, 2022.
The rate equation or master equation for a continuous-time Markov [ stochastic process ] describes how the probability density of the…
Modified: August 28, 2022.
Formally, a random variable is a (measurable) function defined on outcomes from a [ probability space ] . That is, in any possible…
Modified: August 27, 2022.
References: Ludwig Winkler's post on Reverse time stochastic differential equations . Suppose we have a [ stochastic differential equation…
Modified: August 27, 2022.
The score function is the gradient of a log-density with respect to its parameters: It is the direction that we would move the parameters…
Modified: July 21, 2022.
A [ stochastic process ] is (strictly) stationary if all of its joint distributions are invariant under time displacement. It is wide…
Modified: August 28, 2022.
A stochastic process is a collection of [ random variable ]s defined on a common [ probability space ] . Equivalently, it is a joint…
Modified: August 27, 2022.
SDEs are typically written in terms of the differential of a Weiner process (Brownian motion), e.g., Although Weiner processes are nowhere…
Modified: August 29, 2022.
A stopping time for a stochastic process is a time-valued That is, integer-valued for discrete-time processes and real-valued for…
Modified: August 27, 2022.
Every in machine learning talks about tensors, but no one really understands what they are. This page collects several definitions and…
Modified: July 18, 2022.
The tensor product of two vector spaces (defined on the same scalar field, we'll assume ) is the vector space of formal sums of…
Modified: July 18, 2022.
Trace of a Linear Operator We define the trace as the sum of diagonal elements of a matrix: Lemma : If and are square, then . Proof…
Modified: March 16, 2022.
According to this reddit post , one of the main takeaways of functional analysis is that the right way to interpret the 'transpose' of a…
Modified: November 06, 2020.
Inspired by Kevin Buzzard's overview of the state of automatic theorem provers. Type theory is like set theory in that sets and types are…
Modified: December 23, 2021.
The divergence of a vector-valued function on a vector field measures the extent to which a given point is a source of the field. It…
Modified: June 08, 2024.
Note: these are personal notes, taken as I was refreshing myself on this material. They're mostly stream of consciousness and probably not…
Modified: March 16, 2022.