Notes tagged with "math": Nonlinear Function

73 notes tagged with "math"

AI for math

Doing [ math ] seems like a really promising area for AI. And by 'math' I mean math research (not arithmetic, which computers are already…

Tagged with: #ai #math #ideas

Bayesian learning rule

See https://emtiyaz.github.io/papers/learning_from_bayes.pdf Suppose we have a learning problem For some choice of exponential-family…

Tagged with: #math #machine-learning #bayes

Bayesian

The Bayesian approach to statistics is to 'just use probability theory'. You write down a joint probability distribution over observed and…

Tagged with: #math #bayes

Black-Scholes

A model of [ option ] prices that assumes: The existence of a risk-free asset paying some interest rate, for example, US Treasury bonds…

Tagged with: #finance #math #modeling

Curry-Howard correspondence

Aka computational trinitarianism. Churchill, My Early Life: I have noticed in my life deep resemblances between many different kinds of…

Tagged with: #math

Cramer-Rao bound

Related to [ natural gradient ] and the [ Fisher information ] matrix. Let's say we have a parametric model of some data. The Cramer-Rao…

Tagged with: #math #machine-learning

Danskin's theorem

A pointwise maximum of [ convex ] functions, $g(x) = \max_{z} f(x, z)$, where we require specifically that $f(x, z)$ is convex in $x$ for every $z$, is itself convex in $x$, and when…

Tagged with: #math

Doob decomposition theorem

Any reasonable 'adapted' and 'integrable' [ stochastic process ] can be written as the sum of a [ martingale ] and a [ predictable process…

Tagged with: #math

Euler-Lagrange

One-particle system Let be the [ Lagrangian ] for a system with time-varying position and velocity , with forces defined by a potential…

Tagged with: #math #physics

Fokker-Planck

Given a [ diffusion process ] specified by the [ stochastic differential equation ] the [ Fokker-Planck ] equation aka Kolmogorov forward…

Tagged with: #math

Gaussian process

Weight-Space View Recall standard linear regression. We suppose and where , where can be augmented with an implicit 1 term to allow a…

Tagged with: #math #machine-learning

Itô process

An Itô process is a [ stochastic process ] satisfying a [ stochastic differential equation ] of the form $dX_t = \mu_t \, dt + \sigma_t \, dW_t$, where $W_t$ is Brownian motion. This…

Tagged with: #math

Itô integral

This is the technical formulation that makes it meaningful to write [ stochastic differential equation ]s 'driven by' a Wiener process…

Tagged with: #math

Jacobian

The partial derivatives of a multivariate function form its Jacobian matrix The convention here (matching Wikipedia, and I believe also…

Tagged with: #math

Jensen's inequality

For any [ convex ] function $f$ and probability distribution $p$, Jensen's inequality states that $f(\mathbb{E}_{p}[x]) \le \mathbb{E}_{p}[f(x)]$. The special case of a distribution over two…

Tagged with: #math
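
As a rough numerical illustration of the Jensen's inequality note above: for a convex function f, the gap E[f(X)] - f(E[X]) should never be negative. The sketch below checks this on a small discrete distribution. The distribution and the helper names (`expectation`, `jensen_gap`) are made up for illustration and do not come from the original note.

```python
import math

def expectation(values, probs):
    """Expected value of a discrete distribution."""
    return sum(p * v for v, p in zip(values, probs))

def jensen_gap(f, values, probs):
    """Return E[f(X)] - f(E[X]), which is nonnegative when f is convex."""
    mean = expectation(values, probs)
    return expectation([f(v) for v in values], probs) - f(mean)

# Hypothetical three-point distribution, chosen only for illustration.
values = [-1.0, 0.5, 2.0]
probs = [0.2, 0.5, 0.3]

# For f(x) = x^2 the gap is exactly the variance of X.
gap_square = jensen_gap(lambda x: x * x, values, probs)
# exp is convex too, so its gap is also nonnegative.
gap_exp = jensen_gap(math.exp, values, probs)
```

With f(x) = x^2 the gap reduces to the variance, which is one way to remember which direction the inequality points.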

Karush-Kuhn-Tucker conditions

Given a [ constrained optimization ] problem over a [ convex ] function , we consider the [ Lagrangian ] function introducing variables…

Tagged with: #math

Kraft inequality

The Kraft inequality in information theory states (roughly?) that, for any probability distribution $p$, there is a prefix code C under which…

Tagged with: #math
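
The excerpt above is truncated, so the following is only a sketch of one standard construction, not necessarily the one the note goes on to describe. It gives each outcome a binary code length of ceil(log2(1/p(x))) and checks that the lengths satisfy the Kraft sum, sum over x of 2^(-length(x)) <= 1, which is the condition for a binary prefix code with those lengths to exist. The function names and example distributions are hypothetical.

```python
import math

def shannon_code_lengths(probs):
    """Binary code lengths ceil(log2(1/p)) for each outcome with p > 0."""
    return [math.ceil(-math.log2(p)) for p in probs]

def kraft_sum(lengths):
    """Sum of 2^(-length); at most 1 for any binary prefix code."""
    return sum(2.0 ** -length for length in lengths)

# Dyadic distribution: the Kraft sum is met with equality.
dyadic = [0.5, 0.25, 0.125, 0.125]
dyadic_lengths = shannon_code_lengths(dyadic)

# Non-dyadic distribution: rounding up leaves slack in the Kraft sum.
skewed = [0.4, 0.3, 0.2, 0.1]
skewed_lengths = shannon_code_lengths(skewed)
```

Because each length is at least log2(1/p(x)), each term 2^(-length) is at most p(x), so the sum cannot exceed 1.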

Lagrange multiplier

We're given a [ constrained optimization ] problem Note that the standard formulation of Lagrange multipliers handles only equality…

Tagged with: #math #machine-learning

Legendre transform

References: Jess Riedel on the Legendre transform in physics Stack Overflow discussion Prof. V. Balakrishnan on Hamiltonian dynamics…

Tagged with: #math

Lyapunov function

Used in analyzing the stability of an equilibrium of a dynamical system. A Lyapunov function is a scalar-valued function of the state space…

Tagged with: #math

Markov process

A [ stochastic process ] in which the past is independent of the future, conditioned on the current value. Striking point made by https…

Tagged with: #math

ODE

Notes from Charles Margossian's talk on pharmacometrics models. Types of ODEs: linear, which can be solved by [ matrix exponential ]s; nonlinear…

Tagged with: #math

P != NP

See [ generative vs discriminative modeling ], [ actor-critic ]

Tagged with: #math

Pac-Bayes

I'm trying to build my understanding. These are fragments of intuitions. Bayesian inference starts with a prior P and a likelihood. Given…

Tagged with: #math #machine-learning #bayes

Theorem Proving in Lean

Notes from working through Kevin Buzzard's Natural number game (imperial.ac.uk) using the Lean theorem prover. We know from the [ Curry…

Tagged with: #math #papers

Wasserstein

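The $p$-Wasserstein distance between probability distributions $\mu$ and $\nu$ is defined as $W_p(\mu, \nu) = \left( \inf_{\gamma} \mathbb{E}_{(x, y) \sim \gamma} \left[ d(x, y)^p \right] \right)^{1/p}$, where the infimum is over all joint distributions $\gamma$ having…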

Tagged with: #math

algorithm

Fundamentally an algorithm is any computational procedure: something that takes in data and spits out some function of that data. Computer…

Tagged with: #math #machine-learning

approximate Bayesian inference

Tagged with: #machine-learning #math

automatic differentiation

This is my stab at explaining automatic differentiation, specifically backprop and applications to neural nets. A few dimensions to think…

Tagged with: #math #machine-learning

chain rule

There are two major 'chain rules' relevant to machine learning: the chain rule of probability theory and the chain rule from calculus…

Tagged with: #math

complete the square

Multivariate Completion of Squares A useful trick: if is a symmetric, nonsingular matrix, then This is easy to see just by expanding out…

Tagged with: #math

constrained optimization

Suppose we want to optimize an objective under some equality and/or inequality constraints, Some general classes of approach we can use are…

Tagged with: #math #machine-learning

contraction

A contraction mapping on a metric space $(M, d)$ is a function $f: M \to M$ such that $d(f(x), f(y)) \le k \, d(x, y)$ for all $x, y \in M$ and for some $k \in [0, 1)$, called the [ Lipschitz ] constant of the map…

Tagged with: #math
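
A small sketch related to the contraction note above. By the Banach fixed-point theorem (a standard result, not quoted in the excerpt), repeatedly applying a contraction on a complete metric space converges to its unique fixed point. The example uses cos on the interval [0, 1], where it maps the interval into itself and its derivative is bounded in absolute value by sin(1), roughly 0.84. The helper name, tolerance, and starting point are arbitrary choices made for illustration.

```python
import math

def iterate_to_fixed_point(f, x0, tol=1e-12, max_iters=1000):
    """Apply f repeatedly until successive iterates differ by less than tol."""
    x = x0
    for _ in range(max_iters):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge within max_iters")

# cos maps [0, 1] into [cos(1), 1], which lies inside [0, 1], and
# |d/dx cos(x)| = |sin(x)| <= sin(1) < 1 there, so it is a contraction.
fixed_point = iterate_to_fixed_point(math.cos, 0.5)
```

Any starting point in [0, 1] should lead to the same limit, since a contraction has only one fixed point.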

convex dual

See also: https://www2.sonycsl.co.jp/person/nielsen/Note-LegendreTransformation.pdf Jess Riedel on the Legendre transform in physics looks…

Tagged with: #math

convex

A convex function $f$ satisfies the property that a line between any two points on its graph is on or above the graph: $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$ for any $\lambda \in [0, 1]$. It is…

Tagged with: #math

diffusion process

References: http://www0.cs.ucl.ac.uk/staff/C.Archambeau/SDE_web/figs_files/ca07_RgIto_text.pdf https://www.ma.imperial.ac.uk/~pavl/lec_diff…

Tagged with: #math

dual gradient ascent

TODO: flesh out theory, understand ADMM (e.g., https://www.cis.upenn.edu/~cis515/ws-book-IIb.pdf )

Tagged with: #math #machine-learning

entropy

Measures uncertainty, disorder, or randomness. The (Shannon) entropy of a probability distribution is: The quantity inside the…

Tagged with: #math #bayes #physics

exponential family notes

Exponential Families, Conjugacy, Convexity, and Variational Inference Any parameterized family of probability densities that can be written…

Tagged with: #math #bayes #machine-learning

filtration

A filtration is defined by monotonically increasing subsets of a [ probability space ]; that is, subsets such that we have for all…

Tagged with: #math

fixed point

We say that $x^*$ is a fixed point of an update rule $f$ if $f(x^*) = x^*$. Update rules can often (though not necessarily) be seen as defining an…

Tagged with: #math

generative vs discriminative modeling

"What I cannot create, I do not understand". Related to: [ computational complexity ]: provers vs verifiers. [ P != NP ] [ production vs…

Tagged with: #math #fundamental #machine-learning

high-dimensional

Tagged with: #math

ill-conditioned

Multiple senses: An 'ill-conditioned matrix' has a large ratio between its largest and smallest eigenvalue (more generally, see what is a…

Tagged with: #math #machine-learning

importance sampling

Importance sampling allows us to compute expectations under a distribution $p$ using samples from a different distribution $q$, by weighting the…

Tagged with: #math #bayes
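
A minimal sketch of the idea in the importance sampling note above, under assumptions chosen purely for illustration: the target is p = N(0, 1), the proposal is q = N(0, 2^2), and the quantity estimated is E_p[x^2], whose true value is 1. Each sample from q is weighted by the density ratio p(x)/q(x). The function names and the sample count are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def importance_estimate(f, n_samples, rng):
    """Estimate E_p[f(x)] for p = N(0, 1) using samples from q = N(0, 2^2)."""
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 2.0)  # draw from the proposal q
        weight = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)  # p(x) / q(x)
        total += weight * f(x)
    return total / n_samples

rng = random.Random(0)  # fixed seed so the run is reproducible
estimate = importance_estimate(lambda x: x * x, 20_000, rng)
```

The proposal here is wider than the target, which keeps the weights bounded. A proposal narrower than the target would tend to give a much noisier estimate.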

infinitesimal

The Leibniz calculus notation using infinitesimal quantities like $dx$ or $dy$ is simultaneously very sensible and intuitive, but also constantly…

Tagged with: #math

kernel

Multiple senses: in machine learning, positive definite (Mercer) kernels; in linear algebra, the kernel (nullspace) of a linear map; in CS systems…

Tagged with: #machine-learning #math

martingale

A martingale is any [ stochastic process ] that stays the same in expectation. Formally, $X_1, X_2, \ldots$ is a martingale if $\mathbb{E}[X_{n+1} \mid X_1, \ldots, X_n] = X_n$. This condition is related to…

Tagged with: #math

matrix inversion lemma

The Woodbury-Morrison-Sherman matrix inversion lemma is sometimes useful just for algebraic simplifications. In cases where and are…

Tagged with: #math

matrix exponential

Reviewing this 3blue1brown video: https://www.youtube.com/watch?v=O85OWBJ2ayo The matrix exponential is written as E to the power of a…

Tagged with: #math

measurable function

A function $f$ is measurable with respect to [ sigma-algebra ]s $\mathcal{F}$ on its domain and $\mathcal{G}$ on its range if the pre-image $f^{-1}(G)$ of any event $G \in \mathcal{G}$ is…

Tagged with: #math

minimax duality

Considering a bilevel optimization problem (or saddle point problem) on the two-argument function $f(x, y)$, in general it holds that $\max_{y} \min_{x} f(x, y) \le \min_{x} \max_{y} f(x, y)$. That is, the…

Tagged with: #math
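
To make the general inequality in the minimax duality note above concrete, here is a tiny sketch on a finite payoff table, where f(x, y) is the entry in row x and column y. It computes both orderings and shows that max-min does not exceed min-max. The table and function names are invented for illustration.

```python
def max_min(payoff):
    """max over columns y of (min over rows x of payoff[x][y])."""
    n_cols = len(payoff[0])
    return max(min(row[y] for row in payoff) for y in range(n_cols))

def min_max(payoff):
    """min over rows x of (max over columns y of payoff[x][y])."""
    return min(max(row) for row in payoff)

# Hypothetical 2x2 table with no saddle point, so the inequality is strict.
payoff = [[1, 3],
          [4, 2]]

lower = max_min(payoff)  # the maximizer commits first
upper = min_max(payoff)  # the minimizer commits first
```

In this table the two values differ (2 versus 3). When a table has a saddle point, the two orderings agree.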

multivariate gaussian

We say that a random vector $x$ is multivariate Gaussian with mean $\mu$ and covariance matrix $\Sigma$ if it can be written $x = \mu + A z$, where $z$ is a vector of i.i.d…

Tagged with: #math

negligible

A negligible function is a function $f$ such that, for any positive integer $c$ there exists an integer $N_c$ such that $|f(n)| < n^{-c}$ for all $n > N_c$, i.e., that…

Tagged with: #math #crypto

optional stopping

If $X$ is a [ martingale ] and $\tau$ is a [ stopping time ], then any of the following conditions implies that $\mathbb{E}[X_\tau] = \mathbb{E}[X_0]$: The stopping time is bounded…

Tagged with: #math

predictable process

A [ stochastic process ] is predictable if its value at time $t$ is fully determined by information available at time $t - 1$. Any fully…

Tagged with: #math

probability space

A probability space consists of: A set of outcomes aka possible worlds; these represent all the ways the world might be. This is the…

Tagged with: #math

product of experts

Introduced by Geoff Hinton (1999): Products of Experts. Each expert produces a probability distribution. These are combined by…

Tagged with: #math #machine-learning #modeling

proximal

Proximal methods in optimization The proximal operator of a [ convex ] function is defined as the minimizer of plus a distance penalty…

Tagged with: #math

random variable

Formally, a random variable is a (measurable) function defined on outcomes from a [ probability space ]. That is, in any possible…

Tagged with: #math

rate equation

The rate equation or master equation for a continuous-time Markov [ stochastic process ] describes how the probability density of the…

Tagged with: #math

reverse diffusion

References: Ludwig Winkler's post on Reverse time stochastic differential equations. Suppose we have a [ stochastic differential equation…

Tagged with: #math

score function

The score function is the gradient of a log-density with respect to its parameters: $\nabla_\theta \log p(x; \theta)$. It is the direction that we would move the parameters…

Tagged with: #math

stationary

A [ stochastic process ] is (strictly) stationary if all of its joint distributions are invariant under time displacement. It is wide…

Tagged with: #math

stochastic differential equation

SDEs are typically written in terms of the differential of a Wiener process (Brownian motion), e.g., Although Wiener processes are nowhere…

Tagged with: #math

stochastic process

A stochastic process is a collection of [ random variable ]s defined on a common [ probability space ]. Equivalently, it is a joint…

Tagged with: #math

stopping time

A stopping time for a stochastic process is a time-valued That is, integer-valued for discrete-time processes and real-valued for…

Tagged with: #math

tensor product

The tensor product of two vector spaces (defined on the same scalar field, we'll assume ) is the vector space of formal sums of…

Tagged with: #math

tensor

Everyone in machine learning talks about tensors, but no one really understands what they are. This page collects several definitions and…

Tagged with: #math #machine-learning

trace

Trace of a Linear Operator We define the trace as the sum of diagonal elements of a matrix: Lemma : If and are square, then . Proof…

Tagged with: #math

transposes are measures

According to this reddit post, one of the main takeaways of functional analysis is that the right way to interpret the 'transpose' of a…

Tagged with: #math

type theory

Inspired by Kevin Buzzard's overview of the state of automatic theorem provers. Type theory is like set theory in that sets and types are…

Tagged with: #math

mcmc notes

Note: these are personal notes, taken as I was refreshing myself on this material. They're mostly stream of consciousness and probably not…

Tagged with: #math #machine-learning #bayes
