Notes tagged with "math": Nonlinear Function

73 notes tagged with "math"

AI for math

Doing [ math ] seems like a really promising area for AI. And by 'math' I mean math research (not arithmetic, which computers are already…

Bayesian learning rule

See https://emtiyaz.github.io/papers/learning_from_bayes.pdf. Suppose we have a learning problem. For some choice of exponential-family…

Bayesian

The Bayesian approach to statistics is to 'just use probability theory'. You write down a joint probability distribution over observed and…

Black-Scholes

A model of [ option ] prices that assumes: The existence of a risk-free asset paying some interest rate, for example, US Treasury bonds…

Curry-Howard correspondence

Aka computational trinitarianism. Churchill, My Early Life: I have noticed in my life deep resemblances between many different kinds of…

Cramer-Rao bound

Related to [ natural gradient ] and the [ Fisher information ] matrix. Let's say we have a parametric model of some data. The Cramer-Rao…

Danskin's theorem

A pointwise maximum of [ convex ] functions is convex. Specifically, we require that $f(x, z)$ is convex in $x$ for every $z$. Then $\phi(x) = \max_z f(x, z)$ is itself convex in $x$, and when…

Doob decomposition theorem

Any reasonable 'adapted' and 'integrable' [ stochastic process ] can be written as the sum of a [ martingale ] and a [ predictable process…

Euler-Lagrange

One-particle system. Let $L(x, \dot{x})$ be the [ Lagrangian ] for a system with time-varying position $x(t)$ and velocity $\dot{x}(t)$, with forces defined by a potential…

Fokker-Planck

Given a [ diffusion process ] specified by the [ stochastic differential equation ] $dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t$, the [ Fokker-Planck ] equation aka Kolmogorov forward…

Gaussian process

Weight-Space View. Recall standard linear regression. We suppose $f(x) = x^\top w$ and $y = f(x) + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma_n^2)$, and where $x$ can be augmented with an implicit 1 term to allow a…

Itô process

An Itô process is a [ stochastic process ] satisfying a [ stochastic differential equation ] of the form $dX_t = \mu_t\,dt + \sigma_t\,dW_t$, where $W_t$ is Brownian motion. This…
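
The sketch below simulates an Itô process of the form $dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t$ using the Euler-Maruyama scheme. Each step replaces $dW_t$ with a Gaussian increment of variance $\Delta t$. The function name `euler_maruyama` and the Ornstein-Uhlenbeck example are illustrative choices and do not come from the note.

```python
import math
import random


# Sketch: Euler-Maruyama discretization; names are hypothetical.
def euler_maruyama(mu, sigma, x0, t_end, n_steps, rng=None):
    """Simulate one path of dX = mu(x, t) dt + sigma(x, t) dW on [0, t_end].

    Returns the list of states [X_0, X_dt, X_2dt, ..., X_t_end].
    """
    rng = rng if rng is not None else random.Random()
    dt = t_end / n_steps
    x, t = x0, 0.0
    path = [x]
    for _ in range(n_steps):
        # Brownian increment over one step: dW ~ Normal(0, dt).
        dw = rng.gauss(0.0, math.sqrt(dt))
        # Drift and diffusion are evaluated at the start of the step,
        # matching the non-anticipating Ito convention.
        x = x + mu(x, t) * dt + sigma(x, t) * dw
        t += dt
        path.append(x)
    return path


# Example: an Ornstein-Uhlenbeck process dX = -theta X dt + s dW, started at 1.
theta, s = 1.0, 0.5
path = euler_maruyama(lambda x, t: -theta * x, lambda x, t: s,
                      x0=1.0, t_end=5.0, n_steps=1000, rng=random.Random(0))
print(len(path), path[-1])
```

If `sigma` is identically zero, the scheme reduces to the ordinary forward Euler method for the ODE $dx/dt = \mu(x, t)$.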

Itô integral

This is the technical formulation that makes it meaningful to write [ stochastic differential equation ]s 'driven by' a Wiener process…
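
This sketch shows why the left-endpoint (Itô) choice matters for $\int_0^T W_t\,dW_t$. If the integrand is evaluated at the left endpoint of each interval, the Riemann sums satisfy $\sum_i W_{t_i} \Delta W_i = \tfrac{1}{2}\left(W_T^2 - \sum_i \Delta W_i^2\right)$ exactly. The sum of squared increments converges to $T$, so the integral is $\tfrac{1}{2}(W_T^2 - T)$, not the ordinary-calculus answer $\tfrac{1}{2} W_T^2$. The function name is hypothetical.

```python
import math
import random


# Sketch: left-endpoint sums for the Ito integral of W dW; name is hypothetical.
def ito_integral_w_dw(t_end, n_steps, rng):
    """Approximate the Ito integral of W dW over [0, t_end] on one Brownian path.

    Returns (ito_sum, w_end, quadratic_variation).
    """
    dt = t_end / n_steps
    w = 0.0          # W_0 = 0
    ito_sum = 0.0    # sum of W(t_i) * (W(t_{i+1}) - W(t_i)), left endpoints
    qv = 0.0         # sum of squared increments; approximates t_end
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        ito_sum += w * dw   # integrand evaluated at the left endpoint
        qv += dw * dw
        w += dw
    return ito_sum, w, qv


ito, w_end, qv = ito_integral_w_dw(1.0, 100_000, random.Random(42))
print(ito, (w_end ** 2 - 1.0) / 2, qv)
```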

Jacobian

The partial derivatives of a multivariate function $f: \mathbb{R}^n \to \mathbb{R}^m$ form its Jacobian matrix $J_{ij} = \partial f_i / \partial x_j$. The convention here (matching Wikipedia, and I believe also…
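
Below is a minimal central-difference sketch of the Jacobian. It uses the $m \times n$ convention: rows are indexed by outputs and columns by inputs, so $J_{ij} = \partial f_i / \partial x_j$. The helper `jacobian` and the step size `h` are illustrative assumptions. In practice one would usually get Jacobians from automatic differentiation rather than finite differences.

```python
# Sketch: finite-difference Jacobian; helper name and step size are hypothetical.
def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian of f: R^n -> R^m at point x.

    Row i, column j holds d f_i / d x_j, so the result is m x n.
    """
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


# f(x, y) = (x * y, x + y, x^2) has Jacobian [[y, x], [1, 1], [2x, 0]].
f = lambda v: [v[0] * v[1], v[0] + v[1], v[0] ** 2]
J = jacobian(f, [2.0, 3.0])
print(J)  # approximately [[3, 2], [1, 1], [4, 0]]
```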

Jensen's inequality

For any [ convex ] function $f$ and probability distribution $p$, Jensen's inequality states that $f(\mathbb{E}_{x \sim p}[x]) \le \mathbb{E}_{x \sim p}[f(x)]$. The special case of a distribution over two…
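
Here is a small numerical check of Jensen's inequality for a discrete distribution. The "Jensen gap" $\mathbb{E}[f(X)] - f(\mathbb{E}[X])$ is nonnegative for convex $f$. For $f(x) = x^2$ it is exactly the variance of $X$, and for linear $f$ it is zero. The helper names and the example distribution are illustrative.

```python
# Sketch: checking Jensen's inequality numerically; helper names are hypothetical.
def expectation(values, probs, g=lambda x: x):
    """E[g(X)] for a discrete distribution given as parallel lists."""
    return sum(p * g(v) for v, p in zip(values, probs))


def jensen_gap(f, values, probs):
    """E[f(X)] - f(E[X]); nonnegative whenever f is convex."""
    return expectation(values, probs, f) - f(expectation(values, probs))


values = [1.0, 2.0, 6.0]
probs = [0.2, 0.5, 0.3]
print(jensen_gap(lambda x: x ** 2, values, probs))  # approximately 4.0, the variance of X
print(jensen_gap(lambda x: -x, values, probs))      # linear f: gap is 0 up to rounding
```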

Karush-Kuhn-Tucker conditions

Given a [ constrained optimization ] problem over a [ convex ] function $f$, we consider the [ Lagrangian ] function introducing variables…

Kraft inequality

The Kraft inequality in information theory states (roughly?) that, for any probability distribution $p$, there is a prefix code $C$ under which…
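
For binary prefix codes, the precise statement is that codeword lengths $l_i$ are achievable by some prefix code if and only if $\sum_i 2^{-l_i} \le 1$. The sketch below picks Shannon code lengths $l_i = \lceil -\log_2 p_i \rceil$, checks the Kraft sum, and then constructs a prefix code with those lengths. It uses the canonical-code construction, which is one standard choice among several. The function names are hypothetical.

```python
import math


# Sketch: Shannon code lengths plus a canonical prefix code; names are hypothetical.
def shannon_code_lengths(probs):
    """Codeword lengths l_i = ceil(-log2 p_i), as in a Shannon code."""
    return [math.ceil(-math.log2(p)) for p in probs]


def kraft_sum(lengths):
    """Sum of 2^(-l_i); a binary prefix code with these lengths exists iff this is <= 1."""
    return sum(2.0 ** -l for l in lengths)


def canonical_prefix_code(lengths):
    """Assign binary codewords with the given lengths so none is a prefix of another.

    Assumes kraft_sum(lengths) <= 1. Codewords are handed out in order of
    increasing length: each is the previous codeword plus one, shifted left
    (padded with zeros) to reach the new length.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codes = [None] * len(lengths)
    code, prev_len = 0, lengths[order[0]]
    for k, i in enumerate(order):
        if k > 0:
            code = (code + 1) << (lengths[i] - prev_len)
        codes[i] = format(code, f"0{lengths[i]}b")
        prev_len = lengths[i]
    return codes


probs = [0.4, 0.3, 0.2, 0.1]
lengths = shannon_code_lengths(probs)
print(lengths, kraft_sum(lengths))        # [2, 2, 3, 4] 0.6875
print(canonical_prefix_code(lengths))     # ['00', '01', '100', '1010']
```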

Lagrange multiplier

We're given a [ constrained optimization ] problem $\min_x f(x)$ subject to $g(x) = 0$. Note that the standard formulation of Lagrange multipliers handles only equality…

Legendre transform

References: Jess Riedel on the Legendre transform in physics; Stack Overflow discussion; Prof. V. Balakrishnan on Hamiltonian dynamics…
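
The Legendre transform (convex conjugate) of $f$ is $f^*(y) = \sup_x \, [x y - f(x)]$. This brute-force sketch approximates the supremum by a maximum over a finite grid. It then checks two cases: $f(x) = x^2/2$, which is its own conjugate, and the shifted quadratic $f(x) = (x - 1)^2 / 2$, whose conjugate is $y^2/2 + y$. The grid and function names are illustrative assumptions.

```python
# Sketch: grid approximation of the Legendre transform; names are hypothetical.
def legendre_transform(f, xs):
    """Return f_star(y) = max over grid points x of (x * y - f(x)).

    This approximates sup_x [x y - f(x)] restricted to the grid `xs`; it is
    accurate when the maximizing x lies well inside the grid.
    """
    def f_star(y):
        return max(x * y - f(x) for x in xs)
    return f_star


xs = [i / 1000 for i in range(-5000, 5001)]    # grid on [-5, 5], spacing 0.001
f_star = legendre_transform(lambda x: 0.5 * x ** 2, xs)
for y in [-2.0, 0.0, 1.5]:
    print(y, f_star(y), 0.5 * y ** 2)          # the last two columns should agree

g_star = legendre_transform(lambda x: 0.5 * (x - 1.0) ** 2, xs)
print(g_star(0.5), 0.5 * 0.5 ** 2 + 0.5)       # both approximately 0.625
```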

Lyapunov function

Used in analyzing the stability of an equilibrium of a dynamical system. A Lyapunov function is a scalar-valued function of the state space…

Markov process

A [ stochastic process ] in which the past is independent of the future, conditioned on the current value. Striking point made by https…

ODE

Notes from Charles Margossian's talk on pharmacometrics models. Types of ODEs: Linear: can be solved by [ matrix exponential ]s. Nonlinear…

P != NP

See [ generative vs discriminative modeling ], [ actor-critic ]

Pac-Bayes

I'm trying to build my understanding. These are fragments of intuitions. Bayesian inference starts with a prior P and a likelihood. Given…

Theorem Proving in Lean

Notes from working through Kevin Buzzard's Natural number game (imperial.ac.uk) using the Lean theorem prover. We know from the [ Curry…
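
The Natural number game builds the natural numbers from `zero` and `succ` and proves facts about them by induction. Below is a minimal Lean 4 sketch in that spirit, assuming a recent Lean 4 toolchain without Mathlib. `MyNat`, `add`, and the theorem names are hypothetical stand-ins, not the game's own definitions. Because `add` recurses on its second argument, `add a zero = a` holds by definition (`rfl`), whereas `add zero n = n` needs induction.

```lean
-- Sketch: hand-rolled naturals; all names here are hypothetical.
inductive MyNat where
  | zero : MyNat
  | succ : MyNat → MyNat

namespace MyNat

-- Addition by recursion on the second argument.
def add : MyNat → MyNat → MyNat
  | a, zero => a
  | a, succ b => succ (add a b)

theorem add_zero (a : MyNat) : add a zero = a := rfl

theorem add_succ (a b : MyNat) : add a (succ b) = succ (add a b) := rfl

-- 0 + n = n is not true by definition, so we induct on n.
theorem zero_add (n : MyNat) : add zero n = n := by
  induction n with
  | zero => rfl
  | succ k ih => rw [add_succ, ih]

end MyNat
```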

Wasserstein

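The $p$-Wasserstein distance between probability distributions $\mu$ and $\nu$ is defined as $W_p(\mu, \nu) = \left( \inf_{\gamma \in \Gamma(\mu, \nu)} \mathbb{E}_{(x, y) \sim \gamma} \left[ d(x, y)^p \right] \right)^{1/p}$, where the infimum is over all joint distributions $\gamma$ having…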
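
In one dimension, with two equal-size, equally weighted samples, the optimal coupling for $p \ge 1$ simply matches the sorted samples in order. That makes the empirical $W_p$ a few lines of code. This sketch assumes exactly that setting; general distributions need an optimal-transport solver. The function name is hypothetical.

```python
# Sketch: 1-D empirical Wasserstein distance; equal sample sizes assumed.
def wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between two 1-D empirical distributions.

    Assumes both samples have the same size and uniform weights; then the
    optimal coupling pairs the i-th smallest x with the i-th smallest y.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles equal sample sizes")
    xs, ys = sorted(xs), sorted(ys)
    cost = sum(abs(x - y) ** p for x, y in zip(xs, ys)) / len(xs)
    return cost ** (1.0 / p)


print(wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]))   # 1.0: a shift by one
print(wasserstein_1d([0.0, 0.0], [1.0, 3.0], p=2))        # sqrt(5)
```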

algorithm

Fundamentally an algorithm is any computational procedure: something that takes in data and spits out some function of that data. Computer…

approximate Bayesian inference

automatic differentiation

This is my stab at explaining automatic differentiation, specifically backprop and applications to neural nets. A few dimensions to think…
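
Here is a minimal scalar sketch of reverse-mode automatic differentiation (backprop). Each node records its parents together with the local derivative with respect to each parent. `backward` then visits nodes in reverse topological order and applies the chain rule, accumulating gradients. The `Var` class is hypothetical and far simpler than any real framework; it supports only `+`, `*`, and `sin`.

```python
import math


# Sketch: tiny reverse-mode autodiff on scalars; the Var class is hypothetical.
class Var:
    """A scalar node in a computation graph."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def sin(self):
        return Var(math.sin(self.value), ((self, math.cos(self.value)),))

    def backward(self):
        """Accumulate d(self)/d(node) into node.grad for every ancestor node."""
        # Post-order DFS gives a topological order (inputs before outputs).
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


x, y = Var(2.0), Var(3.0)
z = x * y + x.sin()       # z = x*y + sin(x)
z.backward()
print(x.grad, y.grad)     # expected: 3 + cos(2) and 2.0
```

The backward pass costs about as much as one forward evaluation, regardless of how many inputs there are. This is why reverse mode suits neural nets, which have many parameters and a scalar loss.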

chain rule

There are two major 'chain rules' relevant to machine learning: the chain rule of probability theory and the chain rule from calculus…

complete the square

Multivariate Completion of Squares. A useful trick: if $A$ is a symmetric, nonsingular matrix, then $x^\top A x - 2 b^\top x = (x - A^{-1} b)^\top A (x - A^{-1} b) - b^\top A^{-1} b$. This is easy to see just by expanding out…

constrained optimization

Suppose we want to optimize an objective $f(x)$ under some equality and/or inequality constraints. Some general classes of approaches we can use are…

contraction

A contraction mapping on a metric space $(M, d)$ is a function $f: M \to M$ such that $d(f(x), f(y)) \le q\, d(x, y)$ for all $x, y \in M$ and for some $0 \le q < 1$, called the [ Lipschitz ] constant of the map…
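
By the Banach fixed-point theorem, iterating a contraction on a complete metric space converges to its unique fixed point from any starting point. The error shrinks by roughly a factor of $q$ per step. This sketch uses $\cos$ on $[0, 1]$ as the example. It maps $[0, 1]$ into itself, and $|\cos'(x)| = |\sin x| \le \sin 1 < 1$ there. The helper name and tolerances are illustrative.

```python
import math


# Sketch: fixed-point iteration for a contraction; helper name is hypothetical.
def iterate_to_fixed_point(f, x0, tol=1e-12, max_iter=1000):
    """Iterate x <- f(x) until successive iterates differ by less than tol."""
    x = x0
    for i in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next, i + 1
        x = x_next
    raise RuntimeError("did not converge; f may not be a contraction here")


# cos is a contraction on [0, 1] with q = sin(1), roughly 0.84.
x_star, n_iter = iterate_to_fixed_point(math.cos, 1.0)
print(x_star, n_iter)  # x_star is approximately 0.739085, the solution of cos(x) = x
```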

convex dual

See also: https://www2.sonycsl.co.jp/person/nielsen/Note-LegendreTransformation.pdf; Jess Riedel on the Legendre transform in physics looks…

convex

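A convex function $f$ satisfies the property that a line segment between any two points on its graph lies on or above the graph: $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$ for any $\lambda \in [0, 1]$. It is…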

diffusion process

References: http://www0.cs.ucl.ac.uk/staff/C.Archambeau/SDE_web/figs_files/ca07_RgIto_text.pdf https://www.ma.imperial.ac.uk/~pavl/lec_diff…

dual gradient ascent

TODO: flesh out theory, understand ADMM (e.g., https://www.cis.upenn.edu/~cis515/ws-book-IIb.pdf )

entropy

Measures uncertainty, disorder, or randomness. The (Shannon) entropy of a probability distribution $p$ is $H(p) = -\sum_x p(x) \log p(x) = \mathbb{E}_{x \sim p}[-\log p(x)]$. The quantity inside the…
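
Here is a minimal sketch computing Shannon entropy in bits (log base 2). It uses the usual convention that outcomes with zero probability contribute $0 \log 0 = 0$. The function name is hypothetical.

```python
import math


# Sketch: Shannon entropy in bits; function name is hypothetical.
def entropy_bits(probs):
    """Shannon entropy H(p) = -sum_x p(x) log2 p(x), in bits.

    Outcomes with p(x) = 0 are skipped, using the convention 0 log 0 = 0.
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)


print(entropy_bits([0.5, 0.5]))          # 1.0: a fair coin
print(entropy_bits([0.5, 0.25, 0.25]))   # 1.5
print(entropy_bits([1.0, 0.0]))          # 0.0: no uncertainty
print(entropy_bits([0.9, 0.1]))          # below 1 bit: a biased coin is more predictable
```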

exponential family notes

Exponential Families, Conjugacy, Convexity, and Variational Inference. Any parameterized family of probability densities that can be written…

filtration

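A filtration is defined by a monotonically increasing sequence of sub-σ-algebras of a [ probability space ]; that is, σ-algebras $\mathcal{F}_t \subseteq \mathcal{F}$ such that we have $\mathcal{F}_s \subseteq \mathcal{F}_t$ for all…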

fixed point

We say that $x^*$ is a fixed point of an update rule $x \leftarrow f(x)$ if $f(x^*) = x^*$. Update rules can often (though not necessarily) be seen as defining an…

generative vs discriminative modeling

"What I cannot create, I do not understand". Related to: [ computational complexity ]: provers vs verifiers. [ P != NP ] [ production vs…

high-dimensional

ill-conditioned

Multiple senses: An 'ill-conditioned matrix' has a large ratio between its largest and smallest eigenvalue (more generally, see what is a…

importance sampling

Importance sampling allows us to compute expectations under a distribution $p$ using samples from a different distribution $q$, by weighting the…
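
This sketch estimates $\mathbb{E}_p[f(X)] \approx \frac{1}{n} \sum_i f(x_i)\, p(x_i) / q(x_i)$ using samples $x_i \sim q$. The example choices are $p = \mathcal{N}(0, 1)$, $q = \mathcal{N}(0, 2^2)$, and $f(x) = x^2$, so the true answer is 1. All names and distributions are illustrative. The proposal is chosen with heavier tails than the target so that the weights $p/q$ stay bounded.

```python
import math
import random


# Sketch: plain importance sampling; all names and distributions are illustrative.
def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def importance_estimate(f, p_pdf, q_pdf, q_sample, n, rng):
    """Estimate E_p[f(X)] from n samples of q, weighting each by p(x) / q(x)."""
    total = 0.0
    for _ in range(n):
        x = q_sample(rng)
        total += f(x) * p_pdf(x) / q_pdf(x)
    return total / n


est = importance_estimate(
    f=lambda x: x ** 2,
    p_pdf=lambda x: normal_pdf(x, 0.0, 1.0),   # target p = Normal(0, 1)
    q_pdf=lambda x: normal_pdf(x, 0.0, 2.0),   # proposal q = Normal(0, 2^2)
    q_sample=lambda r: r.gauss(0.0, 2.0),
    n=100_000,
    rng=random.Random(0),
)
print(est)  # close to E_p[X^2] = 1
```

If $q$ had lighter tails than $p$, the weights could blow up and the estimator's variance could become huge or infinite.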

infinitesimal

The Leibniz calculus notation using infinitesimal quantities like $dx$ or $dy$ is simultaneously very sensible and intuitive, but also constantly…

kernel

Multiple senses: in machine learning, positive definite (Mercer) kernels; in linear algebra, the kernel (nullspace) of a linear map; in CS systems…

martingale

A martingale is any [ stochastic process ] that stays the same in expectation. Formally, $X_t$ is a martingale if $\mathbb{E}[X_{t+1} \mid X_1, \ldots, X_t] = X_t$. This condition is related to…

matrix inversion lemma

The Woodbury-Morrison-Sherman matrix inversion lemma, $(A + UCV)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}$, is sometimes useful just for algebraic simplifications. In cases where and are…
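
The rank-one case ($U = u$, $C = 1$, $V = v^\top$) is the Sherman-Morrison formula: $(A + u v^\top)^{-1} = A^{-1} - \frac{A^{-1} u v^\top A^{-1}}{1 + v^\top A^{-1} u}$. It updates a known inverse in $O(n^2)$ work rather than re-inverting in $O(n^3)$. This pure-Python sketch checks the formula against a direct $2 \times 2$ inverse. The helper names are hypothetical.

```python
# Sketch: Sherman-Morrison rank-one update; helper names are hypothetical.
def inv2(A):
    """Direct inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def sherman_morrison(A_inv, u, v):
    """(A + u v^T)^{-1} computed from A^{-1} in O(n^2) operations."""
    n = len(A_inv)
    Au = [sum(A_inv[i][k] * u[k] for k in range(n)) for i in range(n)]   # A^{-1} u
    vA = [sum(v[k] * A_inv[k][j] for k in range(n)) for j in range(n)]   # v^T A^{-1}
    denom = 1.0 + sum(v[i] * Au[i] for i in range(n))                    # 1 + v^T A^{-1} u
    return [[A_inv[i][j] - Au[i] * vA[j] / denom for j in range(n)] for i in range(n)]


A = [[4.0, 1.0], [2.0, 3.0]]
u, v = [1.0, 0.0], [0.0, 2.0]            # A + u v^T = [[4, 3], [2, 3]]
updated = sherman_morrison(inv2(A), u, v)
direct = inv2([[4.0, 3.0], [2.0, 3.0]])
print(updated)
print(direct)                            # should match `updated`
```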

matrix exponential

Reviewing this 3blue1brown video: https://www.youtube.com/watch?v=O85OWBJ2ayo The matrix exponential is written as $e$ to the power of a…
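
The matrix exponential is defined by the same power series as the scalar one, $e^A = \sum_{k \ge 0} A^k / k!$. It solves the linear ODE $\dot{x} = A x$ via $x(t) = e^{tA} x(0)$. This sketch truncates the series and checks a classic example: the exponential of $t \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ is rotation by angle $t$. The helper names and the number of terms are illustrative. Library routines such as SciPy's `scipy.linalg.expm` use scaling-and-squaring with Padé approximants instead, which is more robust for matrices with large norm.

```python
import math


# Sketch: truncated Taylor series for exp(A); helper names are hypothetical.
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm_taylor(A, terms=30):
    """exp(A) = I + A + A^2/2! + A^3/3! + ..., truncated after `terms` terms."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]   # identity
    term = [row[:] for row in result]                                 # A^0 / 0!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]     # A^k / k!
        result = [[r + s for r, s in zip(rr, sr)] for rr, sr in zip(result, term)]
    return result


t = math.pi / 3
R = expm_taylor([[0.0, -t], [t, 0.0]])
print(R)  # approximately [[cos t, -sin t], [sin t, cos t]]
```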

measurable function

A function $f: X \to Y$ is measurable with respect to [ sigma-algebra ]s $\mathcal{A}$ on its domain and $\mathcal{B}$ on its range if the pre-image of any event $B \in \mathcal{B}$ is…

minimax duality

Considering a bilevel optimization problem (or saddle point problem) on the two-argument function $f(x, y)$, in general it holds that $\max_y \min_x f(x, y) \le \min_x \max_y f(x, y)$. That is, the…

multivariate gaussian

We say that a random vector $x$ is multivariate Gaussian with mean $\mu$ and covariance matrix $\Sigma$ if it can be written $x = \mu + \Sigma^{1/2} z$, where $z$ is a vector of i.i.d…
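
This construction is also how multivariate Gaussians are usually sampled. Any matrix $L$ with $L L^\top = \Sigma$ can play the role of $\Sigma^{1/2}$, and the lower-triangular Cholesky factor is a common choice. The sketch below samples in 2-D this way and checks the empirical mean and covariance. The helper names, the example $\mu$ and $\Sigma$, and the hard-coded $2 \times 2$ Cholesky are illustrative assumptions.

```python
import math
import random


# Sketch: sampling x = mu + L z with a 2x2 Cholesky factor; names are hypothetical.
def cholesky_2x2(S):
    """Lower-triangular L with L L^T = S, for a 2x2 symmetric positive-definite S."""
    l11 = math.sqrt(S[0][0])
    l21 = S[1][0] / l11
    l22 = math.sqrt(S[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]


def sample_mvn_2d(mu, S, rng):
    """Draw x = mu + L z with z ~ N(0, I), so that Cov(x) = L L^T = S."""
    L = cholesky_2x2(S)
    z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    return [mu[i] + sum(L[i][k] * z[k] for k in range(2)) for i in range(2)]


rng = random.Random(0)
mu, S = [1.0, -2.0], [[2.0, 0.6], [0.6, 1.0]]
xs = [sample_mvn_2d(mu, S, rng) for _ in range(50_000)]
mean = [sum(x[i] for x in xs) / len(xs) for i in range(2)]
cov01 = sum((x[0] - mean[0]) * (x[1] - mean[1]) for x in xs) / len(xs)
print(mean, cov01)  # close to mu and to S[0][1] = 0.6
```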

negligible

A negligible function is a function $\mu: \mathbb{N} \to \mathbb{R}$ such that, for any positive integer $c$, there exists an integer $N_c$ such that $|\mu(x)| < x^{-c}$ for all $x > N_c$, i.e., that…

optional stopping

If $X_t$ is a [ martingale ] and $\tau$ is a [ stopping time ], then any of the following conditions implies that $\mathbb{E}[X_\tau] = \mathbb{E}[X_0]$: the stopping time is bounded…

predictable process

A [ stochastic process ] is predictable if its value at time $t$ is fully determined by information available at time $t - 1$. Any fully…

probability space

A probability space consists of: A set of outcomes aka possible worlds; these represent all the ways the world might be. This is the…

product of experts

Introduced by Geoff Hinton (1999): Products of Experts. Each expert produces a probability distribution. These are combined by…

proximal

Proximal methods in optimization. The proximal operator of a [ convex ] function $f$ is defined as the minimizer of $f$ plus a distance penalty…
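
A standard example is the proximal operator of $t|x|$, namely $\arg\min_x \, t|x| + \tfrac{1}{2}(x - v)^2$, which has the closed form called soft-thresholding. Combining it with gradient steps on a smooth term gives proximal gradient descent. Below, this is sketched as ISTA for the lasso objective $\tfrac{1}{2}\|Ax - b\|^2 + \lambda \|x\|_1$. The function names, the diagonal example, and the step size are illustrative assumptions; the example is chosen so the answer can be checked by hand.

```python
# Sketch: soft-thresholding and ISTA for the lasso; names are hypothetical.
def soft_threshold(v, t):
    """Proximal operator of t * |x| at v: argmin_x t*|x| + 0.5*(x - v)^2."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, b, lam, step, n_iter=500):
    """Proximal gradient descent for min_x 0.5*||A x - b||^2 + lam*||x||_1.

    Convergence needs step <= 1 / L, where L is the largest eigenvalue of A^T A.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # Gradient of the smooth part: A^T (A x - b).
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step on the smooth part, then the prox of the l1 part.
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x


# Diagonal example, so each coordinate can be checked by hand:
# coordinate 0 minimizes 0.5*(2x - 4)^2 + |x|  -> x = 1.75
# coordinate 1 minimizes 0.5*(x - 0.5)^2 + |x| -> x = 0
A = [[2.0, 0.0], [0.0, 1.0]]
b = [4.0, 0.5]
x_hat = ista(A, b, lam=1.0, step=0.25)
print(x_hat)  # [1.75, 0.0]
```

Soft-thresholding sets small coordinates exactly to zero. This is how the $\ell_1$ penalty produces sparse solutions.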

random variable

Formally, a random variable $X$ is a (measurable) function defined on outcomes from a [ probability space ] $(\Omega, \mathcal{F}, P)$. That is, in any possible…

rate equation

The rate equation or master equation for a continuous-time Markov [ stochastic process ] describes how the probability density of the…

reverse diffusion

References: Ludwig Winkler's post on Reverse time stochastic differential equations. Suppose we have a [ stochastic differential equation…

score function

The score function is the gradient of a log-density with respect to its parameters: $s(\theta) = \nabla_\theta \log p(x; \theta)$. It is the direction that we would move the parameters…

stationary

A [ stochastic process ] is (strictly) stationary if all of its joint distributions are invariant under time displacement. It is wide…

stochastic differential equation

SDEs are typically written in terms of the differential of a Wiener process (Brownian motion), e.g., $dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t$. Although Wiener processes are nowhere…

stochastic process

A stochastic process is a collection of [ random variable ]s defined on a common [ probability space ]. Equivalently, it is a joint…

stopping time

A stopping time for a stochastic process is a time-valued random variable $\tau$, that is, integer-valued for discrete-time processes and real-valued for…

tensor product

The tensor product of two vector spaces $V$ and $W$ (defined on the same scalar field, we'll assume $\mathbb{R}$) is the vector space of formal sums of…

tensor

Everyone in machine learning talks about tensors, but no one really understands what they are. This page collects several definitions and…

trace

Trace of a Linear Operator. We define the trace as the sum of diagonal elements of a matrix: $\operatorname{tr}(A) = \sum_i A_{ii}$. Lemma: If $A$ and $B$ are square, then $\operatorname{tr}(AB) = \operatorname{tr}(BA)$. Proof…

transposes are measures

According to this reddit post, one of the main takeaways of functional analysis is that the right way to interpret the 'transpose' of a…

type theory

Inspired by Kevin Buzzard's overview of the state of automatic theorem provers. Type theory is like set theory in that sets and types are…

mcmc notes

Note: these are personal notes, taken as I was refreshing myself on this material. They're mostly stream of consciousness and probably not…
