Notes tagged with "machine-learning": Nonlinear Function

100 notes tagged with "machine-learning"

A Universal Law of Robustness via Isoperimetry

Link: A Universal Law of Robustness via Isoperimetry | OpenReview This paper purports to explain (and quantify) the observed fact that…

Tagged with: #papers#machine-learning

AI predictions

In the spirit of [ prediction as a model-building exercise ]. Language modeling: system writes publishable poetry: debatably already…

Tagged with: #ai#machine-learning

Bayesian learning rule

See https://emtiyaz.github.io/papers/learning_from_bayes.pdf Suppose we have a learning problem For some choice of exponential-family…

Tagged with: #math#machine-learning#bayes

Bregman divergence

For any strictly [ convex ] function , define the Bregman divergence: Examples: (Squared) Euclidean distance : choose the squared norm…
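
The divergence formula itself was stripped from this preview; as a hedged restatement of the standard definition (for a strictly convex, differentiable F), together with the squared-Euclidean special case the note mentions:

```latex
% Standard definition (assumed restatement; the note's own formula is truncated above):
D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle.
% Choosing F(x) = \tfrac{1}{2}\lVert x \rVert^2 gives squared Euclidean distance:
D_F(p, q) = \tfrac{1}{2}\lVert p - q \rVert^2.
```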

Tagged with: #machine-learning

Cramer-Rao bound

Related to [ natural gradient ] and the [ Fisher information ] matrix. Let's say we have a parametric model of some data. The Cramer-Rao…
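
The bound's statement is cut off above; as a hedged reminder, the standard scalar form for an unbiased estimator is:

```latex
% Standard scalar Cramer-Rao bound (assumed restatement), with Fisher information I(\theta):
\operatorname{Var}(\hat{\theta}) \ge \frac{1}{I(\theta)},
\qquad
I(\theta) = \mathbb{E}\!\left[\left(\frac{\partial}{\partial\theta}\log p(x \mid \theta)\right)^{2}\right].
```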

Tagged with: #math#machine-learning

DALL-E

Generates images from captions by combining [ CLIP ]-style representation learning with a [ diffusion model ] to construct images from…

Tagged with: #machine-learning

Gaussian process

Weight-Space View Recall standard linear regression. We suppose and where , where can be augmented with an implicit 1 term to allow a…
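
The inline math in this preview was stripped; as a hedged reconstruction of the standard weight-space setup (following Rasmussen and Williams), the linear-regression model being recalled is roughly:

```latex
% Assumed reconstruction of the standard weight-space view:
f(x) = x^{\top} w, \qquad y = f(x) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma_n^{2}),
% with a Gaussian prior on the weights, w \sim \mathcal{N}(0, \Sigma_p),
% and x optionally augmented with an implicit 1 entry to absorb a bias term.
```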

Tagged with: #math#machine-learning

LSTM

Good tutorial: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

Tagged with: #machine-learning

LeCun's Cherry

Yann LeCun's famous cake analogy: "If intelligence is a cake, the bulk of the cake is unsupervised learning, the icing on the cake…

Tagged with: #machine-learning#reinforcement-learning

Lagrange multiplier

We're given a [ constrained optimization ] problem Note that the standard formulation of Lagrange multipliers handles only equality…

Tagged with: #math#machine-learning

Neural message passing for Quantum Chemistry

Gilmer et al., 2017 paper. Experiments on QM9. Unlike SMILES strings, includes molecular geometry. General formulation of message passing…

Tagged with: #chemistry#machine-learning#papers

Occam's razor

If two hypotheses are equally consistent with the data, the simpler is more likely to be 'true'. Formally, it is more likely to generalize…

Tagged with: #machine-learning

Pac-Bayes

I'm trying to build my understanding. These are fragments of intuitions. Bayesian inference starts with a prior P and a likelihood. Given…

Tagged with: #math#machine-learning#bayes

Transformer Papers

Massive list here: https://github.com/cedrickchee/awesome-bert-nlp Bahdanau, Cho, Bengio. Neural Machine Translation by Jointly Learning to…

Tagged with: #machine-learning#papers

algorithm

Fundamentally an algorithm is any computational procedure: something that takes in data and spits out some function of that data. Computer…

Tagged with: #math#machine-learning

approximate Bayesian inference

Tagged with: #machine-learning#math

automatic differentiation

This is my stab at explaining automatic differentiation, specifically backprop and applications to neural nets. A few dimensions to think…

Tagged with: #math#machine-learning

attention

One of the best ideas in machine learning. (I even thought so in 2011!) There are two common mechanisms: 'soft' and 'hard'. In both cases…
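
As a minimal illustration of the 'soft' mechanism mentioned here (an illustrative sketch, not code from the note; q, K, V are assumed placeholder names), scaled dot-product attention is just a softmax-weighted average of value vectors:

```python
import numpy as np

def soft_attention(q, K, V):
    """Soft (scaled dot-product) attention: a convex combination of the value
    vectors, weighted by how well each key matches the query."""
    scores = K @ q / np.sqrt(q.shape[-1])      # one score per key
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights @ V                         # weighted average of values

# Toy usage: 3 key/value pairs of dimension 4.
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=4), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
print(soft_attention(q, K, V))
```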

Tagged with: #machine-learning#ai#meditation#psychology#neuroscience

bias-variance tradeoff

I think of "variance" as the error in a statistical estimate that comes from not having enough data (assuming an [ identifiable ] model…

Tagged with: #machine-learning

calibration

Tagged with: #modeling#machine-learning

classification is special

The [ distinction ] between classification and regression is, from one point of view, arbitrary: it's all just function approximation, and…

Tagged with: #machine-learning

compression

Tagged with: #machine-learning#how-to-think

computation is important

Arguably the core insight of deep learning / [ differentiable program ]ming is that the shape and structure of the computations we do are so…

Tagged with: #machine-learning#ai

constrained optimization

Suppose we want to optimize an objective under some equality and/or inequality constraints, Some general classes of approach we can use are…

Tagged with: #math#machine-learning

continuous structure learning

Relevant papers: Differentiable compositional kernel learning for Gaussian Processes (Sun et al., 2018) Differentiable Architecture Search…

Tagged with: #papers#machine-learning#bayes

contrastive learning

A technique for [ representation ] learning in which semantically similar datapoints are encouraged to have similar representations, and…
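
A minimal sketch of the idea in code (a hypothetical InfoNCE-style loss, not taken from the note; `anchors` and `positives` are assumed names for two encoded views of the same batch):

```python
import numpy as np

def contrastive_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss: row i of `anchors` and row i of `positives` encode the
    same datapoint; every other row in the batch acts as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                   # similarity of each anchor to each candidate
    logits -= logits.max(axis=1, keepdims=True)      # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # matched pairs pulled together, others pushed apart
```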

Tagged with: #machine-learning

control variate

Tagged with: #machine-learning

contrastive divergence

A method for fitting an unnormalized probability density (aka [ energy-based model ]) to data. Note that this is a different and harder…

Tagged with: #machine-learning

cooperative inverse reinforcement learning

References: Cooperative Inverse Reinforcement Learning The Off-Switch Game Incorrigibility in the CIRL Framework The CIRL setting models…

Tagged with: #machine-learning#reinforcement-learning#alignment

data efficiency

Current (2021) deep networks require huge datasets in order to [ generalization|generalize ]. But we know that humans can do one-shot…

Tagged with: #machine-learning

decoding

References: Holtzman et al. (2020), The Curious Case of Neural Text Degeneration https://arxiv.org/abs/1904.09751 How should we actually…

Tagged with: #machine-learning

deep RL notes

Notes from John Schulman's Berkeley course on deep [ reinforcement learning ], Spring 2016. Value vs Policy-based learning Value-based…

Tagged with: #machine-learning#ai#reinforcement-learning

deep learning

see also: [ differentiable program ]

Tagged with: #machine-learning

differentiable program

Fast differentiable sorting and ranking: https://arxiv.org/abs/2002.08871 What are differentiable analogues of 'standard' programming…

Tagged with: #machine-learning

diffusion model

Diffusion models for image generation were independently invented at least twice: in a discrete-time variational inference framework…

Tagged with: #machine-learning

do-calculus

References: ML beyond Curve Fitting: An Intro to Causal Inference and do-Calculus , Causal Inference 2: Illustrating Interventions via a Toy…

Tagged with: #machine-learning#causality

double descent

Empirically, as model capacity increases past the memorization threshold ( ), [ generalization|generalization ] error starts decreasing…

Tagged with: #machine-learning

dual gradient ascent

TODO: flesh out theory, understand ADMM (e.g., https://www.cis.upenn.edu/~cis515/ws-book-IIb.pdf )

Tagged with: #math#machine-learning

energy-based model

Tagged with: #machine-learning#modeling

ensemble

Often we think of ensembles in the context of supervised learning: we have some algorithm that learns X -> y mappings, and by running it…

Tagged with: #machine-learning

exposure bias

Consider training an [ autoregressive ] model of sequence data (text, audio, action sequences in [ reinforcement learning ], etc.), which…

Tagged with: #machine-learning

expressive transformer

This note is a scratchpad for investigating the expressivity of the [ transformer ] architecture. In general, one set of intuitions that we…

Tagged with: #machine-learning#transformers

exponential family notes

Exponential Families, Conjugacy, Convexity, and Variational Inference Any parameterized family of probability densities that can be written…
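
The canonical form this sentence trails off into is, as a hedged restatement of the standard definition:

```latex
% Standard exponential-family form (assumed restatement), with natural parameter \eta,
% sufficient statistics T(x), base measure h(x), and log-normalizer A(\eta):
p(x \mid \eta) = h(x)\,\exp\!\big(\eta^{\top} T(x) - A(\eta)\big).
```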

Tagged with: #math#bayes#machine-learning

fast weights

On an evolutionary timescale, it's useful to evolve structures that can learn quickly. The nervous system is an evolved organ system for…

Tagged with: #ai#machine-learning

flexible model family

As AGW points out here , it is statistically better to fit a flexible model family, with an inductive bias, than a constrained model family…

Tagged with: #machine-learning#modeling

free lunch theorem

'[ no free lunch theorem ]' arguments are misleading because they consider the space of all possible functions. In fact, we usually care…

Tagged with: #machine-learning#fundamental

generalization

Fundamentally, where does generalization come from? [ causality ]: a model may generalize because it has discovered the true mechanism, or…

Tagged with: #machine-learning

generalized policy iteration

Sutton and Barto use this as a general term for any form of interleaving policy evaluation steps with policy improvement steps. This…

Tagged with: #ai#machine-learning

generative flow network

Many objects can be generated by a sequence of actions. For example: Generating language by adding one word at a time Generating a molecule…

Tagged with: #machine-learning

generative vs discriminative modeling

"What I cannot create, I do not understand". Related to: [ computational complexity ]: provers vs verifiers. [ P != NP ] [ production vs…

Tagged with: #math#fundamental#machine-learning

gradient clipping

Why do we clip gradients in deep learning? When is it important and what is the right way to do it? It seems like the standard recipe used…

Tagged with: #machine-learning

gradient of the log normalizer

For a normalized distribution , constructed from an (unnormalized) energy with normalizing constant as a function of parameters , in…
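
The identity the preview gestures at (its math was stripped) is, in standard energy-based-model notation and assuming p(x; θ) = exp(−E(x; θ)) / Z(θ):

```latex
% Assumed restatement: with Z(\theta) = \int e^{-E(x;\theta)}\,dx,
\nabla_{\theta} \log Z(\theta)
  = \frac{1}{Z(\theta)} \int -\nabla_{\theta} E(x;\theta)\, e^{-E(x;\theta)}\,dx
  = \mathbb{E}_{p(x;\theta)}\!\big[-\nabla_{\theta} E(x;\theta)\big].
```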

Tagged with: #machine-learning

graph neural networks

A 'graph neural net' is a differentiable, parameterized function whose input or output (or both) is a graph. Discriminative: graph as input…

Tagged with: #machine-learning

grounded

A nice observation from Percy Liang on the relationship between language modeling and grounded understanding: Just because you don't…

Tagged with: #ai#machine-learning

hard attention

Closely related to [ discrete latent variable ]s and to [ reinforcement learning ] with discrete actions. If I do a thing and it goes well…

Tagged with: #machine-learning

ill-conditioned

Multiple senses: An 'ill-conditioned matrix' has a large ratio between its largest and smallest eigenvalue (more generally, see what is a…

Tagged with: #math#machine-learning

important neural net phenomena

[ grokking ] / [ phase change hypothesis ] emergence of near-discrete features in large transformers symmetries / non-[ identifiable…

Tagged with: #machine-learning

inductive bias

Ways to specify inductive bias: Feature engineering Prior distribution acts as regularizer in MAP estimates Graphical model (constraint on…

Tagged with: #machine-learning

kernel

multiple senses: in machine learning: positive definite (Mercer) kernels; in linear algebra: kernel (nullspace) of a linear map; in CS systems…

Tagged with: #machine-learning#math

large models

If you believe that neural nets basically just memorize the training data, then training larger and larger models is hopeless. The…

Tagged with: #modeling#machine-learning

meta-level shape of machine learning

Unlike most modern [ deep learning ] systems, humans: don't have separate training/test phases (though we may have wake/[ sleep ]) don't…

Tagged with: #machine-learning

meta learning

Generally this means training some aspect of the learning procedure itself. There is then an inner-loop learning procedure, which follows…

Tagged with: #machine-learning

minimum description length

Short descriptions of things, when they exist, must capture some kind of structure. The principle of [ Occam's razor ] posits that we should…

Tagged with: #machine-learning#bayes

mirror descent

Mirror descent is a framework for optimization algorithms: many algorithms can be framed as mirror descent, and proofs about mirror descent…
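
As a hedged restatement of the standard update (the note's own formulation isn't shown in the preview), with gradient g_t = ∇f(x_t), step size η, and Bregman divergence D_F:

```latex
% Standard mirror-descent step (assumed restatement):
x_{t+1} = \arg\min_{x} \; \langle g_t, x \rangle + \frac{1}{\eta}\, D_F(x, x_t).
% Taking F(x) = \tfrac{1}{2}\lVert x \rVert^2 recovers ordinary gradient descent.
```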

Tagged with: #machine-learning

mode-covering variational inference is incoherent

I have a [ strong opinion weakly held ] that doesn't seem to be widely shared in the [ approximate Bayesian inference ] community: reverse…

Tagged with: #machine-learning#bayes

most learning is by demonstration

In any human-to-human interaction, language carries some very important high-order bits, but it can only carry a few bits. It can help…

Tagged with: #teaching#machine-learning

multiplicative interaction

From a conversation I had about [ attention ] mechanisms in deep architectures. Maybe that terminology is too suggestive --- it's just a…

Tagged with: #ai#machine-learning

natural gradient

We don't typically think of it this way, but you can derive a [ gradient descent ] step as finding the point that minimizes a linearized…
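
A hedged sketch of the derivation this preview begins: the ordinary step as a penalized linearization, and the natural-gradient step obtained by swapping the Euclidean penalty for a KL term approximated to second order by the Fisher matrix F:

```latex
% Gradient descent as minimizing a linearized loss plus a Euclidean proximity penalty:
\theta_{t+1}
  = \arg\min_{\theta}\; \langle \nabla L(\theta_t), \theta \rangle
    + \tfrac{1}{2\eta} \lVert \theta - \theta_t \rVert^{2}
  = \theta_t - \eta\, \nabla L(\theta_t).
% Swapping the penalty for KL(p_\theta \,\|\, p_{\theta_t}) \approx
% \tfrac{1}{2}(\theta - \theta_t)^{\top} F\, (\theta - \theta_t) gives the natural-gradient step:
\theta_{t+1} = \theta_t - \eta\, F^{-1} \nabla L(\theta_t).
```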

Tagged with: #machine-learning

nested SMC

Christian Naesseth, Fredrik Lindsten, Thomas Schon (2015): http://proceedings.mlr.press/v37/naesseth15.html The main idea: In an SMC…

Tagged with: #machine-learning#papers

neural nets do work

Like the proverbial half-full glass, smart people can look at the same reality of the current capacities of neural nets, and come to…

Tagged with: #machine-learning#personal

no free lunch theorem

The folklore no-free-lunch 'theorem' in machine learning says that, for any pair of learning algorithms, there exists some dataset on which…

Tagged with: #machine-learning

noisy natural gradient as VI

https://arxiv.org/abs/1712.02390 Basic idea: optimizers like Adam and RMSProp already keep track of posterior curvature estimates. These are…

Tagged with: #machine-learning#papers

perceiver

reading the perceiver papers from Deepmind: Perceiver: Jaegle et al 2021 https://arxiv.org/abs/2103.03206 Perceiver-IO: Jaegle et al 202…

Tagged with: #ai#machine-learning

phase change hypothesis

(see also: [ large models ]) There's a viewpoint that neural nets just memorize the training data, so the more training data you have, the…

Tagged with: #machine-learning#modeling

privacy

It seems like there is, or can be, a virtuous relationship between privacy and generalization. You don't want to memorize too many…

Tagged with: #machine-learning

probabilistic program induction

Can we think about [ generative flow network ]s as a potentially tractable formulation of probabilistic program induction?! executing a line…

Tagged with: #machine-learning#ai

probabilistic programming is not AI research

Many [ probabilistic programming ] researchers frame their work as part of the broader problem of [ artificial intelligence ]. Artificial…

Tagged with: #machine-learning#bayes

probabilistic programming

Tagged with: #machine-learning#modeling

probabilistic transformers

A short note on interpreting a transformer layer as performing maximum-likelihood inference in a Gaussian mixture model: https://arxiv.org…

Tagged with: #papers#machine-learning

product of experts

Introduced by Geoff Hinton (1999): Products of Experts . Each expert produces a probability distribution. These are combined by…
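
As a hedged completion of the truncated sentence, the standard combination rule from the Hinton paper multiplies the experts' densities and renormalizes:

```latex
% Product of K experts, renormalized (standard form, assumed here):
p(x) = \frac{\prod_{k=1}^{K} p_k(x)}{\int \prod_{k=1}^{K} p_k(x')\, dx'}.
```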

Tagged with: #math#machine-learning#modeling

pushforward natural gradient

It's tempting to use [ natural gradient ] ascent to optimize a variational distribution. We could also consider using it to optimize the…

Tagged with: #machine-learning

reinforcement learning

Note : see [ reinforcement learning notation ] for a guide to the notation I'm attempting to use through my RL notes. Three paradigmatic…

Tagged with: #ai#machine-learning#reinforcement-learning

relu selection

The selection operation y = where(c, a, b) returns a when c holds and b otherwise. How can a [ transformer ] layer implement this operation? One approach is to use…

Tagged with: #machine-learning

relu inequality

Suppose we want a [ transformer ] to evaluate the inequality returning if and otherwise. For integer , this can be done with a…

Tagged with: #machine-learning#transformers

replica trick

If a model with data has normalizing constant , then the replica trick says that This allows us to analyze the average log-normalizer…
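
The stripped identity is presumably the standard one; as a hedged reconstruction, writing Z(D) for the normalizing constant given data D:

```latex
% Assumed reconstruction of the standard replica identity:
\mathbb{E}_{D}\big[\log Z(D)\big]
  = \lim_{n \to 0} \frac{\mathbb{E}_{D}\big[Z(D)^{n}\big] - 1}{n}
  = \lim_{n \to 0} \frac{\partial}{\partial n} \log \mathbb{E}_{D}\big[Z(D)^{n}\big],
```

so the average log-normalizer can be studied through moments of Z, which are often tractable for integer n and then continued to n → 0.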

Tagged with: #machine-learning

representation

In modern ML, representation learning is the art of trying to find useful abstractions, embodied as encoding networks. We can learn…

Tagged with: #machine-learning

scheduled sampling

Scheduled sampling is a training procedure for sequence models that attempts to mitigate [ exposure bias ] - the problem in which generation…

Tagged with: #machine-learning

sparse mixture of experts

References: Jacobs, Jordan, Nowlan, Hinton. Adaptive Mixtures of Local Experts (1991) Shazeer et al. Outrageously Large Neural Networks…

Tagged with: #machine-learning#transformers

structured prediction

In kindergarten stats, you learn how to build a model that takes in data (a feature vector, image, sound file, etc) and predicts a single…

Tagged with: #machine-learning

superposition

A d-dimensional vector can represent d distinct orthogonal features, but due to the weirdness of [ high-dimension ]al geometry, it can…
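
A small numpy check of this claim (a hypothetical snippet, not from the note): random directions in a few hundred dimensions are nearly orthogonal, so far more than d features can coexist with little interference.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_features = 512, 2000                    # many more "features" than dimensions

# Random unit vectors as feature directions.
F = rng.normal(size=(n_features, d))
F /= np.linalg.norm(F, axis=1, keepdims=True)

# Pairwise interference between distinct features.
overlaps = F @ F.T
off_diag = np.abs(overlaps[~np.eye(n_features, dtype=bool)])
print(f"max |cos| = {off_diag.max():.3f}, mean |cos| = {off_diag.mean():.3f}")
# Typically max around 0.2 and mean around 0.035 here: near-orthogonal despite n >> d.
```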

Tagged with: #machine-learning

teacher forcing

Something that confused me for a while is that people in certain communities talk about 'teacher forcing' as though it's a trick or a…

Tagged with: #machine-learning

teaching machine learning

Rob wants to firm up his foundations. He wants to understand relevant stats, probabilistic models, inference, and maybe work our way up to…

Tagged with: #teaching#machine-learning

tensor

Everyone in machine learning talks about tensors, but no one really understands what they are. This page collects several definitions and…

Tagged with: #math#machine-learning

tokenize

How should a machine learning model represent text? Word-level and character-level features are obvious options, but both have drawbacks…

Tagged with: #machine-learning#transformers

training for consistency

These days we think a lot about using data to train large [ language model ]s. But there's only so much data in the world; eventually we'll…

Tagged with: #ai#machine-learning#how-to-think

transformer primitives

In developing intuition about [ transformer ]s it's useful to think about specific primitive operations that can be implemented by a small…

Tagged with: #machine-learning#transformers

transformer

The core of the transformer architecture is multi-headed [ attention ]. The transformer block consists of a multi-headed attention layer…

Tagged with: #ai#machine-learning#transformers

variational inference

How should people do VI? One ultimate goal is that you write a Stan model (or better, a model with discrete variables, but one step at a…

Tagged with: #machine-learning#bayes

variational optimization

Holy shit. In December on Galiano I was brainstorming about [ continuous structure learning ] and thought of the general trick, for…

Tagged with: #machine-learning

mcmc notes

Note: these are personal notes, taken as I was refreshing myself on this material. They're mostly stream of consciousness and probably not…

Tagged with: #math#machine-learning#bayes
