All Notes: Nonlinear Function

All Notes

being good is a privilege

Imagine a really good person: someone whom everyone likes, who is warm and friendly towards everyone they meet, gives freely of themselves, has…

being yourself takes practice

Related to: [ generative vs discriminative modeling ] The difficulty of [ learning new skills ] I remember learning to play the violin when…

believe in something

I used to believe in AI and machine learning. It was obvious to me as a student that the ability to create intelligent machines will be a…

benzodiazepines

berkeley eecs graduate admissions

This year (2014-15) I served as a student reader on the Berkeley EECS PhD admissions committee. Berkeley typically gets about N…

best explanations

lagrange multipliers: www.eecs.berkeley.edu/~klein/papers/lagrange-multipliers.pdf legendre transforms / convex duality: http://student…

best of all possible worlds

bias-variance tradeoff

I think of "variance" as the error in a statistical estimate that comes from not having enough data (assuming an [ identifiable ] model…

bids

The Atlantic has this article on "Masters of Love" http://www.theatlantic.com/health/archive/2014/06/happily-ever-after/372573/ presenting…

biggest problems in AI

I think this might be the wrong way to frame this. The field is CS, and the biggest problem in CS is AI. Within AI, the problems are…

bitter lesson

http://www.incompleteideas.net/IncIdeas/BitterLesson.html The bitter lesson is based on the historical observations that 1) AI researchers…

blockchain

blog posts to write

[ writing inbox ] the privilege of having correct advice work out for you ([ the privilege of advice working out ]) Let's stipulate that any…

blur your eyes

Jitendra Malik would often tell us that some idea or explanation makes sense if you 'blur your eyes'. This seems counterintuitive, but it's…

body practice

Possible resources on tai chi, qi gong Paul Lam beginner video: https://www.youtube.com/watch?v=rbFzbS2lLT0 if it's good, can pay for his…

book goals

[ meditation ] and dharma: The Buddha's Own Words Mastering the Core Teachings of the Buddha Science of Enlightenment Seeing that Frees…

books I recommend

Fiction memoirs of hadrian shogun name of the wind Cloud Atlas gay: Giovanni's Room sci-fi: nexus a fire upon the deep / a deepness in the…

bounded cognition

brain

Sources: wikipedia GPT-4 https://ocw.mit.edu/courses/9-13-the-human-brain-spring-2019/video_galleries/lecture-videos/ Major parts of the…

bupropion

Antidepressant that acts as a [ norepinephrine ]-[ dopamine ] reuptake inhibitor. In the class of substituted [ phenethylamine ]s and (more…

calibration

cannabinoid

Endocannabinoids are retrograde [ neurotransmitter ]s, meaning that they pass 'backwards' (from dendrite to axon) through the synaptic cleft…

capabilities research

In the discourse around [ AI safety ] you sometimes see the claim that research on AI capabilities is harmful to the extent that it outpaces…

carbonyl

carbonyl group

care about my work enough to want to get better

The problem with work that's 'just a job' is that you'll never be as good at it as at work that really excites you, where you really want…

cargo-culting

carve the world at the joints

casual love

aka 'little love' or 'big love' or various other things Saved from https://www.carsieblanton.com/blog/post/82149148832/casual-love This…

catastrophic forgetting

cation

causal graph

causal inference

How do we infer [ causality ] from observational data? This question is important in science and is closely related to the progress of…

causality

see [ causal inference ] Great Causality & ML Papers and Researchers – Blog (logangraham.xyz)

cellular respiration

We can view cellular respiration (and combustion more generally) from a high level as the transfer of electrons from carbon to oxygen atoms…

central bank

cessation of suffering

see: https://www.abolitionist.com/ https://qualiacomputing.com/2018/11/07/anti-tolerance-drugs/ [ suffering ] may seem inevitable. [ karma…

ceteris paribus

Other things held equal. Abstract reasoning is good for coming to 'ceteris paribus' conclusions. It's easy to identify one force acting in…

chain of thought

chain rule

There are two major 'chain rules' relevant to machine learning: the chain rule of probability theory and the chain rule from calculus…

change-signaling event

Say you want to make a big personal change: to stop smoking, or to stop eating meat, or to meditate every day, or introduce yourself to a…

choicefulness

I got this concept from NameRedacted. People overestimate what they can do in a week, but underestimate what they can do in ten years. To do…

citric acid cycle

Also known as the Krebs cycle. Is the final common pathway for oxidation ('burning') of fuel molecules: carbs, fats, proteins. The fuel…

classic papers

More Is Different (kit.edu) On proof and progress in mathematics (Thurston, 1994) On Being the Right Size (Haldane, 1928)

classification is special

The [ distinction ] between classification and regression is, from one point of view, arbitrary: it's all just function approximation, and…

coarse-to-fine

Dan Klein once said (I don't remember the context, but I took his NLP class in spring 2011, so it was probably around then) that he thought…

cognitive structure

[ conceptual scaffolding ]

cognitive technology

come alive

"find the thing that makes you come alive, and do that." I find myself thinking about 'giving up' in the sense of dreams of being an…

coming out

commitment scheme

A commitment scheme allows for one party to publicly commit to some value without revealing that value. For example, Alice wants to bet on…
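
One common way to realize this idea is a hash-based commitment: publish the hash of the value together with a secret random nonce, then reveal both later. The sketch below is only an illustration under that assumption, not necessarily the construction this note goes on to describe; the function names `commit` and `verify` are hypothetical.

```python
import hashlib
import secrets


def commit(value: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, nonce). Publish the commitment; keep the nonce secret."""
    # The random nonce stops an observer from brute-forcing a small set of
    # possible values by hashing each candidate.
    nonce = secrets.token_bytes(32)
    commitment = hashlib.sha256(nonce + value).digest()
    return commitment, nonce


def verify(commitment: bytes, nonce: bytes, value: bytes) -> bool:
    """Check a revealed (nonce, value) pair against the published commitment."""
    return hashlib.sha256(nonce + value).digest() == commitment


# Alice commits to her bet now and reveals it later.
commitment, nonce = commit(b"heads")
```

Revealing `nonce` and `b"heads"` later lets anyone recompute the hash and confirm that Alice did not change her mind after the fact.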

common knowledge

communication is processing

Talking and writing are not nearly as much about communication as we think. That's part of it, of course. But a significant portion, often…

complacent

complete the square

Multivariate Completion of Squares A useful trick: if is a symmetric, nonsingular matrix, then This is easy to see just by expanding out…

compositional natural gradient

TO READ: Kronecker factored Approximate Curvature : K-FAC Practical Gauss-Newton : gives recursions for computing Hessian blocks, and…

compression

computation is important

Arguably the core insight of deep learning / [ differentiable program ]ming is that the shape and structure of the computations we do are so…

computational complexity

computational functionalism

The view that performing the "right kind of computation" is necessary and sufficient for [ consciousness ]. Chalmers (1995): if a person's…

computational graph

computational lens notes

from “theory of computation as a lens on the sciences” ehud kalai talks about structural robustness in games: an equilibrium is robust if…

computational life coach

How do you start building and selling [ computational therapy ]? It can't just be a medical product, because that's a hugely regulated and…

computational therapy

See also: [ computational life coach ] A recurring dream I have is to use AI to solve mental health. It is simultaneously one of the most…

concentration

Concentration is something of a misnomer for the meditative practice and states of samādhi. The word "concentration" carries connotations…

concentration inequalities

conceptual chunks

conceptual scaffolding

Related: Learning increasingly complex ideas may amount to forming larger effective chunk sizes Expertise requires increasingly…

confidence all the way up

Nate Soares wrote an essay a while ago on how he experiences [ self-confidence ] when opining on difficult topics: On reflection, I've…

confidence is about accepting failure

https://markmanson.net/how-to-be-confident see also [ imposter syndrome ] and the thesis that a lack of confidence is often an aversion to…

connections between DDPG and Bayesian optimization

consciousness

Philosophical views on consciousness: Buddhist and meditative traditions focus on [ awareness ]. They claim that consciousness has nothing…

conspiracy is a thing for a reason

Multiple people working together are fundamentally more powerful than a single person working alone. Governments recognize this fact, which…

constrained optimization

Suppose we want to optimize an objective under some equality and/or inequality constraints, Some general classes of approach we can use are…

constraints can be good

All else equal, constraints prevent you from doing what you'd have otherwise wanted to do, which is bad. But. Constraints prevent [ analysis…

continuous structure learning

Relevant papers: Differentiable compositional kernel learning for Gaussian Processes (Sun et al., 2018) Differentiable Architecture Search…

contraction

A contraction mapping on a metric space $(M, d)$ is a function $f: M \to M$ such that $d(f(x), f(y)) \le k \, d(x, y)$ for all $x, y \in M$ and for some $0 \le k < 1$, called the [ Lipschitz ] constant of the map…

contrastive divergence

A method for fitting an unnormalized probability density (aka [ energy-based model ]) to data. Note that this is a different and harder…

contrastive learning

A technique for [ representation ] learning in which semantically similar datapoints are encouraged to have similar representations, and…

control variate

conversation as a game

Okay so there’s a lot of research on what conversations are, what the goals are (of course I don’t know most of this research…). It seems as…

convex

A convex function $f$ satisfies the property that a line between any two points on its graph is on or above the graph: $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$ for any $\lambda \in [0, 1]$. It is…

convex dual

See also: https://www2.sonycsl.co.jp/person/nielsen/Note-LegendreTransformation.pdf Jess Riedel on the Legendre transform in physics looks…

convolution

cooking technique

Order of operations When should you add oil when preparing to sauté? Generally the answer is "after preheating the pan, unless it's nonstick…

cooperative game

References: https://en.wikipedia.org/wiki/Cooperative_game_theory https://en.wikipedia.org/wiki/Shapley_value conversations with GPT-4 A…

cooperative inverse reinforcement learning

References: Cooperative Inverse Reinforcement Learning The Off-Switch Game Incorrigibility in the CIRL Framework The CIRL setting models…

cortex

counterfactual

Level 3 of Pearl's [ causal inference ] hierarchy: questions of the form 'given that (X, Y) happened, what would have happened if (X', Y…

countries I've been to

USA (45/50 states: all but N Dakota, S Dakota, Minnesota, Louisiana, Mississippi) Canada Ecuador Costa Rica Argentina England Scotland France…

creatine

In the cell, creatine is stored as phosphocreatine. It acts like a 'backup' adenosine: Phosphocreatine can donate its [ phosphate…

credit assignment

crispr-cas9

"Crispr" == "Clustered regularly interspaced palindromic repeats". These are DNA sequences in bacteria that represent a genetic 'memory' of…

cumulative distribution function

curiosity

current identity, goals, and plans

This is an index of pages that reflect things I'd like to do, or that otherwise should stay updated to reflect my current thinking…

damn rockstar, always

Someone on tinder had this phrase in their bio. As a life motto I think it's pretty powerful. It's a short mnemonic for 'fuck the world…

data efficiency

Current (2021) deep networks require huge datasets in order to [ generalization|generalize ]. But we know that humans can do one-shot…

datacenter

How much compute does a typical Google data center have? Where would it fall on the supercomputer rankings? The top computer on the TOP50…

deceptive alignment

The idea is that a [ mesa optimizer|mesa-optimizing ] policy with access to sufficient information about the world (e.g., web search) might…

decision transformer

paper: Chen, Lu, et al. 2021, https://arxiv.org/abs/2106.01345 Trajectories are represented as sequences: where is the return-to-go, i.e…

declarative and procedural knowledge

See also [ generative vs discriminative modeling ]

decoding

References: Holtzman et al. (2020), The Curious Case of Neural Text Degeneration https://arxiv.org/abs/1904.09751 How should we actually…

deconstructing sensory experience

A simple lens on meditative insight progress, from Michael Taft : Start with some sensory object - the sight of a tree, or the felt…

deep RL notes

Notes from John Schulman's Berkeley course on deep [ reinforcement learning ], Spring 2016. Value vs Policy-based learning Value-based…

deep deterministic policy gradient

Deep deterministic policy gradient (DDPG) is an interesting RL algorithm with a somewhat misleading name. Although its name indicates that…

deep learning

deep understanding

Which is more useful: reading the New York Times every day, or reading John Stuart Mill? listening to a current-events podcast, vs listening…

default mode network

A set of connected brain regions that are active when you're 'at rest', not focused on the external world. This includes mental states such…

defensive tech

Vitalik Buterin argues for defensive accelerationism (d/acc): One frame to think about the macro consequences of technology is to look at…

delayed sampling

Like quantum mechanics! We build up a distribution over variables defined so far. When we need to use a value, we sample from this…

depression

Depression is the worst thing. Why? Ultimately we care about global utility. Depression is literally the state of finding it difficult or…

dev tools

Google-internal equivalents: Dev tools: The ex-Googler guide (sourcegraph.com) GitHub - jhuangtw/xg2xg: by ex-googlers, for ex-googlers - a…

developing taste

The [ hedonic treadmill ] manifests in taste for things like wine, beer, coffee, fine cuisine. I've never spent effort refining my taste in…

differentiable environments

Maybe a stupid idea, but I wonder if the idea behind differentiable physics simulators (like Brax) can be extended more broadly to rich…

differentiable program

Fast differentiable sorting and ranking: https://arxiv.org/abs/2002.08871 What are differentiable analogues of 'standard' programming…

diffusion model

Diffusion models for image generation were independently invented at least twice: in a discrete-time variational inference framework…

diffusion process

References: http://www0.cs.ucl.ac.uk/staff/C.Archambeau/SDE_web/figs_files/ca07_RgIto_text.pdf https://www.ma.imperial.ac.uk/~pavl/lec_diff…

dihydromyricetin

direct preference optimization

References: Direct Preference Optimization: Your Language Model is Secretly a Reward Model This seems like a compelling reframing of…

directions for probabilistic programming

discount rate

discrete latent variable

distinction

[ Otter notes ] August 2020: When somebody says that X is good---here X could be love relationships, money, peace, or whatever---it is never…

distributional RL

There are two forms of uncertainty in value-based [ reinforcement learning ]. Let be the return from trajectory , and be the expected…

diversification

Perhaps the only free lunch in [ finance ]. Given N investments all with the same expected value and level of risk (variance), whose…
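
As a rough illustration of why diversification helps, the sketch below computes the variance of an equal-weighted portfolio. It assumes every pair of investments has the same correlation, with uncorrelated returns as the default; that assumption is not stated in the visible excerpt, and the function name is hypothetical.

```python
def equal_weight_portfolio_variance(
    n: int, variance: float, correlation: float = 0.0
) -> float:
    """Variance of the average of n assets with equal variance and equal pairwise correlation."""
    # Var(mean) = (1/n^2) * [n * variance + n * (n - 1) * correlation * variance]
    #           = variance / n + (n - 1) / n * correlation * variance
    return variance / n + (n - 1) / n * correlation * variance


# With uncorrelated assets, risk shrinks like 1/n while the expected value is unchanged.
risk_by_n = {n: equal_weight_portfolio_variance(n, variance=1.0) for n in (1, 4, 16)}
```

With perfectly correlated assets (`correlation=1.0`) the formula returns the original variance for every `n`, so under this model the benefit comes entirely from imperfect correlation.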

do-calculus

References: ML beyond Curve Fitting: An Intro to Causal Inference and do-Calculus , Causal Inference 2: Illustrating Interventions via a Toy…

doing things yourself avoids cargo-culting

What does it mean to 'be yourself'? Selves aren't a well defined thing. Ultimately everything about your self is shaped by your surroundings…

dopamine

Chemically, dopamine is a benzene ring (aka phenyl group), with two hydroxyl (OH) groups at adjacent sites, and a two-carbon (ethyl) chain…

double descent

Empirically, as model capacity increases past the memorization threshold ( ), [ generalization|generalization ] error starts decreasing…

drugs

[ nootropics ] [ psychedelic ]

dual gradient ascent

TODO: flesh out theory, understand ADMM (e.g., https://www.cis.upenn.edu/~cis515/ws-book-IIb.pdf )

dual metareasoning

From a conversation about [ attention ], [ multiplicative interaction ], and [ meta-reasoning ]: at some level, a lot of the AI problem…

dual-process cognition

Elephant and rider Asking "what should I value?" is asking the rider. it demands solving impossible moral questions. Asking "what do I value…

duality gap

dukkha

In Pāli, dukkha refers to the unsatisfactory parts of existence. Literally it refers to a wheel in which the axle hole is not centered…

dullness

The fifth Buddhist precept is to refrain from intoxicants. Historically this means no alcohol. But a modern interpretation From NameRedacted…

ego death

eight-fold path

Academic components: Right view (right understanding): seeing reality as it really is and understanding the [ four noble truths ]. This…

electronegative

elephant path

eligibility trace

A few ways to think about eligibility traces: an explicit accounting of credit assignment a [ sufficient statistic ] for the history of the…

embedded agent

Notes on Abram Demski and Scott Garrabrant's sequence on Embedded Agency Embedded Agents : Classic models of rational [ agency ], such as…

emergent capabilities

A consequence of [ phase transition ]s in [ large models ] is that models may end up having capabilities we didn't expect. For example…

emotional labor

Relationships and community and mental health are not automatically maintained, and maintaining them doesn't come for free. Comforting…

emptiness

I think [ Dan Brown ] said somewhere that a good synonym for 'empty' in meditative contexts is 'mere construction'. For example, practicing…

empty

See [ emptiness ].

enabling environment

Andy Matuschak's concept of an Enabling Environment gets at something I've had in my mind but not named. It's an environment that expands…

energy-based model

enlightenment

David Chapman suggests that enlightenment in Buddhism is not a single defined thing, 'the word is hopelessly confused': https://vividness…

enlightenment can't play chess

Very smart people tend to disbelieve in [ enlightenment ] because they hold up unrealistic notions of what it is or what it entails. There…

ensemble

Often we think of ensembles in the context of supervised learning: we have some algorithm that learns X -> y mappings, and by running it…

entropy

Measures uncertainty, disorder, or randomness. The (Shannon) entropy of a probability distribution is: The quantity inside the…
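
For a discrete distribution, the standard definition is H(p) = -Σ p(x) log p(x). A minimal sketch of that formula follows; the function name and the choice of base 2 (bits) as the default are assumptions made for illustration.

```python
import math


def shannon_entropy(probs, base=2.0):
    """Entropy -sum(p * log p) of a discrete distribution, in units set by `base`."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p == 0 are skipped: p * log p tends to 0 as p -> 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)


fair_coin = shannon_entropy([0.5, 0.5])   # maximally uncertain binary outcome
certain = shannon_entropy([1.0, 0.0])     # no uncertainty at all
```

A uniform distribution maximizes this quantity for a given number of outcomes, which matches the reading of entropy as a measure of uncertainty.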

epinephrine

epistemic uncertainty

equanimity

essays to reread

Innerring (C. S. Lewis). In any institution there are unofficial circles of influence: people who are 'on the inside' and people who are…

ethanol

Probably the simplest and smallest molecule used as a psychoactive drug: It is neurotoxic, carcinogenic, and addictive, but of course…

ether

The currency of [ Ethereum ]. Why does Ether have value? It represents computing time on a shared global computer. The more Ether you have…

eudaemonic

eurodollar

Foreign banks can create dollar-denominated liabilities much larger than their reserve of actual dollars, without the need to adhere to US…

evergreen notes

[ evergreen notes ] are a concept from Andy Matuschak. They're a framework for thinking about writing, note-taking, and intellectual…

every branch has high-value leaves

Events that seem really terrible---closing off good outcomes and potentially leading to bad outcomes---often refine into a fine path that…

every method is a trap

"In other words, there's a method of pursuing or following a devotion to the Guru, but ultimately every method is a trap, and you've got to…

experience replay

The state transitions we observe in [ reinforcement learning ] are typically correlated over time, both within a trajectory (obviously) and…
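
A minimal sketch of a replay buffer, the data structure usually used to break up these temporal correlations: transitions are stored as they arrive and training batches are drawn uniformly at random. The class name, the transition layout, and the uniform sampling strategy are illustrative assumptions, not taken from the note.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity: int):
        # A deque with maxlen silently drops the oldest transition when full.
        self._storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Sampling without replacement mixes transitions from different times
        # and trajectories, so a batch is far less correlated than a
        # consecutive slice of experience.
        return random.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


buffer = ReplayBuffer(capacity=100)
for t in range(150):
    buffer.add(state=t, action=0, reward=0.0, next_state=t + 1, done=False)
batch = buffer.sample(32)
```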

explaining away

explicit models of uncertainty

(note: this is dancing around the issues around why I think [ probabilistic programming is not AI research ], even if it will be a…

exploration

exploration versus exploitation

explore

Quit Your Job (palladiummag.com) : Productive exploration requires the application of skilled personal judgment to chasing hunches and…

exponential family

exponential family notes

Exponential Families, Conjugacy, Convexity, and Variational Inference Any parameterized family of probability densities that can be written…

exposure bias

Consider training an [ autoregressive ] model of sequence data (text, audio, action sequences in [ reinforcement learning ], etc.), which…

expressive transformer

This note is a scratchpad for investigating the expressivity of the [ transformer ] architecture. In general, one set of intuitions that we…

fabrication

factors of awakening

In [ Dan Brown ]'s telling, these are: mindfulness (sati, https://en.wikipedia.org/wiki/Mindfulness_(Buddhism)): paying attention to the…

failure as a temporary setback

faith

I grew up in the 2000s reading the New Atheists, where 'faith' was considered a dirty word. Faith was the opposite of reason; it meant…

family recipes

MACARONI AND CHEESE 4 Tb margarine 5 tb flour Milk 2 tsp mustard 1 garlic clove, minced or 1 tb prepared minced garlic 2 ½ cups sharp…

fashion

Ultimately what is attractive in fashion is confidence. you can break almost all of the rules if it's clear that you're doing it as a matter…

fashion is like sex

I've thought before that fashion is bad because it's about arbitrary trends. But you can also see fashion as good because it's about…

fast weights

On an evolutionary timescale, it's useful to evolve structures that can learn quickly. The nervous system is an evolved organ system for…

feedback loop

See also Scott Alexander's Ontology Of Psychiatric Conditions: Dynamical Systems - Astral Codex Ten (substack.com) theory of depression.

feminism

filtration

A filtration is defined by monotonically increasing subsets of a [ probability space ]; that is, subsets such that we have for all…

finance

Yale course: https://oyc.yale.edu/economics/econ-251 MIT course: https://www.youtube.com/playlist?list=PLUl4u3cNGP63B2lDhyKOsImI7FjCf6eDW…

find time to play

From a Jean Yang tweet: I once attended a talk by the late Nobel Laureate Oliver Smithies where he talked about going into lab on weekends…

fine-tuning

five hindrances

Buddhists identify five factors as obstacles to [ concentration ] in [ meditation ]: Sensory desire ( kāmacchanda ) Aversion or ill will (…

fixation

Interesting and seemingly very powerful perspective on the [ cessation of suffering ]. Most refs on this page are from this twitter thread…

fixed point

We say that $x^*$ is a fixed point of an update rule $f$ if $f(x^*) = x^*$. Update rules can often (though not necessarily) be seen as defining an…
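
A small sketch of finding a fixed point by repeatedly applying an update rule until the iterate stops changing. The helper name, the tolerance, and the example rule x -> 0.5 * x + 1 (whose fixed point is x = 2) are all chosen here for illustration; plain iteration like this is only guaranteed to converge for well-behaved rules such as contractions.

```python
def iterate_to_fixed_point(update, x0, tol=1e-12, max_iters=10_000):
    """Apply `update` repeatedly until successive iterates differ by less than `tol`."""
    x = x0
    for _ in range(max_iters):
        x_next = update(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("update rule did not converge")


# The rule halves the distance to 2 at every step, so iteration converges.
x_star = iterate_to_fixed_point(lambda x: 0.5 * x + 1.0, x0=0.0)
```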

flexible model family

As AGW points out here , it is statistically better to fit a flexible model family, with an inductive bias, than a constrained model family…

focus on what you want to see more of

Credit to NameRedacted for this refrain https://twitter.com/visakanv/status/1324978566455468035/retweets/with_comments It's a powerful take…

forer effect

Statements from Forer's experiment: You have a great need for other people to like and admire you. You have a tendency to be critical of…

forgive

A key insight of Christianity is that forgiveness is something we do for ourselves: it's not just about extending [ grace ] to the party…

foundation model

four immeasurables

Aka the four 'divine abodes' or Brahma-viharas: [ loving-kindness ] (metta): active good will towards others compassion: empathizing with…

four noble truths

These are the first teaching of the Buddha, after he achieved [ enlightenment ] while [ meditation|meditating ] under a tree. The truths are…

fractional reserve banking

Banks create money by lending. Few understand this. Alice and Bob are on a desert island. Alice has $100, which she deposits in the Desert…
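
The claim that lending creates money can be sketched with a toy model: the bank keeps a fraction of each deposit as reserves and lends out the rest, and every loan is eventually redeposited. The reserve ratio, the redeposit assumption, and the function name are hypothetical choices for illustration and are not taken from the island example in the note.

```python
def total_deposits(initial_deposit: float, reserve_ratio: float, rounds: int) -> float:
    """Sum of deposits on the bank's books after `rounds` of lending and redepositing."""
    deposits = 0.0
    new_deposit = initial_deposit
    for _ in range(rounds):
        deposits += new_deposit
        # The bank keeps `reserve_ratio` of the new deposit and lends the rest,
        # which comes back to the bank as a fresh deposit in the next round.
        new_deposit *= 1.0 - reserve_ratio
    return deposits


# In the limit this is a geometric series: initial_deposit / reserve_ratio.
money_supply = total_deposits(100.0, reserve_ratio=0.1, rounds=1000)
```

Under these assumptions, the $100 of cash ends up backing roughly ten times as much in deposit balances, even though no new cash was printed.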

free base

Some drugs, like cocaine, or DMT, come in multiple forms: as some sort of [ salt ] or as a 'free base'. What's the difference between these…