All Notes: Nonlinear Function

All Notes

metta

Tagged with: No Tags

middle way

From @visakanv on Twitter: (relevant to [ nothing matters ])

Tagged with: #buddhism

mind at large

Tagged with: No Tags

mindfulness requires certainty

A lesson from NameRedacted: unresolved questions are the worst thing in meditation. For example, you're just sitting down to practice when…

Tagged with: #meditation

minimax duality

Considering a bilevel optimization problem (or saddle point problem) on the two-argument function $f(x, y)$, in general it holds that $\max_y \min_x f(x, y) \le \min_x \max_y f(x, y)$. That is, the…

Tagged with: #math

minimum description length

Short descriptions of things, when they exist, must capture some kind of structure. The principle of [ Occam's razor ] posits that we should…

Tagged with: #machine-learning #bayes

mirror descent

Mirror descent is a framework for optimization algorithms: many algorithms can be framed as mirror descent, and proofs about mirror descent…

Tagged with: #machine-learning

mirror descent implementations

What pieces of [ mirror descent ] can we automate? See also [ natural gradient implementations ] Given a mirror function , we can compute…

Tagged with:

mirror neurons

Tagged with:

mission statement

(originally from 2020-04-29) On another note, last night I tried to dictate (on Otter) my sense of my life goals. I came up with a very…

Tagged with: #personal

mitochondria

Tagged with:

mixed effects

[ Otter notes ]: Can I explain what a mixed effects model is from a graphical model standpoint? On the inference side, I think it's just…

Tagged with:

mixture of experts

A mixture-of-experts model consists of a set of functions , the 'experts', and a gating function that determines how to select which…

Tagged with: No Tags
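
As a rough illustration of the mixture-of-experts description above, here is a minimal sketch assuming a soft gate: the gating function returns one score per expert, and a softmax over those scores weights the expert outputs. The names `softmax`, `mixture_of_experts`, `experts`, and `gate` are hypothetical, and the note itself may describe a different selection rule (for example, hard top-1 routing).

```python
import math


def softmax(scores):
    """Convert raw gate scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(x, experts, gate):
    """Weighted combination of expert outputs (soft gating, an assumption).

    experts: list of functions mapping x to a number.
    gate:    function mapping x to one raw score per expert.
    """
    weights = softmax(gate(x))
    outputs = [expert(x) for expert in experts]
    return sum(w * y for w, y in zip(weights, outputs))


# Two toy experts and a gate that prefers the first expert for large x.
experts = [lambda x: 2.0 * x, lambda x: x + 10.0]
gate = lambda x: [x, -x]
result = mixture_of_experts(3.0, experts, gate)
```

With a gate that scores all experts equally, the result reduces to the plain average of the expert outputs.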

mode-covering variational inference is incoherent

I have a [ strong opinion weakly held ] that doesn't seem to be widely shared in the [ approximate Bayesian inference ] community: reverse…

Tagged with: #machine-learning #bayes

model-agnostic meta learning

Original paper: Finn, Abbeel, and Levine, ICML 2017, https://arxiv.org/abs/1703.03400 An approach for [ meta learning ] that works with any…

Tagged with:

model-based RL

Tagged with: No Tags

model-based rl

Often we don't explicitly use 'model-based RL' methods; instead, people in robotics talk about Sim2Real: adapting a policy pretrained in a…

Tagged with:

molecular dynamics

Stack: goal: sample from conformations of arbitrary hydrocarbons (or whatever). simpler goal: sample from conformations of ethane. simpler…

Tagged with: #physics #ideas

money supply

Naively you might think that the government just decides how many dollars there should be, and that's that. This is not true. Since [ IOUs…

Tagged with: #finance

monoamine oxidase

A monoamine oxidase (MAO) is an enzyme that breaks down mono-[ amine ] neurotransmitters such as [ dopamine ], [ serotonin…

Tagged with: #drugs #organic-chemistry

monte carlo tree search

A very natural form of [ meta-reasoning ] that selects the most promising computations. The simplest form of 'expanding' a node assumes a…

Tagged with: #ai

moral realism

There is a connection between moral realism and belief in [ qualia ]. If you see "experience" ([ awareness ]) as a real, fundamental aspect…

Tagged with: No Tags

morning person

I don't hold the moral view that it's better to be a morning person than an evening person. Having always tended towards a later sleep…

Tagged with: #personal #life-advice

most learning is by demonstration

In any human-to-human interaction, language carries some very important high-order bits, but it can only carry a few bits. It can help…

Tagged with: #teaching #machine-learning

most people don't care

This is one of the big problems with the world. Not the only one, and not the only way to look at it. But it's everywhere. status : a…

Tagged with: #how-to-think #mental-health

most work is bullshit

(see David Graeber https://www.strike.coop/bullshit-jobs/ ) Most work is oriented towards achieving [ instrumental goal ]s. But most…

Tagged with: #how-to-think

motorbike tips

maneuvering: the bike goes where I look. look around the turn I want to do. keep elbows up. shift body weight to counterbalance the bike. E…

Tagged with:

multimodal transformer

possible refs: google's multimodal architectures: https://webcache.googleusercontent.com/search?q=cache:https://towardsdatascience.com…

Tagged with:

multiplicative interaction

From a conversation I had about [ attention ] mechanisms in deep architectures. Maybe that terminology is too suggestive --- it's just a…

Tagged with: #ai #machine-learning

multivariate gaussian

We say that a random vector $x$ is multivariate Gaussian with mean $\mu$ and covariance matrix $\Sigma$ if it can be written $x = \mu + A z$ where $z$ is a vector of i.i.d…

Tagged with: #math

multivariate time series

[ thoughts on multivariate causalimpact ]

Tagged with: #modeling #bayes

mutually orthogonal communities

This was originally a section of breakup.org, written several years ago. this is more related to jobs and identity, but for cases when I get…

Tagged with:

my goals

I want to intentionally spend my time well. I remember back in grad school I would spend evenings reading papers, just as a form of growth…

Tagged with: #personal

my relationship with tech

I've identified as a 'tech' person, but I now feel uncomfortable in many tech circles. What is tech and what does it mean to be a tech…

Tagged with: #personal

my values

It's a useful exercise to occasionally reflect on what I value. stab 1: Generally pro tech, creating new things, non-zero-sum contributions…

Tagged with: #personal

myelin

Tagged with: No Tags

nasty, brutish, and short

Tagged with:

nattokinase

Recommended by Michael Edward Johnson:

Tagged with:

natural abstraction

A 'natural' abstraction is one that we expect any agent (or at least, a wide range of agents) to develop because it gets at something…

Tagged with:

natural experiment

Tagged with: No Tags

natural gradient

We don't typically think of it this way, but you can derive a [ gradient descent ] step as finding the point that minimizes a linearized…

Tagged with: #machine-learning
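
The framing in the natural gradient excerpt can be written out explicitly. This is the standard textbook identity, stated here under the assumption of a Euclidean distance penalty with step size $\eta$; the symbols $f$, $x_t$, and $\eta$ are notation chosen for this sketch rather than taken from the note.

```latex
x_{t+1}
  = \arg\min_x \left[ f(x_t) + \nabla f(x_t)^\top (x - x_t)
      + \frac{1}{2\eta} \lVert x - x_t \rVert^2 \right]
  = x_t - \eta \, \nabla f(x_t)
```

Setting the gradient of the bracketed expression to zero gives $\nabla f(x_t) + (x - x_t)/\eta = 0$, which rearranges to the usual update. Replacing the squared Euclidean norm with a different local distance is what changes the resulting step.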

natural gradient implementations

How can we automate [ natural gradient ]? See also [ mirror descent implementations ]

Tagged with:

nearest neighbor

Cool trick: some applications can improve on nearest-neighbor lookup by training 'Exemplar SVM's. Instead of matching against a set of…

Tagged with:

negative utilitarianism

Tagged with: No Tags

negative utility

My position (a [ strong opinion weakly held ]) is that global utility is currently negative, and probably always has been. It's conceivable…

Tagged with: #morality

negligible

A negligible function is a function $\mu : \mathbb{N} \to \mathbb{R}$ such that, for any positive integer $c$ there exists an integer $N_c$ such that for all $x > N_c$, $|\mu(x)| < 1/x^c$, i.e., that…

Tagged with: #math #crypto

nested SMC

Christian Naesseth, Fredrik Lindsten, Thomas Schon (2015): http://proceedings.mlr.press/v37/naesseth15.html The main idea: In an SMC…

Tagged with: #machine-learning #papers

neural nets do work

Like the proverbial half-full glass, smart people can look at the same reality of the current capacities of neural nets, and come to…

Tagged with: #machine-learning #personal

neural nets don't just interpolate

Sometimes you'll see people say that neural nets 'just' memorize and interpolate their training data. No one denies that neural nets with…

Tagged with: No Tags

neuron

Parts of a neuron: dendrites: these branch out to receive connections from other cells axons: these branch out to send signals to other…

Tagged with: #biology #neuroscience

neurotransmitter

Tagged with: No Tags

nihilism

Tagged with:

no free lunch theorem

The folklore no-free-lunch 'theorem' in machine learning says that, for any pair of learning algorithms, there exists some dataset on which…

Tagged with: #machine-learning

no plan survives contact with the enemy

Tagged with: No Tags

no-self

No-self is one of the [ three characteristics ] that traditional Buddhism holds are present in all phenomena. In later Buddhism, the…

Tagged with: #buddhism #meditation #drugs

noisy natural gradient as VI

https://arxiv.org/abs/1712.02390 Basic idea: optimizers like Adam and RMSProp already keep track of posterior curvature estimates. These are…

Tagged with: #machine-learning #papers

nominal GDP target

Instead of directly targeting a specific rate of inflation, a [ central bank ] may target a fixed rate of nominal GDP growth, which is equal…

Tagged with: No Tags

non-dominating force

One way to model real-world [ causality ] is a bunch of forces working with and against each other. In this view, no individual force…

Tagged with: #how-to-think

non-fungible token

NFTs 101: https://medium.com/@intenex/nfts-101-why-nfts-are-a-generational-innovation-4626ae803e3b Among many other things, NFTs are…

Tagged with: #finance

non-player character

Tagged with: No Tags

nondual

Tagged with: No Tags

nootropics

Obligatory disclaimer: there will never be a drug to turn you into Einstein. Most of effective high-level thinking lies in 'software…

Tagged with: #neuroscience

norepinephrine

Tagged with: #neuroscience

normalized advantage function

References: Gu et al., Continuous Deep Q-Learning with Model-based Acceleration (2016). Instead of modeling directly, we build a network…

Tagged with: #reinforcement-learning

not true enough

Something can be true but not 'true enough'. That is, you have a compelling causal theory for why X should increase Y. It might be that the…

Tagged with: #how-to-think

notes on Hamming

I've started reading The Art of Doing Science and Engineering by Richard Hamming. History of computing: Analog computing goes back forever…

Tagged with:

nothing matters

Because: [ goals are arbitrary ]: achieving a goal, or failing to, doesn't really matter because the goal was arbitrary anyway. From the…

Tagged with: #morality

nothing to do

There's a spiritual idea, in Buddhism and elsewhere, that there is "nothing to do": everything is already suffused with "primordial…

Tagged with: #buddhism #meditation

nucleophile

Tagged with: #chemistry

nucleotide

Tagged with: #biology #chemistry

nucleus sampling

Tagged with: No Tags

numerics

Don't invert that matrix: https://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/ Seven sins of numerical linear algebra…

Tagged with: #computer-science

objectives are big

A very incomplete and maybe nonsensical intuition I want to explore. Classically, people talk about very simple [ reward ] functions like…

Tagged with: #ai #reinforcement-learning #alignment

off-policy

A few (relatively uninformed) thoughts about on- vs off-policy [ reinforcement learning ]. Advantages of on-policy learning: On-policy…

Tagged with: #reinforcement-learning

old daily templates

Original: Daily reflections What am I grateful for today?:: Some goals : Goals for the next ~year:: Goals for the next ~month:: Goals for…

Tagged with:

on-policy learning

Tagged with: No Tags

one taste

The brain doesn't have separate models of each of the [ sense gate ]s (and thought). Instead it just stores each moment of perception as a…

Tagged with: No Tags

one-way function

Informally, a function is a one-way function if it is easy to compute but hard to invert. Or more generally, hard to pseudo-invert, i.e…

Tagged with: #crypto

ongoing projects

These are things that I might plausibly decide I want to work on when I sit down on the weekend. Expanding nodes on this graph. Blogging…

Tagged with: #personal #ideas

ontological crisis

How do we maintain values when our models of the world shift? If someone's goal in life is to "do God's will", and then they come to believe…

Tagged with: #alignment #ai

optimism

As Josh Marshall said, at the beginning of the Trump presidency: "Optimism is not primarily a prediction but an ethic, a philosophy, a way…

Tagged with: #how-to-think

option

Tagged with: #finance

optional stopping

If $X$ is a [ martingale ] and $\tau$ is a [ stopping time ], then any of the following conditions implies that $\mathbb{E}[X_\tau] = \mathbb{E}[X_0]$: The stopping time is bounded…

Tagged with: #math

organic chemistry

Tagged with:

origin of suffering

Ken McLeod claims that 'emotional reactivity' is the origin of suffering. Pain consists both in what happens and in our reaction to it. But…

Tagged with: #buddhism

otter.ai

Tagged with:

overparameterize

Tagged with:

ownership

Tagged with: No Tags

oxidation

mnemonic: OIL RIG = 'oxidation is losing (electrons), reduction is gaining (electrons)' in contrast to [ acid-base chemistry ], which is…

Tagged with: #chemistry

oxidative phosphorylation

This is how [ mitochondria ] produce most of their [ ATP ]. Mitochondria have an outer membrane and an inner membrane, so there are two…

Tagged with: #biology

p-zombies

Tagged with: No Tags

pale blue dot

Look again at that dot. That's here. That's home. That's us. On it everyone you love, everyone you know, everyone you ever heard of, every…

Tagged with:

paperclip maximizer

Tagged with: No Tags

papers to read

Tagged with: #papers

particle MCMC

Basic notes from https://www.stats.ox.ac.uk/~doucet/andrieu_doucet_holenstein_PMCMC.pdf Setup: we have parameters and time series model…

Tagged with:

party ideas

Chocolate tasting: buy a bunch of high-end, single-origin chocolate bars. Parcel them out blind. Give people a pad to take notes on what…

Tagged with: #ideas

penalties are constraints

We often see optimization problems with objectives of the form where is the main function of interest (e.g., training loss in machine…

Tagged with:

people like hearing their name

“Remember that a person’s name is to that person the sweetest and most important sound in any language.” Dale Carnegie (How to Win Friends…

Tagged with: #relationships

people want to see you thrive

When you're thinking about doing something that feels right to you, it's easy to get caught up in worrying about what other people will…

Tagged with: #life-advice

perceiver

Reading the Perceiver papers from DeepMind: Perceiver: Jaegle et al 2021 https://arxiv.org/abs/2103.03206 Perceiver-IO: Jaegle et al 202…

Tagged with: #ai #machine-learning

persistent hallucination

In the [ 5-MeO-DMT ] trip where I experienced [ ego death ], I saw a [ magical display ] of beautiful colors and flowing motion and…

Tagged with: #drugs

personal AI Effect

The AI Effect refers to the widely-recognized phenomenon that 'once we know how to do it, it's not AI'. For example, playing chess well…

Tagged with: #ai

personal philosophy

I always found it weird that philosophy spends so much time talking about specific historical philosophers. Who cares what Aristotle, or…

Tagged with: #teaching #philosophy

personal value-over-replacement

When considering one's impact on the world, it's important (? or at least tempting) to think about your value-over-replacement. If you…

Tagged with:

phase change hypothesis

(see also: [ large models ]) There's a viewpoint that neural nets just memorize the training data, so the more training data you have, the…

Tagged with: #machine-learning #modeling

phase space

Tagged with: No Tags

phase transition

Tagged with: No Tags

phenethylamine

Tagged with: #chemistry #drugs

phenibut

Developed and widely used in Russia, phenibut is an analogue of [ GABA ] with a phenyl ring substituted at the β carbon, giving it the name…

Tagged with:

phosphate

Why Nature Chose Phosphates (science.org)

Tagged with: #biology #chemistry

pointing out

The paradoxical thing about pointing-out style meditation teaching is that you can't really explain the instructions when they're unclear…

Tagged with: #meditation

polar

Tagged with: No Tags

policy

Tagged with:

policy gradient

(see also my [ deep RL notes ] from John Schulman's class several years ago, which cover much of the same material) We can approach…

Tagged with: #reinforcement-learning

polyak averaging

Tagged with: No Tags

positional embedding

There are a few ways to do this. Google's PaLM uses rotary embeddings so it seems like that's probably close to the state of the art? But…

Tagged with: #transformers #ai

positive sum

Tagged with: No Tags

potential outcomes

Different experimental conditions may give rise to different outcomes . For example, let the variable indicate whether a person is…

Tagged with: #causality

prayer is therapy

Prayer is a form of [ therapy ]. It's about clarifying your values: figuring out what you really want so that you can ask God for it. and…

Tagged with:

predictable process

A [ stochastic process ] is predictable if its value at time is fully determined by information available at time . Any fully…

Tagged with: #math

prediction as a model-building exercise

A really valuable exercise that I should consider building into my routine is to regularly try to make and write down explicit predictions…

Tagged with: #how-to-think

predictive agent

Consider an agent that is purely concerned with [ predictive processing ]: finding the optimal [ compression ], or equivalently the optimal…

Tagged with: #ai

predictive processing

The theory of predictive processing seems to be attracting a lot of interest in neuroscience and [ meditation ] circles. I want to try to…

Tagged with: No Tags

preference cascade

https://www.quora.com/What-is-a-preference-cascade A lot of how people act is driven by how they think they're 'supposed' to act. There's…

Tagged with:

previously read

AI / RL Distributional RL book: https://www.distributional-rl.org/ Alignment Sequences: Value learning: https://www.alignmentforum.org/s…

Tagged with: #papers #ideas

principal-agent problem

Tagged with:

priors are conceptual attention

A Bayesian view of (one aspect of) [ attention ] inspired by a conversation with Shamil Chandaria on [ predictive processing ]. (but this…

Tagged with: #ai

privacy

It seems like there is, or can be, a virtuous relationship between privacy and generalization. You don't want to memorize too many…

Tagged with: #machine-learning

privilege

I have some discomfort with the political concept of 'privilege', e.g.: Being white is a privilege. Being male is a privilege. Being…

Tagged with: #how-to-think #lgbt

pro-social identity

(I got this concept from SuccessfulFriend.) As people grow up and form their identities, they need models, and not just models; they need…

Tagged with: #growing-up #psychology

probabilistic program induction

Can we think about [ generative flow network ]s as a potentially tractable formulation of probabilistic program induction?! executing a line…

Tagged with: #machine-learning #ai

probabilistic programming

Tagged with: #machine-learning #modeling

probabilistic programming is not AI research

Many [ probabilistic programming ] researchers frame their work as part of the broader problem of [ artificial intelligence ]. Artificial…

Tagged with: #machine-learning #bayes

probabilistic transformers

A short note on interpreting a transformer layer as performing maximum-likelihood inference in a Gaussian mixture model: https://arxiv.org…

Tagged with: #papers #machine-learning

probabilities hide detail

Matt Levine explains how a financier might react to losing a billion dollars: Sure sure the risks didn’t work out but you probably have a…

Tagged with: #modeling

probability space

A probability space consists of: A set of outcomes aka possible worlds; these represent all the ways the world might be. This is the…

Tagged with: #math

process is frequentist

(aka, why frequentists will always make more money) In the "real" (corporate/governmental) world, most high-level decision making is…

Tagged with: #bayes

product of experts

Introduced by Geoff Hinton (1999): Products of Experts. Each expert produces a probability distribution. These are combined by…

Tagged with: #math #machine-learning #modeling
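
To make the product-of-experts combination concrete, here is a minimal sketch for discrete distributions, assuming the usual rule of multiplying the experts' probabilities pointwise and renormalizing. The function name `product_of_experts` and the toy inputs are hypothetical, not taken from the note.

```python
def product_of_experts(distributions):
    """Combine discrete distributions by pointwise product, then renormalize.

    distributions: list of equal-length probability lists over the same outcomes.
    """
    n = len(distributions[0])
    unnormalized = [1.0] * n
    for probs in distributions:
        for i in range(n):
            unnormalized[i] *= probs[i]
    z = sum(unnormalized)  # normalizing constant
    return [u / z for u in unnormalized]


# An uninformative expert leaves the other expert's opinion unchanged.
combined = product_of_experts([[0.5, 0.5], [0.9, 0.1]])
```

Because the combination is a product, any outcome that one expert assigns zero probability is ruled out entirely, in contrast to a mixture, where a single expert can only lower an outcome's weight.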

production vs consumption

Tagged with:

projection is unavoidable

The idea of 'projection' in psychology means to assume that someone else has the same flaws, or foibles, or motivations as you do. It struck…

Tagged with: #how-to-think #psychology #relationships

proof of stake

So the mechanism is if you have tokens you can choose to stake them. And in order to run a network node you must stake some number of tokens…

Tagged with: #finance

proof of the policy gradient theorem

The policy gradient theorem says that For simplicity we'll assume a fixed initial state and fixed-length finite trajectories, but the…

Tagged with: #reinforcement-learning

provably safe system

References: Tegmark and Omohundro, Provably safe systems: the only path to controllable AGI (2023). https://arxiv.org/abs/2309.01933 they…

Tagged with: #papers #computer-science #crypto

proximal

Proximal methods in optimization The proximal operator of a [ convex ] function is defined as the minimizer of plus a distance penalty…

Tagged with: #math

proximal policy optimization

references: paper: https://arxiv.org/abs/1707.06347 great blog post on implementation details: https://iclr-blog-track.github.io/2022/0…

Tagged with: #reinforcement-learning

psilocybin

Tagged with: No Tags

psychedelic

[ 5-MeO-DMT ] [ mescaline ] [ psilocybin ]

Tagged with: #drugs

purchases I recommend

Toilet: A bidet. Cold water, warm-water (if a hose from your toilet can reach your sink's plumbing), or internally heated. It saves toilet…

Tagged with: #life-advice

purpose

Tagged with:

pushforward natural gradient

It's tempting to use [ natural gradient ] ascent to optimize a variational distribution. We could also consider using it to optimize the…

Tagged with: #machine-learning

put-call parity

A portfolio containing a long (European) call and short (European) put [ option ] with the same strike price and expiry date is equivalent…

Tagged with: #finance
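
The put-call parity excerpt can be checked at expiry with a small sketch. It assumes the standard statement of parity, that a long call plus a short put at the same strike pays off like a forward contract at that strike. The function names and the sample prices are hypothetical. Pricing before expiry also involves discounting the strike, which this sketch does not model.

```python
def call_payoff(spot, strike):
    """Payoff at expiry of a long European call."""
    return max(spot - strike, 0.0)


def put_payoff(spot, strike):
    """Payoff at expiry of a long European put."""
    return max(strike - spot, 0.0)


def long_call_short_put(spot, strike):
    """Payoff at expiry of holding one call and having written one put."""
    return call_payoff(spot, strike) - put_payoff(spot, strike)


def forward_payoff(spot, strike):
    """Payoff at expiry of a long forward struck at the same price."""
    return spot - strike


# The two payoffs agree whether the option pair ends in or out of the money.
payoffs = [(s, long_call_short_put(s, 100.0), forward_payoff(s, 100.0))
           for s in (0.0, 50.0, 100.0, 150.0)]
```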

pyrrole ring

A five-membered ring of four carbons and one nitrogen: C4H4NH.

Tagged with: #chemistry

python project setup

General procedure for setting up a new Python project. Create a new git repo and clone into a directory my_new_project Add files…

Tagged with:

qualia

Tagged with: No Tags

quiet desperation

Tagged with: No Tags

quotes

"Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin." - John von Neumann Young man, in…

Tagged with:

random variable

Formally, a random variable is a (measurable) function defined on outcomes from a [ probability space ] . That is, in any possible…

Tagged with: #math

randomized controlled trial

a powerful tool for establishing [ causality ]

Tagged with: #causality

rate equation

The rate equation or master equation for a continuous-time Markov [ stochastic process ] describes how the probability density of the…

Tagged with: #math

rationality is moral

From a [ utilitarian ] perspective, all of morality follows from improving global utility, and it follows that it'd be better to do this…

Tagged with: #morality

reading inbox

In no particular order. Items may move to [ previously read ] if I read them or former reading inbox if I decide I'm not currently…

Tagged with: #papers #ideas

reading is processing

One model you could have of reading a book is that the book contains information, and once you've read it, you now possess that information…

Tagged with: #how-to-think

reality tunnel

Tagged with: #drugs #modeling

reasons to write

Why do I want to write more? Because: writing forces thoughts to crystallize. It forces me to draw conclusions about what I believe and who…

Tagged with: #personal #how-to-think

recipe for ruin

[ Nielsen's notes on ASI xrisk ] introduced the thought experiment: If you ask an all-knowing oracle a question like "Can you give me a…

Tagged with:

recipes

See also [ family recipes ]. Roast chicken and vegetables: preheat oven to ~400. cover a spatchcocked chicken with salted garlic butter at…

Tagged with: #ideas #personal

recruiting

The best way to recruit people is to convince them that they will learn and grow by working with your team. Pitches that have 'worked' for…

Tagged with:

regularization

Tagged with: No Tags

reinforcement learning

Note: see [ reinforcement learning notation ] for a guide to the notation I'm attempting to use through my RL notes. Three paradigmatic…

Tagged with: #ai #machine-learning #reinforcement-learning

reinforcement learning advice

https://andyljones.com/posts/rl-debugging.html https://www.reddit.com/r/reinforcementlearning/comments/9sh77q/what_are_your_best_tips_for…

Tagged with:

reinforcement learning from human feedback

see: [ steering language models ], [ direct preference optimization ] We are given a bunch of pairwise preference evaluations, of the form…

Tagged with: No Tags

reinforcement learning notation

There tends to be a lot going on in RL algorithms, with a whole mess of different quantities defined across timesteps. It's useful to try to…

Tagged with: #reinforcement-learning

relationship

[ relationship advice ]

Tagged with: #relationships

relationship advice

see also (maybe combine with?) [ relationship ] Accept [ bids ] as much as possible. Praise your partner in public (and in private). Stay in…

Tagged with: #relationships #life-advice

religion

Tagged with:

relu inequality

Suppose we want a [ transformer ] to evaluate the inequality returning if and otherwise. For integer , this can be done with a…

Tagged with: #machine-learning #transformers

relu selection

The selection operation y = where(c, a, b) returns $a$ where $c$ is true and $b$ otherwise. How can a [ transformer ] layer implement this operation? One approach is to use…

Tagged with: #machine-learning
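
One way to build the selection operation out of ReLUs is sketched below. It is not necessarily the approach the note goes on to describe. It assumes `c` is exactly 0 or 1 and that `a` and `b` are bounded in magnitude by a known constant `bound`; the names `relu_where` and `bound` are hypothetical. The idea is to subtract a large offset from the branch that should be switched off, so that both of its ReLUs clamp to zero.

```python
def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)


def relu_where(c, a, b, bound=100.0):
    """Compute where(c, a, b) using only affine maps and ReLUs.

    Assumes c is 0 or 1 and that |a| <= bound and |b| <= bound.
    """
    off_a = bound * (1.0 - c)  # large offset when c == 0, which silences a
    off_b = bound * c          # large offset when c == 1, which silences b
    # relu(v - off) - relu(-v - off) equals v when off == 0,
    # and equals 0 when off >= |v|.
    a_part = relu(a - off_a) - relu(-a - off_a)
    b_part = relu(b - off_b) - relu(-b - off_b)
    return a_part + b_part


selected_a = relu_where(1, 3.5, -2.0)
selected_b = relu_where(0, 3.5, -2.0)
```

The construction uses four ReLU units per selected scalar, and it silently gives wrong answers if an input exceeds `bound`, so the bound has to be chosen from what is known about the activations.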

remember arguments

When I was younger---in college or in grad school---I was sometimes conflicted about whether I should prioritize trying to get to correct…

Tagged with: #how-to-think

reparameterization trick

Tagged with: No Tags

replica trick

If a model with data has normalizing constant , then the replica trick says that This allows us to analyze the average log-normalizer…

Tagged with: #machine-learning

representation

In modern ML, representation learning is the art of trying to find useful abstractions, embodied as encoding networks. We can learn…

Tagged with: #machine-learning

research community

To be a successful researcher it's incredibly important to find and join your [ research community ]. Go to conferences (especially to small…

Tagged with:

research idea

This note lists some ideas and directions for research I'm interested in or excited about. Some are more fleshed out than others, some more…

Tagged with: #personal #ideas

research identity

Tagged with: No Tags

research worth doing

(see also: [ impact ]) I've been feeling depressed partly because the actual PhD research I did was (in my view) pointless, and more broadly…

Tagged with: #research #personal

researchers don't always know best

People who do research have a very ground-level, zoomed-in view of their field. They know where the current obstacles are, how incredibly…

Tagged with: #ai #research

reservoir sampling

Reservoir samplers solve the following task: sample items without replacement from a stream of unknown length . Because the length is…

Tagged with: #bayes
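
The task in the reservoir sampling excerpt can be sketched with the classic single-pass scheme often called Algorithm R. This is a minimal illustration under the assumption of uniform sampling without replacement; the function name `reservoir_sample` and its `rng` parameter are hypothetical, and the note may go on to discuss other variants.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Draw k items uniformly without replacement from an iterable of unknown length.

    Makes one pass over the stream and stores at most k items.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-indexed) is kept with probability k / (i + 1),
            # replacing a uniformly chosen slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(iter(range(1000)), 5, random.Random(0))
```

If the stream has fewer than `k` items, the function returns all of them in their original order.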

retreats

Teachers or centers I'd be interested to do a retreat with/at: Tucker Peck Michael Taft Tina Rasmussen (Cloud Mountain 13-day retreats…

Tagged with:

reversal curse

References: The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A" https://arxiv.org/abs/2309.12288 Studying Large Language…

Tagged with:

reverse diffusion

References: Ludwig Winkler's post on Reverse time stochastic differential equations. Suppose we have a [ stochastic differential equation…

Tagged with: #math

reward

stray thoughts about reward functions (probably related to the [ agent ] abstraction and the [ intentional stance ]) one can make a…

Tagged with: #ai #reinforcment-learning #alignment

reward funnel

When thinking about the [ reward ] function for a real-world AI system, there is always some causal process that determines reward. For…

Tagged with: #alignment #reinforcement-learning

reward is enough

Silver, Singh, Precup, and Sutton argue that Reward is enough: maximizing a reward signal implies, on its own, a very broad range of…

Tagged with: #ai #reinforcement-learning

reward shaping

Suppose we have a [ Markov decision process ] in which we get reward only at the very end of a long trajectory. Until that point, we have no…

Tagged with: #reinforcement-learning

reward uncertainty

See also: [ cooperative inverse reinforcement learning ], [ love is value alignment ]

Tagged with: #reinforcement-learning #alignment

rl diagnostics

Things that might be useful to log in a [ reinforcement learning ] algorithm: Return of each trajectory. (summarize as mean/std/min/max…

Tagged with: #reinforcement-learning

rl goals

Implement MuZero or something similar. What are the 'state of the art' RL algorithms? What is known and not known about [ value alignment ]?

Tagged with:

rl with proxy objectives

Suppose we want to maximize reward, but we only get a couple bits of reward data every few hundreds/thousands of actions, whereas we get…

Tagged with: #reinforcement-learning

rocket equation

Deriving here just for my own edification. At each timestep a rocket ejects mass at velocity relative to its current reference frame. At…

Tagged with: #physics

romance is twenty times harder for gay people

About 5% of people are gay, so in any given community it's about twenty times harder for a gay person to find a partner than for a straight…

Tagged with: #relationships #lgbt