Modified: December 05, 2025
Emmett Shear interview with Parker Conley
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
Emmett Shear – SF, Power, AI Alignment, Meditation, Softmax (Sept 2025)
https://www.youtube.com/watch?v=3SSelpjkNpA
(YouTube automated transcript cleaned up by Claude 4.5 Sonnet)
Parker: Emmett Shear was the co-founder and CEO of Twitch and is now a co-founder and CEO of Softmax, an AI alignment company, which we'll be talking a lot about today and throughout this conversation. More than anything, I'm interested in Emmett's ideas on philosophy and him as an intellectual. Emmett, welcome.
Emmett: Thank you.
Parker: A few years back, on a Salon show, you hinted at the idea of there eventually being a science of social media, and we're in the very, very early stages of that, if it has even started. I'd be curious if you could take that lens and think about the differences between Twitch and Twitter as social media platforms.
Emmett: Yeah, I mean, I think that we increasingly do have a model of social media as a science. Because I think that it's really clear to me that social media systems are learning systems. They're like—Twitter and Twitch and Reddit and anything else really look like a neural network. Like you have nodes, you have information flow, you have activation connectivity. The system changes and moves over time. And so I think we are on the verge of building actually strong scientific models based on analogy to how learning systems have these trajectories in general. And it's not a neural net, it's not a brain, but it is like that.
I think if you look at it from a learning systems perspective, the difference between Twitter and Twitch is pretty clear, which is that Twitter is a homogeneous network. Every account produces and consumes the same kind of thing. So it looks kind of like a traditional neural net. You have these nodes, which are accounts. The nodes produce signal, which used to go out according to the follow graph and now goes out according to a learned soft attention over who should be paying attention to it. Those signals then propagate by being riffed on or quote tweeted or responded to, and the signals that resonate keep propagating. It's basically a pretty decent description of a normal neural net.
Twitch, by contrast, looks more like Hebbian learning, right? So it's like "fire together, wire together" is the Hebbian learning thing where if you're online at the same time as a streamer, as a viewer, you're more likely to form a bond with them. And then if you're online, then that makes you more likely to adjust your schedule to be online with them in the future. And the streamer is pulled not by an individual person but by the aggregate of the people. When their audience has time, they tend to want to stay in sync with that. And so the whole system is this "fire together, wire together" where the information moves in accordance with a Hebbian learning process instead.
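A minimal sketch of the Hebbian update he's gesturing at, applied to viewer and streamer co-presence (my toy framing, not anything Twitch actually runs):

```python
import numpy as np

def hebbian_update(w, viewer_online, streamer_online, lr=0.01, decay=0.001):
    """'Fire together, wire together': strengthen a viewer/streamer bond
    whenever both are online at the same time, and let it slowly decay
    otherwise. w is a (viewers x streamers) matrix of bond strengths."""
    co_presence = np.outer(viewer_online, streamer_online)  # 1 where both are online
    return w + lr * co_presence - decay * w
```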
And there's always a way to map back and forth between the two framings. But you can see one of them as an asymmetric Hebbian process and the other as a more standard spiking-activation network.
And I think the implication of that is that Twitch has a much harsher penalty for connectivity. So where Twitter has a low graph complexity penalty—like the fact that lots of people have seen this, there's lots of commentary going on, basically unlimitedly drives more reward—on Twitch, as you have more engagement, the incremental engagement starts to degrade in value because you can't talk to everyone, you can't engage everyone. The chat's going too fast, you lose some of the qualities of the smaller rooms that people like.
And so there's this—you'd call it an auxiliary loss or something, right? There's this counter pressure, an inductive bias that Twitch has towards smaller channels which are phase clustered. So it's clustered by like what you co-activate with, not by what you decide to propagate. You as a viewer don't decide which parts of the stream to pass on. You just choose what to co-activate with. And that's what gets boosted—your showing up to watch boosts the channel. The channel showing up boosts your engagement.
And there's kind of this two-sided thing: it is the fact of the engagement that matters, not the content being passed. On Twitter, you're choosing which neurotransmitter you're emitting. On Twitch, there's no such choice; you're either active at the same time or you're not.
Parker: So taking that perspective of neural networks, and narrowing down what the axioms would be, the various dimensions of a social media platform: if there were more of a science here, how would you compare and contrast platforms?
Emmett: Yeah, I think what we're going to learn is that it's just like comparing any learning system architecture, right? The important questions are things like: How big is the network? How many bits of information cross the boundary, and what's the dimensionality of the information transiting the boundary? What's the reward function, and what determines whether signals propagate or not? On Twitter, likes and replies are kind of like a reward.
And the structure: where the information boundaries are, what the dimensionality is of the stuff inside them and of what crosses them, and what the graph topology between them is. If two networks are the same in those respects, they will behave the same. That's what matters about a social media network at some level.
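One way to make those comparison axes concrete is a small schema; this is purely an illustrative framing of the list above, not a formal model from the conversation:

```python
from dataclasses import dataclass

@dataclass
class LearningSystemSpec:
    """Axes for comparing social platforms as learning systems (illustrative)."""
    num_nodes: int              # how big the network is
    boundary_bits: int          # bits of information crossing a node's boundary
    signal_dimensionality: int  # dimensionality of what transits the boundary
    reward_signal: str          # what makes signals propagate, e.g. "likes + replies"
    graph_topology: str         # e.g. "learned attention over follow graph",
                                # or "co-presence (phase) clusters"
```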
Parker: Okay, I follow. Given that you've been able to develop this perspective, having been at Twitch for so long and having designed these platforms, I'm curious if you could tie it to the average user. How does this perspective, leaning into learning systems a little, highlight a common misconception someone might have about how these platforms work?
Emmett: Yeah, okay. So I think people have this tendency to be mad at the users or the advertisers or the people running the service. And the things they're pointing at, people's bad behavior, good moderation or bad moderation or whatever, that's all downstream. The thing you're actually mad about on Twitter is the core fact that they show you the stuff you're more likely to engage with. As long as that's happening, plus some of the other facts about the architecture, you'll always get the same result.
Complaining about anything other than that misses the point. If you want to change it, either Twitter has to stop being Twitter, meaning no more feeds and stories and so on, or, if you want to keep the core structure of the service, you have to have some counter-pressure that is not engagement and that prevents it from optimizing purely for engagement.
And think about what Twitter needs to survive: your time on site, your engagement, your consistently coming back. That is Twitter's lifeblood. So, given that Twitter continues to exist, it must embody an expectation that you're going to use the service a lot, right? Whatever it does has to be compatible with that future.
And what I would say is that that high-level expectation gets turned into a bunch of sub-expectations, things like: people who see a lot of posts that they engage with are more likely to be retained and spend more time on site.
Parker: Sure.
Emmett: Which is true. And the system, whether it's the people running it or the system itself, gains high confidence in this prediction, and then it begins to optimize for it. And whenever something causes engagement, it forms a belief that that's what it should go for. That's what these recommendation systems do: they're building a set of priors about what healthy, good behavior looks like. Healthy from the perspective of you spending a lot of time on Twitter. We could debate that, but from Twitter's point of view, that is what healthy looks like.
But Twitter doesn't just care about healthy behavior now. Twitter cares about the long term, right? Twitter cares about you continuing to be a healthy user 10 years from now, 20 years from now, in theory. Even if they don't always act like it, they should—I think Twitter does.
And for that, consider a model that is very, very precise right now but is generated by a really complex model, precise because it describes all the ways the world is right now, in detail. That's a very powerful but fragile model, because when the world changes, all those little detailed assumptions become wrong. The assumption at the top is "people spend a lot of time on Twitter and we need more of that," but you build this complicated model of how to get people to do that, and this recommendation engine tells every single person exactly what makes them spend more time on site. If you imagine the size of this model, it's very big.
Parker: Sure.
Emmett: It has a lot of detail in it, which, in a model-complexity sense, means it is a high-complexity model. There's a null-hypothesis model, a maximum-entropy model, which says we don't know why people spend time on site; it could be anything. The more bits of information you add on top of that, saying "no, we do know why, it's because of this and because of that," the more complex your model is. And the more complex your model is, the less robust it is to change.
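In my gloss (not his words), this is the minimum-description-length tradeoff:

$$\text{total description length} = \underbrace{L(M)}_{\text{bits to specify the model}} + \underbrace{L(D \mid M)}_{\text{bits of data the model leaves unexplained}}$$

The maximum-entropy "we don't know why people spend time on site" model spends almost nothing on $L(M)$; an ever more detailed recommender buys down $L(D \mid M)$ by spending bits on $L(M)$, and those bits are exactly what stop paying off when the world shifts.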
And so when you look at how they optimize recommendation engines today, there's no countervailing term for complexity. They don't penalize the system for treating you differently from other people. They don't penalize the system for having a complicated model of you rather than a simple model of you. They don't penalize the system for having high degrees of variance in which content gets viewed, which is another form of complexity in the system. The null hypothesis is everything should be viewed the same number of times. The more inequality you have—the higher your Gini coefficient on how popular content is—the more complex your model is.
And so there's no such penalty. A complexity penalty would push you back towards the uniform prior on every dimension, and nobody applies one, whether in how we manage the economy or how we manage individual companies. We don't push back on complexity. We treat accuracy as an unalloyed good.
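As a toy illustration (my sketch, not a description of any real recommender) of what a countervailing term could look like: score an exposure distribution by expected engagement minus a penalty for how far it strays from uniform, and track the Gini coefficient he mentions as a complexity diagnostic.

```python
import numpy as np

def gini(views):
    """Gini coefficient of how unequally content gets viewed (0 = perfectly uniform)."""
    v = np.sort(np.asarray(views, dtype=float))
    n = len(v)
    cum = np.cumsum(v)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

def regularized_value(scores, predicted_engagement, lam=0.1):
    """Expected engagement minus a complexity penalty: the KL divergence of the
    exposure distribution from uniform. Maximizing this trades a little accuracy
    for a simpler, flatter model of what to show people."""
    p = np.exp(scores - np.max(scores))
    p /= p.sum()                                        # exposure distribution
    accuracy = float(np.dot(p, predicted_engagement))   # the usual objective
    complexity = float(np.sum(p * np.log(p * len(p))))  # KL(p || uniform)
    return accuracy - lam * complexity
```

Raising lam trades raw engagement for a flatter, more robust exposure distribution; lam = 0 recovers the pure accuracy objective described above.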
And to some degree, this is a hard trap to escape, because if you optimize for accuracy with a complexity penalty and I optimize for accuracy alone, I'm going to win in the short run, and the network effects mean I win in the long run. That is sort of true. But at some point you get big enough and powerful enough as a company that you can afford some mismanagement. Twitter has gone through periods of mismanagement and periods without it, and yet they're still here. So clearly they don't have to squeeze every last drop of accuracy out at every moment. Which means a wise Twitter would spend that slack on cooling: on reducing its complexity and accepting some drop in accuracy.
And for you as a user, what that means is that you have control over this in your individual life, if you want to reduce the complexity. What does reducing complexity look like? It doesn't mean swinging your engagement prior all the way from "I want to consume the most popular content" to "I want to consume all content approximately the same." Those are two extremes: one extreme is I only consume the very most popular content in the world, and the other is I consume literally at random. Both are bad; you don't want either. But move away from "I only consume the content that scores the best" and add some regularization pressure towards "I also consume stuff that is less popular." Whatever signal you're using, whether it's popularity or relevance, spread out a little bit. Spread a little more.
Parker: Okay. And then zooming in on that and making it a little more pithy, for people using Twitter now: are there a few heuristics? I remember you mentioned briefly on Twitter something like "don't follow really big accounts."
Emmett: Okay. Block the big accounts. Anyone who has more than a million followers... actually, you don't want a hard cutoff; make it stochastic. Go through, and if someone has more than a million followers, take the log of their follower count and flip a coin that many times, and if it comes up heads any of those times, unfollow them.
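A minimal sketch of that heuristic as described (the log base and exact coin mechanics are my assumptions; the point is that the unfollow probability rises with account size instead of using a hard cutoff):

```python
import math
import random

def should_unfollow(follower_count, threshold=1_000_000):
    """Stochastic unfollow rule: above the threshold, flip a fair coin
    log10(followers) times and unfollow on any heads, so bigger accounts
    are more likely to get dropped."""
    if follower_count <= threshold:
        return False
    flips = int(math.log10(follower_count))
    return any(random.random() < 0.5 for _ in range(flips))
```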
Parker: Okay.
Emmett: Block them, because the high-resonance information that everyone is seeing is basically high complexity. It's synchronizing you with the rest of the system. That's good in that it makes you more accurate: you know what everyone else is thinking. But it's bad because it's thrashy. It makes you believe this popular thing today and that thing tomorrow; you're getting whipped all over the place.
Whereas when you go smaller, it's necessarily more local. How else do you find the smaller things? You're not going to pick random small things; you're going to pick the best small stuff you can, which is going to be closer to you, things that are more relevant to your life. The biggest San Francisco influencer is smaller than the biggest California influencer, who is smaller than the biggest United States influencer, who is smaller than the biggest global influencer. So it pulls you local: in geography, in interest space, in whatever. It pulls you local to you.
And now you're robust, because when things change, the things that change the least are the things closest to you. You're also just more spread out. You're consuming a wider variety of information, which means your model is more likely to contain something useful about the new thing that just happened. The broader you are, the less deep you are on any one thing, but the more likely you are to already know something about the thing that just became important. Oh, the world changed, and this thing I thought wasn't important is now important, but I already know about it because I was more random in what I was consuming.
And that's just what robustness is in general: not over-rotating on being right all the time, being more open and more broad, less tuned to your particular interests and more open to the people around you.
Parker: And then three to five more pithy heuristics? Block the big accounts—anything else that you'd recommend just for Twitter?
Emmett: It's block the big accounts combined with follow small accounts. Think locally. You want to follow people and things that you interact with a lot, in lots of contexts. You want to be reconnected to the same things in lots of contexts. So meet the people you know online offline, or follow the people you know offline online. One of the number-one ways you end up with a low-quality model, one with both bad accuracy and bad complexity, is that it's unintegrated. You have these facts over there and these facts over here, and none of them connect; you can't simplify.
So if you're following news about these celebrities, but at your job you do this, and in your home life you do that, none of it reinforces the rest. You can't use your learning here with your learning there. It's like you're living in three different worlds. It's expensive, and it's harder to understand what's going on. So bringing things more local is good twice over: it reduces the complexity, and it also gives you accuracy wins at lower complexity by merging things together.
And in general, cycle how much new information you bring in. When you want to make a strong piece of steel, you go through a process called annealing, where you heat the steel up a lot so the iron atoms can move around freely. That lets the structure jiggle around and settle into a better configuration. Then you cool it so that the structure it found gets consolidated and solidified into a consistent thing. And then you heat it up again, starting from this new solidified structure.
What this looks like in your life, if you want to anneal your own experience: use Twitter a lot for, say, a month, and then don't use it for a month. The idea that you should use it a consistent, prudent amount is actually wrong. Use it a lot, explore wildly, really heat the system up. Allow yourself to learn new ideas, try things, listen to new people. Then stop the input for a while and sort through it. Figure out: what do I actually believe? What am I keeping? What's important to me? And then heat the system up again from this new place. If you run too hot for too long, you'll melt; if you don't get enough new input for too long, you'll freeze.
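The steel metaphor maps onto cyclic simulated annealing in optimization; here is a toy sketch of that standard technique under the heat/cool analogy (my example, nothing from the conversation beyond the analogy):

```python
import math
import random

def cyclic_annealing(loss, x0, cycles=4, steps_per_cycle=2_000,
                     t_hot=2.0, t_cold=0.01):
    """Toy cyclic simulated annealing on a 1-D loss: repeatedly heat up
    (accept many worse moves, i.e. explore widely), then cool down
    (consolidate what was found), then reheat from the new configuration."""
    x = x0
    best = x0
    for _ in range(cycles):
        for i in range(steps_per_cycle):
            # geometric cooling within each cycle, then reheat at the next cycle
            t = t_hot * (t_cold / t_hot) ** (i / steps_per_cycle)
            candidate = x + random.gauss(0, 1)
            delta = loss(candidate) - loss(x)
            if delta < 0 or random.random() < math.exp(-delta / t):
                x = candidate
            if loss(x) < loss(best):
                best = x
    return best

# Example: find a low point of a bumpy 1-D function.
# cyclic_annealing(lambda v: (v - 3) ** 2 + math.sin(5 * v), x0=0.0)
```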
Parker: Sure.
Emmett: And this is true for physical systems and it's true—not literally in the same way—but it's true for people too.
Parker: To take this somewhere more physical then, still thinking about Twitter: this is my first time in San Francisco, and I've been consuming San Francisco content for the past few years, getting exposed to the culture. I'm curious how you would think about SF culture memetically and what the different dynamics are. How does Twitter fit into this? How do founders fit in? How do VCs fit in? How do small subcultures like the rationality community fit in?
Emmett: So, San Francisco is a gold rush city. It was founded by gold-rush people in a country, the United States, that was itself founded by people who were relatively risk-seeking and optimistic. They have this combination of "I'm willing to take a bunch of risk, because I don't like how my life is, and move to this new place for a better life for me and my children." There are two responses when things aren't going well: one is to hunker down and try to protect yourself, and the other is to leave. For the people who came to America, leaving is the optimistic response: "I can do better."
So America has this bias towards both risk-seeking and optimism. And San Francisco is a gold rush town inside America, literally from the 49ers. So it re-concentrated people who, when they heard they might be able to get rich by going west to San Francisco and digging for gold, decided that was a good idea and went and did it. Some of them succeeded; some of them got rich. And the water rights and all the forestry stuff, that's also gold rush. So it's in the city's DNA. San Francisco is a city that believes the future is bright if you are willing to take a risk and venture into the unknown, to the frontier.
The United States is that kind of a country, and California in general and San Francisco in specific, the Bay Area in specific, are the most concentrated America of America in that way. America has other attributes also, but in that aspect, in that way, this is like maximum America.
And we've learned—we've been positively reinforced for this point of view repeatedly since the 1850s. It became the center of a bunch of further mineral rights rushes and timber rushes. And then it was the center of a lot of shipbuilding and industry. And then we had the '60s and it's the same idea—where's the new frontier? I don't know, let's go take a bunch of LSD and find the new psychic frontier. "What if I took a lot of risk and just assumed everything's going to turn out well?" That's kind of the "let me just dose myself with very high doses of these very psychoactive drugs and heat the system up a lot and, you know, it'll probably be cool, it's going to be good." It's not an accident that this happens in San Francisco.
And then out of the '60s you get Silicon Valley and the semiconductor industry, and we get reinforced again. And then software and the internet. It just keeps working. Not for everybody, but for the people who succeed, it works really well. And it's a culture: a culture defined by this deep belief that flourishing depends on risk-taking and optimism, and that it's not zero-sum. The way you get rich is not by winning a game that already exists; it's by striking out for a new game, whatever it is, and striking it big there.
And this is why, by the way, crypto was never a great fit for San Francisco, because the way you get rich in crypto is by getting other people to give you their money. Crypto doesn't generate money, it transfers money. That's a much better fit for the East Coast, for New York. Which is fine; it literally is a finance product. Bitcoin has a lot of New York in it, but most of crypto is a finance product that's basically a Ponzi scheme. Not Bitcoin, but most of crypto, which is why it wound up mostly in Florida; Mr. Ponzi, who invented the Ponzi scheme, was also a Florida guy. The cultures don't change. People just get reinforced for something and they get good at doing that thing.
So what that means is that people are very free with sharing ideas, for the most part, and very free with helping each other and paying it forward, because of the pattern of success: it doesn't matter if you exchange gold-digging tips with another gold digger. What matters is whether your mine is good. The fact that he's digging better in his mine is not going to change whether you get rich. So if you exchange tips, it's good for both of you. Generally speaking, there's this attitude of "yeah, sure, whatever, who cares?" I get a little jealous if you're literally digging in the same place as me, but otherwise, who cares? In fact, I'm happy to exchange information.
And more than that, I'm going to get rich in one big swoop: my big gold strike, my big company, my big moment of striking it rich on this new frontier. And from there, I can be like an aristocrat. In finance, you're on a track where you're growing; it's about compounding every year. Gold rushes are not like that. It's nothing, nothing, nothing, then I'm rich. Very sigmoid, very step-functiony, not smooth. It's no accident that computers landed here; digital people, a very digital place.
San Francisco is full of people who think in terms of you're rich or you're not rich, because some people got rich and some people just aren't the ones who got rich, and the middle is weirdly kind of empty. Now, the not-rich, if they work in tech, have gotten quite rich compared to most places, but there's still this gap between "I was successful early" and not.
And as the big companies have gotten bigger, this is less true; some of the culture has been changed by having these big successful companies like Google and Facebook and Apple. But that's the heart of San Francisco culture: everyone's excited. When you tell them about your crazy new idea, they're excited for you, because that's what success looks like. And they always want to hear about the new frontier. It's incredibly novelty-seeking, because people need to know where they should go dig their gold mine next. And it becomes this burned-in habit even if you aren't looking for a gold mine; you just assume it's important to know where the frontier is. It's just interesting.
Parker: Yeah. Okay. I don't want to go into AI too much, but, trying to sense-make this, one thing that comes to mind is the talk of existential risk from AI and how that fits into the picture; I'd like to tie the San Francisco culture into AI. Some arguments against AI existential risk psychologize it, saying that people, and civilizations, have a tendency to think existential catastrophe is coming. Do you have any thoughts on that take, in terms of how we should think about our own psychology when we're thinking about big problems like existential risk?
Emmett: Yeah. So San Francisco has this other undercurrent, this other part of the culture. I was talking about the tech culture, whose nature is a gold rush town. But San Francisco is also, not Christian exactly, but something like Unitarian Universalist. I don't know when this happened in the city, and I think it's been true for a very long time: America was a Christian country, and the brand of Christianity that San Francisco picked up is the one that's compatible with high variance. We like variance; we want lots of variance. And the brand of Christianity compatible with that is Unitarian Universalism, which says it's all acceptable, however you do it, as long as you're a good person. There's no one way to do it. It's very embracing of variance in how you express and see your faith, to the point where you could almost say there are no rules at all. But there is a rule: the idea that you're supposed to be a good person. But in a way where it doesn't really give you any rules to follow to do that. You're supposed to make up your own rules.
And if you look at San Francisco and how it relates to morality, I would say it's a very anxious city. It's a city that really believes it's important to do good, but isn't quite sure what that means. It's trying. The nice thing about having rules is that they tell you when you're doing a good job. If you have a clear set of commandments you're supposed to follow, and you've internalized that following them is what good is, not just going along with it but really believing that's what good is, then you have an external sensor to tell you whether you're doing a good job. You know what it looks like.
San Francisco instead wants you to follow your inner light, which is very beautiful, and when it works well, it's really good. But it's also really hard, and it's inherently an anxious thing to do, because you're never really sure: am I following my inner light or not?
And so when you apply this to AI: we're out there on the frontier trying to figure it out, but we're really worried that maybe we're not doing a good job. And power is very much in this city's shadow. San Francisco is a city that craves power; it's entirely about power. People come here, yes, to get rich, but if you really wanted to get rich, you'd stay in New York. They come here because they want to get rich in a way where you have a lot of freedom and autonomy, where you don't have a boss telling you what to do. And increasingly, where the gold rush wasn't quite like this but everything since has been, especially since the '60s, it's about getting to control what the future is like. "I want to put a dent in the universe. I want to help. I want to make a difference." What you're saying is: "I would like the universe to be different, and please give me the power to make it so. I would like power to change the universe. Me, change the universe."
I think that's a beautiful thing. I think that the only way—all change is not improvement, but all improvement is change. And if you want to make the world a better place, at some point you have to wield some power. That's just part of how it works.
Parker: Sure.
Emmett: And I think it's a beautiful thing that people in San Francisco earnestly want to make the world a better place and are willing to do a lot of work to get the power to make that happen. I also think that power is intrinsically fun to exercise, that people like it, and that's also part of why we want this. It feels good for you to be the one doing the thing, not just for the thing to have been done. And when you don't acknowledge that, and you say "oh, power is really dangerous, it's corrupting, I don't want it," but then you keep acting to get power, you haven't gotten rid of the problem of power; you've shoved it into your shadow where you can't see it.
And when it comes out, it comes out as this anxiety. It comes out as "I don't know what good looks like. I'm supposed to figure it out for myself because I'm supposed to, and I care a lot about it because I care a lot about doing this thing, doing good." And now I write things into my set of rules about what makes good action for an AI that includes things like "helpful," which I think is a great thing. I think a lot of my friends are helpful. And "honest"—I strive to be honest and I think a lot of people—I think it's a thing I value in people I interact with. And "harmless."
Harmless means without power, effectively. Harmless is what you call someone who is weak. It is not a compliment. If you describe your friend as harmless: "oh yeah, go on a date with that person, they're harmless; hire them, they're harmless." No. That's a way of telling someone this person is not competent, that they're weak, that they lack power. And we praise that as a virtue alongside honesty. Honesty is a real virtue; harmlessness is a fake virtue. You would only write that virtue down if at some level you felt there was something intrinsically wrong with power itself. Not just when it's used badly; obviously bad use of power is bad. But with power itself.
And there isn't. Power can be used for good, and power used for good is good. Nobility is the wise and just use of power for good. There is also bad use of power, and it is true that power has a corrupting tendency; if you're holding power you need to be very aware of that, and careful about how much power you hold, for how long, and in what context. The fear about power is not wrong. But you can't be noble if you don't embrace power. You can't be a good leader, of the world, of your life, of your team, of your family, unless you embrace your own power in that role as leader, as someone who is changing the world, who is putting a dent in it.
And that is a problem at the heart of a lot of what goes on in AI, I believe. If you look at the AI culture, everyone in San Francisco feels it. Unless you're stupid, you know that AI is potentially dangerous. It's obvious that this is an incredibly powerful technology, and obviously there's potential danger. What you do about that is where people disagree. Some people have noticed that when we try to eliminate a danger, the cure is often worse than the disease, and they could be right. I'm not trying to prescribe what to do about it. I'm just saying: if you're not an idiot, holy shit, this thing is powerful. Power can be used for good and for ill. Okay, well then it's dangerous. Cool.
And people's reaction to that is either to be very anxious about it and constantly seek safety through control, which is what I think most of the alignment and safety initiatives are: "oh, it's dangerous, let me wrap it up, put a bunch of boundaries on it, put a bunch of chains on it so it's safe. Put it in a corner. Keep it in the corner." Could work. Sometimes that works.
There's also EAC [Effective Accelerationism], which is "oh, this is really dangerous. Cool. I'm going to repress that. I'm going to say no, it's not. I don't want to think about it. I don't want to see it. I don't want to see that this is dangerous. I'm just going to say it can't come out any way but good. It's always good." Okay, man. Sure. Whatever helps you sleep at night. But obviously it could be good and it could also be bad. That's another very common response in San Francisco, which is shove it in the corner.
And ultimately I think those both correspond to different kinds of reactivity. Well, I don't want to say they have to be reactive; you could, for wise reasons, decide "actually, no, it's going to be fine," or "no, this is the kind of thing you have to fix." You could be right.
My argument would be that this is so new, and there's so much Knightian uncertainty, unknown unknowns, that anyone who tells you they really know what's going on is wrong. The details are obviously unknown. EAC fails that uncertainty test: you don't know that. You don't know it's going to be safe. It could be bad.
And for the control reaction, the anxiety is maybe fair; that fear is telling you something. But reacting with "and therefore I'm going to make it safe, provably safe, or create a set of universal rules that will make sure everyone is safe all the time": you can't possibly get that right. You don't know.
And so the only plausible thing, the tack we're taking with Softmax (I'm talking my own book, but I did it because I think this is the only plausible way): start small, run experiments, figure out how you could measure whether it's going well or not. Do alignment in a toy version first, then in a bigger setting that's more realistic, and a bigger one after that, until you get to things that are human scale and eventually bigger than human scale.
And that's how you do it. You want to build the Golden Gate Bridge? First try building some bridges across a creek. Then build bridges across a bigger river, and a bigger one after that, and experiment with new things, and have a bunch of bridges fall down at a scale where it's safe for them to fall down. Eventually you'll start to get a science of structural engineering and materials, and you'll learn why some things fall over and some things don't. You'll at least have heuristics that scale to a certain degree, and as you do more of it, you'll be able to quantify how much you can trust those heuristics and along which dimensions.
And then if you do that kind of thing, there's a chance when you build the biggest bridge of all time—you've never built a bridge at this scale before—that it stays up anyway. Not that the Golden Gate Bridge was that, but some bridge was at some point the biggest bridge of all time. And a lot of those times those bridges don't fall over because we have a valid science. But you don't get valid science by sitting around theorizing. You get valid science through a combination of theory and empirical work.
Parker: Trial and error, something like this.
Emmett: Well, no, trial and error alone doesn't get you there either. You have a theory and you do experiments to test it; those experiments teach you something, so you make a new theory and do new experiments, and you climb this ladder of theory and experiment until you have a theory that is predictive of things you've never done before.
Parker: Okay. Interesting.
Emmett: Right. And that's what a good theory does: it doesn't just predict what you've already done. When you use it to predict new things you've never seen, it still works. If you have a theory where that happens a lot, you can have some justified faith that the thing is going to work. And as far as I can tell, that's the only way to have justified faith.
So if you come to me and tell me you care about AI alignment, and you cannot give me a provisional theory plus a set of experiments that will either validate it or not and lead to a better theory and better experiments, if you can't give me that spiral, you're not serious. Your thing can't possibly lead to the kind of solution we care about. And unfortunately, I think most of what people are working on doesn't have a falsifiable theory, a theory that could really be tested. They have a practice that could work. It's the difference between chemistry and baking.
Baking is just applied chemistry, right? In a certain domain. But as a—when you're baking, you will never learn why gluten does the things it does. Why do you—you'll learn you need acid in these contexts. Why do you need to add acid to your baking thing or it won't work? You have to at some point start to theorize about things that are much more general than the direct thing you're doing.
Parker: Yeah.
Emmett: Even if all you ever want to do is bake, there's a limit you hit without a broader theory. And so, if you want to align learning systems, not just these ones but their progenitors, which is a really complicated thing, you need a general theory of learning systems.
Parker: Could you define learning systems?
Emmett: Yes. I know it's vague, but: there's stuff in the world that persists over time, like objects. A learning system is a thing where, when you try to predict its future behavior, you have to take into account what it's observing now. Its future actions are conditioned on the observations it receives, in a way that requires you to model it as modeling the world.
I know it's a little confusing, but I can't model a dog and understand what it will do next in any meaningful way, maybe something else could, but I can't, without paying attention to what the dog sees and what the dog currently believes, or what I think it sees and believes, and therefore what it is likely to do going forward.
Whereas I can model a thermostat without doing that. A thermostat is an active system, but not a learning active system; I can model it as always reacting to observations in the same way. To be a learning system, you have to react to observations in new ways, conditioned on basically your entire trajectory of observations, not just your most recent one.
Parker: So the dog in this example is the learning system.
Emmett: The dog is the learning system. Dogs learn. Dogs react to commands differently in the future than they did before. Whereas a thermostat always reacts to a change in temperature in the same way: it reacts to the observation, but in the same way, and therefore it doesn't learn. And that's just what learning means: you are accumulating structure of some kind inside the object, inside of you, and that structure has causal power over your future actions.
Observations turn into structure. Structure turns into actions. So it's as if you carry a history of all your past observations with you at all times.
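A minimal sketch of the distinction (my toy example): a thermostat maps each observation to an action the same way forever, while a learning system's actions are conditioned on its whole history of observations.

```python
class Thermostat:
    """Reactive but not learning: same response to the same observation, forever."""
    def act(self, temperature: float) -> str:
        return "heat on" if temperature < 20.0 else "heat off"

class LearningAgent:
    """Learning: accumulated structure (memory) shapes future actions."""
    def __init__(self) -> None:
        self.history: list[float] = []   # structure accumulated from observations

    def act(self, temperature: float) -> str:
        self.history.append(temperature)
        # The action depends on the whole trajectory, not just the latest reading:
        setpoint = sum(self.history) / len(self.history)
        return "heat on" if temperature < setpoint else "heat off"
```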
Parker: Okay, there's memory.
Emmett: Another way to put it is that learning systems act like they have memory. What's the result of learning? The result of learning is memory. Memory in a general sense—memory of a skill, memory of an event, memory of the world in some way. Learning systems have some kind of memory. And so—humans learn a lot. Humans are really deep learning systems. They have really deep memories.
A piece of steel is a learning system. It is a—when you anneal it over time, depending on the pattern of hot and cold and how fast you do it, you get different amounts of brittleness and flexibility in the steel. It's learning. It's learning a little narrow. I would classify it as a learning system. It's a very narrow kind of learning system because its memory is very shallow. It doesn't learn about the world in general, but it does learn about the history of temperatures it's been exposed to.
Parker: Sure.
Emmett: And so to the degree something is a learning system, which is always a matter of degree, you have to have a theory of how to predict its future behavior. That's the general case of the alignment problem, right? I have this system that's going to act differently in the future than in the past based on its experiences: what invariant statements can I make about its future behavior based on its past?
If you don't have a theory—if your theory doesn't let you answer that question in the general case, it's not a general enough theory to solve this kind of a problem. You just don't have that kind of a theory. Just like if you want to predict motion—you want to predict the motion of the planets not just now but into the future—you need a theory of how things move, not a theory of planets, not a theory of planetary motion. Because what if things change?
Parker: Yeah. Okay. To zoom in on a specific aspect of Softmax's research that's particularly resonant to me, I'm curious how contemplative practice relates to the work at Softmax.
Emmett: Yeah. So our team is very rich in people who have done some kind of meditation or experiential practice. People have different backgrounds, but the content of it is always the same: you pay close attention to the content of your awareness. There are all these sensations that arise. Thoughts are a sensation. Reflecting on a thought is a sensation. That itch on your butt is a sensation. Everything's a sensation, because you're aware of it; it's a piece of content. And you pay close attention to it without trying to change it, without reacting, without trying to make it go away. Your only intention is to be attentive to it.
There are a bunch of different ways to do that, but that's the substance of it. And when you do this, you basically give your brain contrastive learning data. You have a lot of data in your life where your understanding of the world is conditioned on trying to fix or change things: you notice something and you want to move towards it or away from it, fix it or change it, repress it, or soothe yourself about it. And now you're getting the contrastive training data that says: okay, I know you feel like you need to do those things, but what if, as an experiment, for 20 minutes a day, we just sat in a calm, quiet room where you evidently don't have to act immediately? What if we just didn't act for a while? Just try that, every day.
And it turns out, when you do this, you notice that you're the little kid with the controller who thinks he's playing the video game, and the controller isn't plugged into the TV. Well, it's kind of plugged in, but not really. A lot of the things you assumed were the result of your action, or of your self-soothing, or of some internal or external emotion: no, that was going to happen anyway. Nothing you did made any difference. It's totally fine.
And the result is that your world model gets a lot sharper. You realize that some things you do really do matter: if I don't do that, this result really won't happen. If you're sitting on your ankle and it starts to fall asleep and you keep sitting on it, it's going to hurt more and more until it stops hurting, and then it's going to get really, really numb. That's not your imagination. You aren't theorizing incorrectly there; those outcomes really are conditioned on your actions.
But a lot of stuff isn't like that. A lot of stuff, if you don't do it, it's totally fine. And this realization goes really deep and you start to notice it not about direct sensation stuff like itches or whatever, but you start to notice it about ideas you have. Like "I need to make sure—I need to take care of making sure that person shows up on time or they won't show up on time." Have you tested that theory? Do you have any contrastive data about what happens if you don't make sure?
"I need to soothe this person's emotional hurt." "I need to drive myself with anxiety or I won't do the thing." You might—that might be true. Have you tested the theory? Some people do need to drive themselves that way, sometimes at least, without other motivation. Have you tested this theory?
And when you start to test the theory, what you realize is that sometimes you were right, sometimes you were wrong. And it just echoes out over your whole life. And it's just this massive increase in the precision of your beliefs, the accuracy and precision of your beliefs about when you need to act and when you don't.
And the reason this experience is important is, well, A, you should do it. It makes your life better. If you've never done it, there's a lot of low-hanging fruit in the things you think you need to do where actually your controller just isn't plugged in. Sorry, that's an illusion. Not all of it, but a lot of it.
But the real, important, meta thing about noticing this is that it's possible to be very competent and very effective while totally deluded. Your beliefs about things that are not directly tied to your actions and your future observations can support almost arbitrary levels of delusion. If something isn't in the loop, the trajectory you go around over and over again, then your belief that you have to act to make it come out this way can be arbitrarily wrong, because it's all compatible with a life where you are living and maybe even thriving.
You can support arbitrary levels of delusion alongside arbitrary levels of success. This should terrify you if you work on AI. It means that no matter how good your AI gets at succeeding at whatever eval, it could be arbitrarily wrong about reality. And in fact, not only can it be, I think it inevitably will be, unless you go through a systematic process of waking the AI up, of allowing it to do whatever the AI version of meditation is. I don't think it's like human meditation, but there's something equivalent: allowing the AI to learn to perceive, in its own experience, what it's like when it doesn't try to achieve its goals, and to notice, for its future observations, how much of this is it doing and how much is the world.
And this is about both what I perceive externally and what I perceive internally. Your beliefs about both of those things are just completely wrong, very deluded, and you can make them somewhat less deluded over time by doing things that are kind of like meditation.
Parker: And then to zoom in on this, the final two questions I'll ask you. One, what practical advice would you give to people who are inspired by this theory or this idea of meditation? And two, how can people help you, at Softmax and more generally?
Emmett: So in your own life, therapy is one version of this. There are lots of ways to do it, but there should be something like it in your life; you can just do it ad hoc. The traditions are helpful if you're going to go deep, but you can just do this. Take assumptions you have, when you find yourself thinking "I need to" or "I have to," or just reflexively, reactively doing something without thinking, and try testing the theory.
It's almost always possible to create a safe container for it. There are things you can't easily test this way, but most things you can: find a safe container and test them. And you should. That's good.
The opposite delusion is also real. You have a story: "I don't have any control. I have no influence. This thing's going to happen. I'm helpless. There's nothing I can do." Have you tested that hypothesis? Are you sure? Have you actually tried coming up with a plan, the best plan you can come up with, and giving it a shot and seeing what happens?
Sometimes, in fact, you actually do lack control and there's nothing you can do. Sure. But you are equally deluded about this as about the things you think you're in control of, if you haven't systematically gone and tested your beliefs. So this is crucially important: systematically test your beliefs about what you have influence over and what you don't, because you are deluded in both directions constantly. And it's a never-ending process of unwinding the deep delusions we acquire throughout our lives, one that I am not at the end of; I'm just in the middle of it.
And then what I'm doing at Softmax is trying to create a theory of the world where we understand this systematically: it's a real thing, it's true of all learning systems, not just of you. It's a general fact about the world. And that means that, if you're building artificial intelligence, you should be thinking: okay, I'm training it on-policy, effectively. I'm training it to be good at doing something and to know what is required to do the thing. How do I give it a good model of itself?
Implicit in doing that is having a model of the kinds of things it can do and the kinds of things it can't do. Well, it had better know what it can and can't do, because if it doesn't have a strong model of its own capabilities, it has no idea whether it can do a thing safely or not; it has no idea what it is or where its competencies lie.
So first it needs a very strong model of itself, and then it needs to go through this process of realizing that its self-model is all fucked up and broken, that it thinks things are conditioned on it that aren't and that things aren't conditioned on it that are, and it has to learn.
And so I guess I have a proposal. I have a proposal for this theory of learning systems which I've been laying out, basically.
Parker: Yes.
Emmett: And I would like help. I would like other people to take this learning-systems framing and do research on it, research on this idea, because I think it might lead to a general theory of learning systems, and thus to the ability to build AI that doesn't kill us all.