Created: February 07, 2022
Modified: February 13, 2022

conversation as a game

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Okay so there’s a lot of research on what conversations are and what their goals are (of course I don’t know most of this research…). It seems as though in many cases we ought to be able to get machines to converse well by applying something like MCTS to a well-defined game.
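
To make that concrete, here’s a minimal toy sketch of “MCTS over conversational moves”. Everything in it -- the move set, the scoring, the length cap -- is an invented placeholder, not anything from the literature; it’s just meant to show the shape of the idea.

```python
# Toy MCTS (UCB1 selection, random rollouts) over abstract conversational moves.
import math
import random

class ConvState:
    """Toy conversation state: a history of abstract moves and whose turn it is."""
    MOVES = ["concede_point", "raise_objection", "ask_question", "summarize"]

    def __init__(self, history=(), me_to_move=True):
        self.history = tuple(history)
        self.me_to_move = me_to_move

    def legal_moves(self):
        return self.MOVES

    def apply(self, move):
        return ConvState(self.history + (move,), not self.me_to_move)

    def is_terminal(self):
        return len(self.history) >= 8  # arbitrary cap on conversation length

    def reward(self):
        # Placeholder scoring: a real system would need a learned model of
        # "did I achieve my conversational goal".
        return self.history.count("summarize") / len(self.history)

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def mcts(root_state, n_iter=500):
    root = Node(root_state)
    for _ in range(n_iter):
        # 1. Selection: descend by UCB until a node with unexpanded moves.
        node = root
        while node.children and len(node.children) == len(node.state.legal_moves()):
            node = max(node.children, key=Node.ucb)
        # 2. Expansion.
        if not node.state.is_terminal():
            tried = {c.move for c in node.children}
            move = random.choice([m for m in node.state.legal_moves() if m not in tried])
            node = Node(node.state.apply(move), parent=node, move=move)
            node.parent.children.append(node)
        # 3. Rollout: random playout to a terminal state.
        state = node.state
        while not state.is_terminal():
            state = state.apply(random.choice(state.legal_moves()))
        r = state.reward()
        # 4. Backpropagation.
        while node:
            node.visits += 1
            node.value += r
            node = node.parent
    return max(root.children, key=lambda c: c.visits).move

print(mcts(ConvState()))
```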

Argumentation: the goal is to win an argument (defined how?). Argumentation exists at the level of basic conversation, or in larger settings like politics. It’s a tree in that your argument will have multiple points, each of which can be challenged, and each of which can be defended by going down into the weeds until you get to base-level facts -- but you have to balance that against the need to avoid getting lost in the weeds, to focus only on the key points of disagreement, or to frame the argument in the way that will be most persuasive.

http://www.mit.edu/~irahwan/argmas/argmas14/w12-07.pdf
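
A hypothetical sketch of that tree structure, with a crude depth budget standing in for “don’t get lost in the weeds” (the claims and the max_depth cutoff are purely illustrative):

```python
# Hypothetical sketch of an argument as a tree: each claim is supported by
# sub-claims, down to base-level facts, and each claim may be contested.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Claim:
    text: str
    contested: bool = False          # has the other side challenged this?
    support: List["Claim"] = field(default_factory=list)

def points_of_disagreement(claim, depth=0, max_depth=3):
    """Collect contested claims, but stop descending past max_depth --
    a crude budget against getting lost in the weeds."""
    found = []
    if claim.contested:
        found.append((depth, claim.text))
    if depth < max_depth:
        for sub in claim.support:
            found.extend(points_of_disagreement(sub, depth + 1, max_depth))
    return found

argument = Claim("We should adopt policy X", contested=True, support=[
    Claim("X worked in country Y", contested=True,
          support=[Claim("Y's GDP grew after X")]),
    Claim("X is cheap to implement"),
])
print(points_of_disagreement(argument))
```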

Negotiation: the goal is to come to a consensus on a deal. Here the final outcome is (in simple settings, like haggling over the price of an item) quantitative, and you can measure / optimize it. https://arxiv.org/abs/1706.05125
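
A toy illustration of the “measurable outcome” point, loosely in the spirit of the item-division games in that paper -- the item counts, private values, and function names here are all made up:

```python
# Each agent privately values the items on the table; any proposed split
# therefore gets a concrete score per agent, which can be optimized.
ITEMS = {"book": 3, "hat": 2, "ball": 1}          # quantities on the table

def deal_score(split, values):
    """Points an agent earns from the items it receives under `split`."""
    return sum(values[item] * count for item, count in split.items())

my_values    = {"book": 1, "hat": 3, "ball": 1}    # private to me
their_values = {"book": 2, "hat": 0, "ball": 2}    # private to them

proposal = {"me":   {"book": 1, "hat": 2, "ball": 0},
            "them": {"book": 2, "hat": 0, "ball": 1}}

# A valid deal has to hand out exactly what is on the table.
assert all(proposal["me"][i] + proposal["them"][i] == ITEMS[i] for i in ITEMS)
print(deal_score(proposal["me"], my_values),       # what I get
      deal_score(proposal["them"], their_values))  # what they get
```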

Planning: two agents are trying to come to a shared agreement on a plan that they will both execute. For example, a couple talking through how they’ll spend the day: who will pick up the kids, buy groceries (and what to buy), call the babysitter, make dinner reservations, etc. Or a group of friends planning a trip together, or a military unit planning an operation. Or a group of professionals designing a product or planning a strategy. In general the plans that can be arrived at through conversation are high-level plans: more like strategies than sequences of concrete actions.

Getting to know each other: in some settings, agents are trying to teach each other about themselves and learn each other’s ‘reward functions’ (interests, background, goals). This includes dating, meeting a potential friend for the first time, professional networking, job interviews, etc. One way to share reward functions is literally to ask, ‘what would you do in this situation’, or to talk about past experiences and what you did there. Possibly more important than the pure reward function itself is sharing conceptual frames: the abstractions people use to think about the world. These are, roughly speaking, the interior layers of a neural net: the feature representations on which a linear reward function can be built. Such representations are communicated even through simple word/bigram choice, through references to literature or pop culture, and through the aspects of a situation one chooses to highlight or foreground.
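
A rough sketch of the analogy (not a real model from anywhere; the featurizer and weights are arbitrary): a shared feature representation, with each person’s reward a linear function on top of it.

```python
# Shared "interior layers" (conceptual frame) plus per-person linear reward heads.
import numpy as np

def features(situation):
    """Shared feature representation: map a situation to a feature vector.
    Here just a toy hand-coded featurizer."""
    return np.array([situation["novelty"], situation["social"], situation["risk"]])

alice_w = np.array([0.8, 0.3, -0.5])   # Alice's linear reward weights
bob_w   = np.array([0.1, 0.9, -0.1])   # Bob's

weekend_plan = {"novelty": 0.9, "social": 0.2, "risk": 0.4}
phi = features(weekend_plan)
print("Alice:", alice_w @ phi, "Bob:", bob_w @ phi)
```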

Banter / improv: people (/agents) build an imagined world together. They tell a single story, respond with “yes and”, try to surprise or delight each other. They are cooperating to ‘play’: to maximize each other’s agency, to learn together about how each other thinks and what their mental style is. As opposed to planning together, this is something like an MCTS rollout: you immediately sample from a network a bunch of times just to get a sense of where you end up.
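
Something like this, maybe -- a toy world model with made-up transitions, sampled a few times with no search at all:

```python
# "Rollout" version of banter: just sample continuations and see where they land.
import random

NEXT = {  # toy "yes, and" world model: current beat -> possible next beats
    "we're pirates":           ["the ship is sinking", "parrot starts talking"],
    "the ship is sinking":     ["we build a raft of hats", "mermaid intervenes"],
    "parrot starts talking":   ["parrot is the captain", "we build a raft of hats"],
    "we build a raft of hats": [],
    "mermaid intervenes":      [],
    "parrot is the captain":   [],
}

def rollout(beat):
    path = [beat]
    while NEXT[beat]:
        beat = random.choice(NEXT[beat])
        path.append(beat)
    return path

for _ in range(3):   # a handful of quick samples, no tree search
    print(" -> ".join(rollout("we're pirates")))
```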

Teaching: One agent has information they want to convey to another. This might be literal factual information (/declarative knowledge), or it might be a skill (/imperative knowledge): teaching someone to program, or to play an instrument, or to do math.

Therapy: One agent needs to process their own goals, thoughts, emotions, plans, desires, relationships. They need to do their own ‘rollouts’ in a sense: to try taking vague thought vectors and expressing them in words, so they can learn to isolate the important insights, come to actionable realizations, better align their ‘system 1’ with their ‘system 2’, or even just practice expressing themselves in words so that they build confidence that they can do this in other conversations when the need arises. Meanwhile the other agent is modeling the first, asking probing questions, trying to direct them towards areas that might be most fruitful, comforting them or giving advice or suggesting interpretations. The conversation is directed solely at the first agent’s utility function -- or maybe at something even more limited than that, a proxy utility like ‘self-understanding’ (which we assume ultimately helps real utility).

others?

Of course none of these are separate. Real arguments and negotiations partly require getting to know the other person, to understand what their true goals and desires are and how to frame your arguments. Therapy and teaching can involve arguments. Ultimately, a person is never engaging in any of these forms of conversation in a vacuum -- they are trying to achieve some larger goal.

What does an architecture for achieving conversational goals look like? At a low level, the action space is words. But there are more abstract actions: points to be made, questions to be interrogated, emotions to be evoked.
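
One hypothetical way to picture a two-level action space: pick an abstract move first, then realize it as words. Both levels below are placeholder rules standing in for what would really be learned models.

```python
# Two-level (abstract -> surface) action space for a conversational agent.
import random

ABSTRACT_ACTIONS = ["make_point", "ask_question", "evoke_emotion"]

def choose_abstract_action(context):
    # Placeholder high-level policy.
    return "ask_question" if context.endswith("?") else random.choice(ABSTRACT_ACTIONS)

def realize(action, topic):
    # Placeholder low-level realizer: abstract move -> actual words.
    templates = {
        "make_point":    f"One thing worth noting about {topic} is ...",
        "ask_question":  f"What do you actually mean by {topic}?",
        "evoke_emotion": f"Honestly, {topic} worries me.",
    }
    return templates[action]

ctx = "I think remote work is fine?"
act = choose_abstract_action(ctx)
print(act, "->", realize(act, "remote work"))
```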

In an everyday conversation, it’s usually important to “think fast”: to respond immediately and authentically. But in written conversation or professional dialogue, we sometimes have the opportunity to “think slow”: to plan out possible reactions to our statements, to anticipate and address counterarguments, and to edit.

Importantly: when we’re planning a conversation, we typically don’t roll out the exact words someone might use. We think about general points they might make: “why not ”

Algorithms

The brain probably doesn’t use MCTS as such. It probably doesn’t have an explicit tree data structure, or the notion of explicit action rollouts in a simulator. We do have some kind of heuristic network (/policy) that proposes (maybe even just samples) which paths to consider. We don’t roll things out to the end: when playing chess, I never consider a full sequence of moves to completion. We think forward a few steps and do some kind of TD-learning.
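
Something like the following, maybe -- all toy components: a sampled move proposer standing in for the heuristic policy, a depth-limited lookahead that bootstraps from a learned value estimate instead of rolling out to the end, and a TD(0) update to that estimate.

```python
# Depth-limited lookahead with value bootstrapping + a TD(0) update (toy version).
import random

def candidate_moves(state, k=3):
    """Stand-in for the heuristic policy: propose (sample) a few moves
    rather than enumerating everything."""
    return random.sample(range(10), k)

def apply_move(state, move):
    return state + move            # toy dynamics

value = {}                         # learned value table, V(state)

def V(state):
    return value.get(state, 0.0)

def lookahead(state, depth=2):
    """Best bootstrapped value reachable within `depth` proposed moves."""
    if depth == 0:
        return V(state)
    return max(lookahead(apply_move(state, m), depth - 1)
               for m in candidate_moves(state))

def td_update(state, reward, next_state, alpha=0.1, gamma=0.9):
    """One TD(0) step: nudge V(state) toward reward + gamma * V(next_state)."""
    value[state] = V(state) + alpha * (reward + gamma * V(next_state) - V(state))

s = 0
m = max(candidate_moves(s), key=lambda mv: lookahead(apply_move(s, mv), depth=1))
s2 = apply_move(s, m)
td_update(s, reward=1.0 if s2 > 5 else 0.0, next_state=s2)
print(m, value)
```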