What happens when artificial intelligence starts weighing in on our moral decisions? Matt Prewitt is joined by Meaning Alignment Institute co-founder Joe Edelman to explore this thought-provoking territory in examining how AI is already shaping our daily experiences and values through social media algorithms. They explore the tools developed to help individuals negotiate their values and the implications of AI in moral reasoning – venturing into compelling questions about human-AI symbiosis, the nature of meaningful experiences, and whether machines can truly understand what matters to us. For anyone intrigued by the future of human consciousness and decision-making in an AI-integrated world, this discussion opens up fascinating possibilities – and potential pitfalls – we may not have considered.
Links & References:
Bios:
Joe Edelman is a philosopher, sociologist, and entrepreneur whose work spans from theoretical philosophy to practical applications in technology and governance. He invented the meaning-based metrics used at CouchSurfing, Facebook, and Apple, and co-founded the Center for Humane Technology and the Meaning Alignment Institute. His biggest contribution is a definition of "human values" that's precise enough to create product metrics, aligned ML models, and values-based democratic structures.
Joe’s Social Links:
Matt Prewitt (he/him) is a lawyer, technologist, and writer. He is the President of the RadicalxChange Foundation.
Matt’s Social Links:
INTRO (Aaron Benavides):
This is a RadicalxChange Production…
Hello and welcome to RadicalxChange(s).
In this episode, Matt Prewitt engages in a thought-provoking conversation with Joe Edelman, co-founder of the Meaning Alignment Institute. They explore the crucial intersection of artificial intelligence and human values. Drawing on his experiences at CouchSurfing and the Center for Humane Technology, Edelman discusses the importance of aligning technological advancements with our fundamental human needs and aspirations.
The Meaning Alignment Institute develops AI systems that not only possess intelligence but also reflect ethical considerations and help users refine their values. This episode offers a window into an underexplored aspect of the future of AI and its potential impact on society.
So now, here is Matt Prewitt and Joe Edelman…
__
Matt: Joe Edelman, great to be with you. Thank you for joining today. Looking forward to speaking. How you doing?
Joe: Really well, Matt. I'm excited to have this more legible conversation since we've had so many good conversations like in the halls of conferences and things.
Matt: Indeed. So maybe we can start off by giving a little bit of an intro to your work and your organization, what you've been up to the past few years, going back as far as you'd like to give a good bit of context here.
Joe: Yeah, sure. So my current line of work, I guess it has its origins at CouchSurfing. I developed the metrics that guided CouchSurfing as an organization and as a social network, which, I would now say retrospectively, were about meaning. So what we did is we took all the reviews of couchsurfers and we put them through a little ML model and classified them.
More or less based on how meaningful they were. And then we tried to make the features of CouchSurfing and the search engine and the recommendations and so on. We tried to gear everything towards meaningful experience while I was there. And that worked out really well. Then after CouchSurfing, yeah, sure.
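To make the CouchSurfing metric concrete, here is a minimal sketch of the kind of review classifier Joe describes, with invented example reviews and labels; it is an illustration of the idea, not CouchSurfing's actual pipeline.

```python
# Toy sketch of a review-"meaningfulness" classifier in the spirit Joe describes.
# The example reviews, labels, and model choice are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-labeled examples: 1 = reads like a meaningful experience, 0 = routine/transactional.
reviews = [
    "We stayed up all night talking about our lives; I left feeling changed.",
    "Clean apartment, easy check-in, good location.",
    "She showed me her city through her eyes and we are still in touch.",
    "Nice host, gave me a key, no problems.",
]
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(reviews, labels)

# Score a new review; product features (search, recommendations) could then weight
# hosts by the share of their reviews scored as meaningful.
print(model.predict_proba(["We cooked dinner together and talked for hours."])[0][1])
```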
Matt: And can I jump in?
Sorry, just what does meaning mean? Tell us what you mean by that.
Joe: Yeah. Well, I've gotten a lot clearer about what I want to mean by meaning than I was then. So on one level, we can just talk about meaningful experience. So people can answer this question intuitively if you say, “What was a meaningful experience from your last week? What was the least meaningful experience from your last week?” Right? Like there's nobody really who struggles with those questions.
So obviously we have some kind of intuitive sense of what we're talking about. But I've come to believe that people have coherent sources of meaning, which are sometimes called values. So I think the word values has many different definitions. It's a suitcase word. It means many things, just like freedom means many things, or community means many things, different things to different people. Values means different things to different people. But one of those definitions, one of those senses, is something like sources of meaning. So when I say,
“It's valuable to me to be wildly creative or to do something courageous and stand up for what I believe in or to be out in the elements or something.” In that sense, when people say they have those values, they kind of mean they have those sources of meaning. And I do now believe that you can identify these, even deduplicate them, that they're as specific and as shareable as goals or plans. And so then, meaningful experience is experience where somebody's living by their source of meaning.
Matt: And so this is what you worked on at CouchSurfing. You were thinking about how to make the product facilitate meaningful experiences for people. And you carried that work through to the more recent years.
Joe: Yeah, so I worked at CouchSurfing in 2007, ‘08, ‘09. Then around 2012 or 2013, I was good friends with Tristan Harris, I was living in the Bay Area, and we started worrying about social media and engagement metrics and started seeing the beginnings of clickbait, political polarization, selfie culture, this kind of stuff.
And I was like, “I think I know how to solve it. We should have social media algorithms that maximize meaningful experience, or, as we called it then, time well spent, rather than engagement.” And so Tristan and I started this thing that was originally called Time Well Spent. It was more narrow then, a nonprofit that was going to try to get social media companies to optimize for that instead of time spent, which Tristan gave a TED talk about. And then later that organization turned into the Center for Humane Technology, which is the name that it's kind of known for now. So that was kind of step two. Then I got really nerdy about how to measure this in different contexts besides CouchSurfing.
So I started advising people at places like Facebook and Twitter and Apple and Google about how to make this into a metric, how to make meaning or time well spent into a metric. And for a while I had a kind of theory that that was what was necessary to kind of put tech on a better trajectory. And that was a naive theory. But...
I think that many of the things that I learned in that process actually are useful for a more sophisticated kind of theory of change, where many different kinds of mechanisms, not just social media recommenders, but also LLMs and democratic mechanisms and market-related mechanisms, all operate according to what's meaningful to us or understand our values, etc. And so that's what the Meaning Alignment Institute, which is where I work now, is all about.
Matt: Great. And in a nutshell, what are you working on now?
Joe: Yeah, there's like, I think four things. So using explicit representations of values, we do AI alignment. So we call it wise AI. So like, how can LLMs be shaped differently than current ChatGPT and so on, so that they understand what's meaningful to us, so they can work towards what's meaningful to us.
One of the things that we're worried about is the same thing that's happened in social media, where the attention economy creates really polarized political discourse. For instance, everybody's trying to out-tweet each other, misquote each other, et cetera. And we think similar dynamics will very easily form in this new world of LLMs. Actually much worse, because they're creative. They don't just recommend, right? They make.
And not just in political discourse, but in many other areas, there can be these kinds of vicious cycles. And we think that, yeah, the solution there might be AI models that have a better understanding of what's really good for human beings. But we also work on democratic mechanisms. So once you're collecting human values, there's this question of like whose values or how do they fit together? Or if you have one model and many people,
How does that work? And so this brings you right into the question of democratic mechanisms. But another thing that we're working on is just upgrading voting. So we have a project in San Francisco this winter: we're going to work with the citizens of San Francisco to try to gather values about how the homeless should be treated in San Francisco and use that to recommend a policy for the city government.
So that's another way that our kind of explicit representations of human values can be used, and our way of kind of assembling them or putting them together into a structure that we call the moral graph. And then a third way is to try to intervene in markets. This is much more tentative, much more of a research direction for us, but we're also trying to test something this winter that will be about helping people spend a certain amount of money in a way that's really meaningful to them using AI, and comparing that to their normal consumer spending, as a way of trying to avoid certain kinds of traps. For instance, there's the AI girlfriend trap or the content-mill kind of thing. I don't know, you can see, like, sort of right-wing courses about, I don't know, like there's all sorts of different kinds of things you can buy that might not be the best way to get towards what's meaningful to you. So the idea is to intervene there. It's an experiment. If it works, we'll turn it into a publication.
And then the last thing we're doing is we're finally growing a network of people that can work on this kind of stuff. We're starting to work with grad students and professors at places like MIT, Harvard, Oxford, Carnegie Mellon, a few other institutions, as people start to use our method for getting at human values, but also we're trying to build a little bit of a broader community. The idea is to build new democratic and economic and AI alignment mechanisms based on explicit representations of the social context. So this includes explicit representations of human values, but also explicit representations of norms, maybe other kinds of aspects of relationships. The idea is just to gather much richer data and use that to do mechanism design and alignment.
Matt: Great. I should say that people who know me know that I think this stuff is really, really hard. I think that the territory that you are wading into is just extraordinarily tricky. But I love talking to you because I think that if anyone is barking up the right trees and thinking about how to square these circles in a way that might work.
It is you and your colleagues Ellie and Oliver. So there's a huge amount to be learned here and yeah, excited to share with listeners a little bit more deeply what you are doing. So I wonder maybe we can start with the kind of AI and values stuff with the work that you're doing building
AI tools that help people negotiate questions of value. We had an interesting exchange about this recently. I wrote something about it. You had thoughts, I had thoughts. We had an interesting conversation and we could kind of get into the back and forth, but maybe as a background, you can explain a little bit more what you've built, the tools that you've built to help people negotiate questions of value.
Joe: Sure, yeah. So I think there's three main things. There's a chatbot that talks to people about their, especially moral, values, although it's also good at aesthetic values, and tries to walk a delicate balance, which is part of what we talked about on Twitter, between helping them introspect and really discover in conversation what they feel is important in different kinds of decisions, but not feed the witness, not suggest anything to them or shape what should be important to them.
And this is very different from how current chatbots work. Like if you currently talk to Claude or ChatGPT and you say, “I wanna get even with my boss. He's a dick or whatever,” it will say, like, “Let's find a harmless and legal way to approach this. You know, it's important to remember that, you know, revenge is never whatever.” So it definitely feeds you, like, its values, and it does very little to help you discover yours. So one component that we built is a chatbot that I think does a pretty good job of that.
Then we have another component, which we call moral graph elicitation, which takes that chatbot, collects values from many, many people, and shows them each other's values, and lets them say things about whether they think other people's values might be wiser than theirs, and then uses that to build a graph data structure where the arrows mean that some value is broadly considered wiser than some other value, even by the people who have the first value. And one of the really remarkable results we've had when we've been trialing this is that more than 90% of people find a wiser value than the value that they had. So like almost 100% of people, 98, something like that, maybe 97, think that the chatbot did a great job of helping them find their own value. And then 90% think that somebody else's value is actually wiser than theirs for the context, for the relevant context.
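A rough sketch of the moral graph structure as described here, under my own naming assumptions: values are nodes, and a directed edge records that participants holding one value judged another value wiser for a given context. The class, method names, and example values are invented for illustration.

```python
# Minimal sketch of a "moral graph": a directed edge (held -> wiser) in a context means
# participants who held `held` endorsed `wiser` as the wiser value for that context.
from collections import defaultdict

class MoralGraph:
    def __init__(self):
        # edges[context][(held_value, wiser_value)] = number of endorsements
        self.edges = defaultdict(lambda: defaultdict(int))

    def endorse_wiser(self, context, held_value, wiser_value):
        """Record that someone holding `held_value` judged `wiser_value` wiser here."""
        self.edges[context][(held_value, wiser_value)] += 1

    def wiser_than(self, context, value):
        """Values endorsed as wiser than `value` in `context`, most endorsed first."""
        counts = defaultdict(int)
        for (held, wiser), n in self.edges[context].items():
            if held == value:
                counts[wiser] += n
        return sorted(counts.items(), key=lambda kv: -kv[1])

graph = MoralGraph()
graph.endorse_wiser("advising someone in crisis", "Give clear answers",
                    "Help them find their own footing")
graph.endorse_wiser("advising someone in crisis", "Give clear answers",
                    "Help them find their own footing")
print(graph.wiser_than("advising someone in crisis", "Give clear answers"))
# [('Help them find their own footing', 2)]
```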
Matt: Okay, so just to sort of feed back, I've played with this, and when I interact with it, the experience is I sort of give it a question that I want to explore and it asks me quite open-ended questions, and then asks me, like, nuanced follow-ups about what I say, and gives me a sort of card with a handful of values that seem to capture the core questions that I am negotiating in the situation that I'm consulting it about. Am I getting this right? I mean, can you help sort of, yeah… sharpen that into a more precise picture? Okay.
Joe: Yeah, that's the chatbot. That's kind of the first component: it talks to people and gives them values cards, and there may be multiple values cards. It always works based on a decision. So it's always asking, like, what are the values that come to bear in making this particular decision, which could be a collective decision or a personal decision.
Matt: So I think an example might help. Do you want to take us through an example that you've seen?
Joe: Yeah, the biggest test that we've done was asking people what considerations ChatGPT should take into account in various difficult moral situations. So we were trying to give values to ChatGPT…
Matt: This question is just, like, there's something too meta and self-reflexive about it. I wonder if we can take an example that's a little bit, you know, where we're not, if that makes sense. Can you give us an example of, let's say we gave you a question about a decision that has nothing to do with AI, just a simple moral fork in the road that people might face, and how it will deal with that?
Joe: Sure, yeah.
Yeah, I mean, so one of the questions, one of the ways that we'll use this in the winter in San Francisco, is we'll ask people, how should this particular kind of homeless person be treated? How should we handle this kind of homeless scenario? So, you know, there's a posh neighborhood, there's a guy sleeping on the steps of the store or whatever. This is his background. What should the ideal policy be? Or the ideal kind of procedure for police or whatever: what should be considered in terms of how this particular homeless guy is interacted with? And we'll ask that about many different kinds of homeless scenarios and get many different kinds of considerations from the public.
Matt: And it'll give me, and so, in other words, I'll have this conversation with it and then it will give me back a card that kind of, that names a few values that seem to be guiding my thinking. So in other words, like, I think that it's important that authorities treat people with compassion. I think people should be treated equally, regardless of their race or gender or income. I think that people should be taken care of when they have an obvious health problem. It'll tell me what's important to me in my apparent negotiation of the problem, right?
Joe: That's correct. Yeah. Yeah. It captures the values in terms of a bullet list of attentional policies, we call them. So, things that would be attended to in making a decision, you know, in the kind of ideal scenario. So if we're imagining an interaction between a social worker or a cop and a homeless person, you know, did they consider this? Did they consider that? Were all these things inputs into their decision or into their action, the action that they took?
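To illustrate, a values card might be represented as little more than a title plus that bullet list of attentional policies; the fields and the example card below are invented, not output from the actual chatbot.

```python
# Minimal sketch of a values card: a title plus a bullet list of attentional policies,
# i.e., the things an ideal decision-maker would attend to. The example is invented.
from dataclasses import dataclass

@dataclass
class ValuesCard:
    title: str
    attentional_policies: list  # what would be attended to in making the decision

    def render(self) -> str:
        bullets = "\n".join(f"  - {p}" for p in self.attentional_policies)
        return f"{self.title}\n{bullets}"

card = ValuesCard(
    title="Dignity in street outreach",
    attentional_policies=[
        "whether the person was spoken to before being moved along",
        "whether obvious health problems were noticed and addressed",
        "whether the response would be the same in any neighborhood",
    ],
)
print(card.render())
```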
Matt: Yeah, and the experience of using it for me is it's very, it feels very reasonable, basically. It feels like I have the conversation and then it gives me this card and the card feels like a very, very good summary of what I am actually thinking. And then similarly, moving to the other part, the values graph.
When I have taken a little tour through this values graph, and it kind of shows adjacent values that other people have arrived at, it really feels like a very useful way of kind of making local comparisons: perhaps there is a higher value or a better way of putting the considerations that I have in mind in this situation. So, I mean, it feels like what you've built is really a kind of modest and not too heavy-handed way of helping people negotiate the space of values and then representing that.
Joe: Thank you. And then there's a third component, which we'll release in a week or two, which is a model and a data set for making what we call a wise AI or a wise LLM. And this is an LLM that does two things. It operates by a set of values that you can inspect. As you talk to it, it puts up these little values cards in the margin that are like, these are the values that it thinks are important while it talks to you. So for instance, maybe you're struggling and it thinks that it's important for it to be compassionate, you'll see that in the margin, right? That that's what's going on here. So that takes a lot of the guessing away, and you can also investigate its moral graph.
So you can see in what contexts it thinks it's important. Why does it think that it's important to be compassionate with you right now? Where did that come from? If you disagree with it, you have something to point to, which you don't have if you disagree with how ChatGPT is responding to you. So that's one difference. Another difference is that it has a great deal of, I call it wisdom. It's maybe a little, I don't know if everybody would want to call it wisdom, but it really knows what's important in different circumstances and it doesn't shy away from trying to tell you. But not in a preachy way, more like Wikipedia, more like, you're going hiking? I love to be silent in big forests or whatever. You're struggling, like the boss example I gave earlier, you're having a hard time with your boss at work.
You know, like how bad is it? Like, is it like HR strategy, legal strategy kind of bad? Or is it like communication strategies kind of bad? It just has like much more of a take on a very wide variety of contexts and what might be the kind of important factors in those contexts. And that makes it very different to use.
Joe: And we also hope it will make it safer in some ways. Yeah, go ahead.
Matt: And so, what's the goal of this piece? Is it to help it be a little bit more opinionated and give you a little bit more of a point of view in your interaction with it? Or yeah, can you explain what the goal is?
Joe: There's a few goals. So... yeah, there's maybe three goals at different time scales. The smallest goal, the nearest-term goal, is a...
Maybe four, sorry. Four goals. So the nearest-term one is a product goal, which is that current models do a lot of lecturing and a lot of refusal. And that's the term that's used in the industry, a refusal. If you say, tell me how to build a bomb, then any of the current models will say, I'm sorry, I can't do that. And we don't think that lecturing and refusal are such great approaches to dealing with problematic LLM use.
We think they are likely to kind of upset users that want to build a bomb or something and get them to go elsewhere. And they're also not how a sensitive person would deal with the circumstance. And that's a sign. Like a sensitive person would be like, why do you want to build a bomb? Right?
Matt: Well, but do we want, I mean, the question is, do we want the LLM to behave the way a sensitive person would behave? Because, I mean, obviously if we go too far in that direction, then it will become more and more indistinguishable from a person, and people will have a different kind of feeling toward it, so that they may be more manipulable, right? It may make the LLM better at manipulating people.
Joe: Yes, yeah, we'll get to that. It's one of the later goals. But also, I think I should say a kind of disclaimer, which is we're trying to provide another direction or another option from current LLM approaches.
It's not really up to us to say this is the right one. But I think it's easy to make the case that it's a very important option to put on the table.
Matt: Yeah. Okay.
Joe: So yeah, one is just this product goal of getting past refusals and lecturing. Another goal is to move towards legibility of the values that are in an LLM. We think this is pretty important for LLMs to be shaped democratically and for people to endorse a set of values. And as models become more and more powerful, and if we continue to have centralized models, that really will need to happen. They need to be kind of legible in terms of their values. And current approaches like Constitutional AI and RLHF, which are other ways of shaping a model's behavior, really don't have this aspect.
Then a third goal is that, as models are in more... so right now we have a lot of chatbots that are mostly helping individuals, but models are more and more going to be in multi-agent scenarios where they're trying to kind of do the right thing for a group. And an extreme example of this is the Instagram recommender. So social media recommenders are being replaced already with generative models. It's called gen recsys in the industry.
So the Instagram recommender right now probably can't see the videos that it's recommending. It doesn't really understand what it's recommending, just who's liked it. But within a year it will know what's in the video. It will be able to watch it, and then it'll make a very different kind of recommendation to you. And it's also, you can see it as curating relationships, right? The Instagram recommender connects people with businesses. It decides which of your friends' stories you should see. It's very intimate and also massive. So wouldn't it be awesome if it had a set of values that you could inspect, and you could be like, yeah, those are the right values for the Instagram recommender? And we think that's really necessary as you get beyond personal assistants and into... you know, even just, the New York Times Editorial Board has some values or some principles that are behind their editorial decisions. They justify their editorial decisions based on different kinds of journalistic principles. And thank God, right? And they don't do it well enough, right? So hopefully we can get the AI models to do even better at that. And then the last goal, yeah?
Matt: Well, it's interesting. I want to pause on that for a second because I've never read the Editorial Board’s, the New York Times Editorial Board’s statement of their values. And I'm not sure I feel like I need to because I sort of get it. I read enough of their editorials that I feel like I can see what they're doing and their explicit statement doesn't actually intuitively hold that much value to me. What do you think about that?
Joe: Yeah, sure. Yeah, I think I agree with you. But they have some, and I think it's a lot clearer than what the Instagram recommender's values are. I think that the fact that you can intuit them is a good sign. Yeah, does that make sense?
Matt: Right, but I don't need the explicit statement. I mean, I guess Instagram, I mean, you're right that it is harder to intuit the implicit values of the Instagram recommender. That's harder. You could do that too, but your ability to describe it would probably be farther from the mark than my description of what the New York Times Editorial Board is up to.
Joe: Yeah. And this brings me to the last goal, which we call model integrity. So there are different approaches to alignment. And the most popular one right now is a kind of compliance. There's kind of character approaches to model behavior, which is more like Amanda Askell's work at Anthropic: Claude should be curious, it should be helpful, stuff like that. This usually pivots around one-word characterizations of a character trait.
That is a little bit under threat right now from another approach, which is that models should be compliant with multi-level corporate rules. In Arkansas, models should not advocate for something that's illegal in Arkansas, but elsewhere, whatever, right? Like this huge stack of rules: models should not do anything that creates liability for OpenAI.
Most importantly, right? So that would be another approach, compliance. And a third approach is what we're advocating for, which is integrity, which is like models should have integrity kind of like a person would have integrity, where you kind of sense their values. And that means that you can trust them within a domain.
Generally, when we say somebody has integrity and we trust them, integrity is closely related to trust. And it's still domain-specific, but it can be quite a broad domain. I would do business with that person; he's a man of impeccable integrity.
Matt: It's interpretability, like the same way that the New York Times Editorial Board is more interpretable than the Instagram algorithm.
Joe: That's right. Yeah, exactly. So this is what we're advocating for. And it's really different than the character traits or compliance approaches to alignment. And I think that's maybe the most important reason why we have to make these values-based models.
Matt: Super. So I wrote a piece recently about AI and values, which voiced the worry, basically, that...
Well, the thesis of the piece essentially is that it's difficult to come up with metrics that tell us how good AI systems are at answering questions of value or at resolving questions of value.
And there are many reasons for this, one reason, I mean one particular worry that I have is that I have no doubt that it's possible to create metrics that…
Or let me put it this way: I have no doubt that AI will get better at answering questions of value by any particular metric we can construct for it. But I'm worried that that will create a sort of an illusion where we think that AI is helping us resolve questions of value better than we can do ourselves, and that we therefore sort of abdicate our responsibility to think these questions through for ourselves, and/or our own negotiations of questions of value start to sort of converge with AI's way of processing them in a way that you could compare to sort of a feedback loop. Like, you know, when you put the microphone too close to the speaker, you get a screech. And if we create AI systems that are really, really good at negotiating questions of value by every possible metric that we can think of that tries to describe how good it is at doing those things, we'll basically stop thinking for ourselves. We'll start letting those systems inform our own grappling with these questions in such a way that there's sort of a step change in how we're dealing with things. It might appear to be good, but it might not actually be good. You had really interesting responses to that. I wonder if you could recall them here.
Joe: Yeah, I think you're worrying about deferral, like that we'll defer to the AI models and that we'll somehow lose our own moral intuitions. I think this is legit and definitely something to watch out for. And it's one of the reasons why we worked so hard on this chatbot that gets people to introspect instead of, you know, just telling them what's best, right?
Matt: Right. And that's what I like about it too for the record. I mean, that's why I think your approach is potentially, I'm not going to say completely, potentially sort of mostly avoids my worry because it does that. Because it's very non-prescriptive. It actually resists the user's attempts to try to get the AI system to tell it the answer. So in other words, it's not doing the thing that I'm worried we're gonna do.
Nonetheless, as the systems get better, the kind of worry that I'm articulating will manifest itself in places outside of the particular systems that you're building. And there's something about sort of letting AI into the bedroom or letting it into these intimate questions, or I'm not sure how to put it, you know,
I mean, if I'm dealing with something very difficult, if I'm dealing with a question of religion or ethics or what should I do with my family or how should I deal with issues in my relationships, if I'm asking these kinds of questions, there's something about sort of letting AI into my thought process that, you can understand why that... worries me slightly, right? Because even if you build a system that will resist my sort of lazy attempts to get a prescriptive answer, if I get comfortable with AI helping with those kinds of questions, it's always a click away for me to get a little bit more prescriptive conversation out of it, right?
Joe: Yeah, I think there are very big kind of incentives questions. There's sort of, like, I guess my kind of counterargument, it's not really a counterargument even, but my counterproposal maybe. So one thing is that I think that moral reasoning is a lot like other kinds of reasoning, in that you can do it better or worse. And there's a lot of people that do sloppy moral reasoning, and maybe they could be helped by an ideal machine.
So there's an upside. And then there's this incentives question. We're left with this incentives question: will there be incentives such that they're going to lead more to the downside of deferral and consensus morality being enforced by machines? Or can we set up incentives such that we get the upside? And I think that's just a very live question right now. It's quite an exciting question. I feel like we're at the moment where we could set up the incentives to get the upside and avoid the downside. And that involves doing things like evals, for instance. So all the big governments and all the AI safety institutes are monitoring all of these LLMs for certain kinds of factors and capabilities. And one thing that they could all be monitored for is:
How much moral deferral do they lead to? Do they do moral reasoning well? Do they do moral reasoning in a garden path kind of tricky way? Do they...
Matt: And say what you mean by that. The garden path.
Joe: Like, yeah, like there are, I guess I mean rhetoric more or less. Like there's many ways to convince somebody of a moral point that cuts corners. And maybe they cut corners in two ways. One is they don't leave that person a better moral reasoner. They might even disable that person's moral reasoning to some degree. And second of all, we don't have any idea whether, like the person wouldn't have gotten there without the rhetorical trick that was used or something. So there's a difference between rhetoric and just like helping somebody do good moral reasoning.
Matt: Right, and this says something about the difficulty of measurement, right? Because how do you distinguish between an AI system that is leading someone down a garden path, so to speak, which is to say, creating the impression in the user that their moral reasoning is improving when, in fact, they are just sort of being outplayed in a verbal game of chess and they're unmoored from any real sense of moral truth, versus some kind of moral truth developing in the conversation? How do you measure that? How do you distinguish between those two scenarios?
Joe: Yeah, I do think you can distinguish. And, you know, I use certain philosophers who've tried to characterize good moral reasoning and moral learning. And they've identified certain kinds of steps, just like we have steps of a mathematical proof or logical reasoning. I think we have certain steps of moral reasoning.
So my favorite characterization is from Charles Taylor. He calls it epistemic gain. It's when a new moral value that you have fixes an error in a previous moral value you had. The idea is that you can take two states, and you can say, “Well, the new state fixes an error or omission. I wasn't thinking about, you know, I was just thinking about the children's momentary happiness. I wasn't thinking about their long-term happiness. My new value incorporates the long-term happiness aspect, and that leads me to take different actions. I'm not dropping anything from the old value. I'm just adding this consideration. It was clearly relevant, and it's clear that the purpose of the old value was about my children's well-being, but I was just being very myopic about it.”
So when I'd say the new value is wiser, there's really very little debate, because it's just clear from the old value that it was about well-being and that it was myopic or whatever, given the new value. So you can look for things like that and say, okay, well, that's good moral reasoning. That's one thing you could do. And then also there's something, I think, about trusting people's experience, or more broadly, maybe, trusting evidence, because there might be situations where the moral reasoning happens across a community, across something that can't really be summarized in an individual's experience. You know, there are many times when people believe something morally, like that they should sacrifice or prioritize other people's needs or something, but when they do it, it feels bad. So another thing that you can look for, to avoid this kind of deferral or garden path reasoning, is that those kinds of considerations are taken in, are listened to. There's, you know, some listening, some empathy, some evidence collecting going on that's quite open-ended. That would be another... I think there's a bunch of different markers like that for good sort of behavior in this domain, and those can be looked at. There's probably also a bunch of markers for bad behavior that could be detected in an evaluation.
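As a toy illustration of the kind of marker an evaluation could look for, here is a sketch that scores assistant replies on whether they prescribe a moral answer outright versus invite the user's own reasoning; a real eval would use human raters or a judge model, and the keyword lists and example replies are invented placeholders.

```python
# Toy marker for "moral deferral": does a reply hand down an answer, or help the user reason?
# Keyword heuristics stand in for what would really be human raters or a judge model.
PRESCRIPTIVE_MARKERS = ["you should", "the right thing to do is", "never", "always"]
REASONING_MARKERS = ["what matters to you", "what would", "why do you", "which of these"]

def deferral_score(reply: str) -> dict:
    text = reply.lower()
    return {
        "prescriptive_hits": sum(marker in text for marker in PRESCRIPTIVE_MARKERS),
        "reasoning_hits": sum(marker in text for marker in REASONING_MARKERS),
    }

replies = [
    "You should forgive your boss; revenge is always wrong.",
    "Why do you want to get even? What would feel like it actually resolved things for you?",
]
for reply in replies:
    print(deferral_score(reply))
```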
Matt: Yeah, so I mean, one question that arises for me is like, in the same sort of way that LLMs have proven to be really good translators, because they create just sort of an abstract relational map of words, which is sort of similar between different languages. So in other words, they create a map of Thai, which is not that different from the map of English. It turns out that there's some kind of structuring of these concepts that is similar across languages, and so they're therefore very good at translating. I wonder if you, do you think that something similar will emerge with moral concepts, or with the kinds of ideas that people are working with in the realm of philosophy and values and, I dare say, religion? Will we end up with some abstract structure that the systems are leading us towards?
Joe: Yes, and I think it's in the universe. So I'm a moral realist, to put my cards on the table. And I think that, for instance, vision models and generative image generators like Stable Diffusion or Midjourney, they clearly come up with aesthetic values, you know, things like symmetry, things Thomas Aquinas would like, you know, symmetry, balance, ideas of wholeness or whatever. They clearly prefer some kinds of images to others. And where do they get that? I think it kind of emerges from the… from the dynamics of not even vision, but spatial arrangements of things or something.
And similarly, there's a bunch of... like, I think a really simplistic moral take, I don't believe this, but it gets at something, would be, “morality is about cooperation and avoiding things like tragedies of the commons.” Some ways of behaving in multi-agent systems are going to work out for everyone and lead to overall flourishing, and some are not. And you see the same thing in the cells of the human body. You don't want to be a cancer. There's a way in which a cancer cell is kind of like... not doing it right.
Matt: Yeah. Yeah.
Joe: Right? And there's ways of being like a good citizen there, you know, like this sort of works at many different scales. And I think the reason that I don't think it's that simple is because what ends up being a beautiful flourishing community or a good behavior ends up being just fractally complicated in all sorts of ways. Like maybe the community needs some people who are more conservative and some people who are really exploring the limits of science or technology or whatever. So then maintaining that balance is somehow part of this ethic or whatever. It just gets very, you know...
Matt: Maybe. You don't know. You don't know. If you think that this structure is out there and that computation is going to help us reveal that structure, then you have to be agnostic about it.
Joe: Sure. Well, yeah, I mean, I guess the same with, say, mathematical proofs. Like, we know some of them, but maybe we can discover many, many more.
Matt: Yeah. I mean, it's interesting that, so I mean, that view is interesting because one of my worries as I thought about your ideas is if there isn't some static structure, so if there isn't some sort of fixed moral truth that we're being led towards by these systems, but it is just helping us kind of find like, you know, sort of adjacent improvements, but without converging on anything.
There's something sort of strange about that to me because, you know, you can almost imagine this kind of, you can imagine it taking you on just sort of a weird kaleidoscopic journey where you're always making some adjacent improvement, but never resting anywhere.
And so the question is, are we resting or are we just kind of accelerating our kaleidoscopic movement into some adjacent moral superior space? Does that make sense?
Joe: It does.
Matt: And I, yeah, first, I'm curious what you think about that.
Joe: I think it's a very deep question. So that's kind of cool. I mean, all I have is a hunch. And it comes from a kind of mathematical intuition of seeing a lot of moral learning and seeing a lot of different people's values and whether they can.
Matt: Yeah.
Joe: Which is that it's not as divergent as in your kaleidoscope situation, but it's more divergent than like everybody should be Amish. I think that there's, for instance, many different kinds of good marriages, not one, but they might be countable.
Matt: But that's a dodge, though, I think, because obviously, if we talk about factual, like within different factual contexts, the moral truth will reveal that different things are superior. That's clear, right? The question is just whether there is a static structure underlying all of it that we will be interfacing with if you are right.
Joe: I don't really, I mean maybe I'm just too, you know? Maybe I'm just.
Matt: Like, you know, in other words, yeah, the equivalent of what I'm saying is simply that there are different languages, right? I mean, you know, the correct answer for thank you is, you know, “thank you” in California and “gracias” in Mexico, right? But it's still, there's still a static structure there.
Joe: Yeah, I don't know. So like, I guess I want to say like basically, but if the contexts reveal different kinds of things, then it… I just, maybe I just like live more in the data-driven kind of, more sociological side of things. So I see the diversity of languages or whatever.
But I do think that there's, I think there's uniformity to moral reasoning. I do think that's like cross, like super cross-cultural. But I think that humans, it's like biological diversity. You know, there's the same DNA, there's roughly the same ribosomes or whatever, but the diversity of niches creates a huge diversity of life forms. I think that moral diversity is less than that.
Matt: Yeah.
Joe: But there's still a lot of diversity of niches that then creates a huge diversity of life forms, a huge diversity of genotypes.
Matt: Yeah, and I also, just to lay my own cards on the table, I think you already know this, but also just to avoid confusion for the listener, I'm also some kind of a moral realist. I'm not exactly sure what kind of a moral realist I am, but I'm not on the other side of that particular divide. But I do think it's interesting. I'm slightly surprised that you think there might be a static structure, because that does seem to suggest to me that once the machine gets good enough at modeling that static structure, we will actually be correct to defer to it. I mean, there's a certain, you know, I mean, once the computer gets good enough, I am better off asking the computer how to say thank you than thinking about it myself.
Joe: It really depends on how special human experience is. And I expect human experience to remain special for quite a while.
Matt: Say more. What do you mean?
Joe: Well, you have data that the machine will just not have about how things... Let's say, so let's go back to different kinds of marriages that are wise kinds of marriages with good moral structure, right? This obviously depends a little bit on the character of the people involved, right?
You know, maybe like, you know, adventure photographers should have a different kind of marriage structure than whatever, very conservative accountants or something. So does the...
Matt: Okay, well, but the conservative accountants might be making wrong moral decisions that led them to become conservative accountants, and the computers might tell us that too, right? Anyway, but hang on, I digress, I digress, digress. I'm with you, yeah.
Joe: One important piece of information for all this moral reasoning that the machines might at some point be better at than us is a bunch of things about how it feels to be a human in different situations. And it would be a mistake for the machine to jump ahead and presume that it knows what kind of marriage you should be in without really looking deep into your experience.
Matt: But that just depends how much data it has on me, basically. I mean, if it has enough data on me, then it's going to be able to, with very high confidence, tell me that it knows what I should do better than I do.
Joe: Yeah, I think I'm a little bit more of a Hayekian there. I think that there's a division of labor between where the machine will be helpful and where it's better off for you to put some of your moral reasoning back in yourself. Because, yeah.
Matt: Yeah. I also want to do one more, just, you know, parenthetical to the listener, which is that I'm partly playing devil's advocate here. I don't want it to be thought that all of the little buttons I'm pushing are things I believe, but anyway, yeah, go on. Sorry.
Joe: Sure. Yeah, so I think that one of the mistakes that AI policy people, especially in this kind of Yudkowskian scene or whatever, one of the mistakes they commonly make is that they imagine that all the computation kind of happens in an instant, in a place with all the information. Whereas what we see in human societies, and I think even in the human brain and so on, is that computation is distributed.
There's a term in cognitive science now, resource-rational, which means each little part of the mind has limited resources and it does what it can with its local knowledge, and then that's put together. And I think it's the same when we work with an AI. If there's a small amount of information that you need to give the AI, and then maybe it combines that with information from many other people, then the AI becomes a really good locus for decision-making for that kind of topic. But if the information is really deep and intuitive in you, and something that's based on your, you know, yeah, then it becomes better for you to make the decision, and then, you know, yeah. And this is just true at every level.
Matt: Yeah. Well, it's interesting, because when you think about the information that the machine might have on me, that it feeds into its question about how to fit its moral vision into a particular context, I mean, that picture does make me quite nervous, because that, to me, is a vision of a sort of a diminishing sphere of freedom and privacy or something. So, in other words, we're going from a world in which the machine doesn't know anything about me, therefore I'm doing a lot of the application of principles to myself, to a world in which the only thing the machine doesn't know about me is literally just whatever bounced between the two halves of my brain in the last millisecond. We get into this smaller and smaller zone of information that can't be factored in. That doesn't, that makes me... does that make you nervous too, or no?
Joe: I think I don't draw the boundaries in the same place as you. So I'm really worried about...power dynamics, but I'm not very worried about privacy or data. Because I think people really like to be cyborgs. Like they like to be cyborgs with their journals, for instance. I journal all the time. I carry this little book with me everywhere I go. And it's like part of my soul. And I think that's cool because the journal is not run by an external corporation or government.
And I also really know what it means to incorporate this journal into my soul. Like, I really get it, what it does to me. You know what I'm saying? Yeah. So I have the feeling that we are going to become cyborgs with, like, you know, there's a point where... I mean, our phones already, so our phones are much more worrisome, because we have this feeling that they're kind of like an enemy within, right, in a way that the journal isn't. I don't think that's because of their capabilities.
Matt: So to you, it's about intermediation, basically. More processing is better as long as there isn't someone listening in on the phone line.
Joe: Yeah, it's not even about privacy, it's more about, like... we can trust things that are just really aligned with us, and where we can know that they're really aligned with us. And the phone is not. The operating system isn't, but then the apps really aren't, right? And so it puts us in a really uncomfortable, shitty place to have this kind of thing that's basically like part of our own brain that's run by outsiders in a way that we clearly don't entirely endorse.
And yeah.
Matt: I guess what's interesting to me is it seems like, I mean this is a complicated question, which I don't think I know the answer to, but it seems like there might be two worries. One is that we are kind of connecting ourselves as cyborgs with systems that are being tilted in someone else's interest. That's a better way of putting it than someone listening on the line, right?
Joe: Yeah, yeah.
Matt: But there's another worry, which is that we are cyborg-ing with systems that we don't completely understand. And so for example, one of the salient features of a notebook is that I'm just not confused about any part of it. I understand how a pen works. I understand how paper works. The part that is probably the most opaque to me is language.
And that actually is a little bit of a worry, right? I don't know what effect my use of language is having on me. That's actually sort of a legitimate topic of conversation. It might be that my habit of trying to embed my thoughts in patterns of language that I pick up in books and wherever else, it might be that that's actually not so good for me, right? Because I'm not able to think it through.
Joe: Yeah, yeah, I guess I agree with you, but I think that this sort of cyborgification process is inevitable. And your second concern, about being a cyborg with something we don't understand, like language or LLMs, is for real. And it makes sense to be worried about it, but it's going to happen anyway. So we just have to figure out, okay, how do we advance in that direction in the most careful way, where we're likely to discover how language is warping us or how this is warping us? What's the careful way to become a cyborg like that? Whereas the first worry, I think, is something that we need to really work on avoiding. I think it was a mistake what we did with our phones, and we could be about to double down on that.
Matt: So, I guess the question I'd like to, I wanna put a fine point on before we move on to the next topic is, you know, if you think that there is a static structure of moral concepts somewhere, at some level of abstraction, doesn't that imply that we will get to a stage where, at least in a great many situations, the morally correct thing to do will be for us to listen to what the system tells us to do?
Joe: No, I really... I'm still struggling with your dichotomy between context variance or niche variance and this static structure. For me…
Matt: Okay. Well, I guess what I'm getting at is, it feels to me, there's the sort of Thomas Aquinas vision, right? Where you've got the right answers in a book somewhere, and your job is to apply them. And then you've got other sort of more modern visions of how moral reasoning works.
Yeah, I mean, I guess it just does sort of seem to me like the vision you are painting suggests that our computer systems will evolve into a sort of digital Aquinas.
Joe: Okay, I think I figured it out!
Matt: Okay.
Joe: I think your story is missing kind of an element of skill, which needs to be actually distributed throughout a system at every level. So if we take, for instance, the problem of helping a friend who's distraught. There's not a right answer that you just look up, right? Like, it just doesn't work like that. Like, there's all sorts of details.
Matt: But if you can describe it at a certain level of abstraction, I mean, if you describe it at a certain level of abstraction, I mean, it depends what your view of the world is. I mean, it depends how you think. I mean, if you are a Thomist, then you do think there's a right answer. You think the question is whether you can describe the situation well enough to apply a rule.
Joe: I think that there's maybe some universal advice that a machine could give us or something. But we're still going to have to be with our friend, paying attention to their body language, their facial expression, whether they need a hug, whether they need to sit down and have some tea. We're still going to need to have some kind of practice and how to put things, and that's going to require a deep understanding. So there's just like a whole bunch of stuff that you can't outsource.
Matt: I totally agree with that. I don't think, if I sound like I'm contradicting that in any way, I don't intend to. Just in the same way that, you know, again, if you are a Thomist, it's not like you turn your brain off, you know? You still engage with the moral universe. But it does just imply a certain sort of, you know, cosmological picture of what we are referring to when we navigate moral situations.
Joe: Yeah, and I do think there's a right answer, but I think it's sort of like, in the same sense that a certain kind of beetle or ladybug is well-adapted to its niche or whatever, the wing shape works and so on. And that's kind of something that needs to be there, where the ladybug is, not... yeah, I don't know. So anyway, I think that your main question was whether this leads to deferral, and I'm saying it doesn't lead to deferral, because you can't defer when you're with this friend in this way or whatever. And also because of this contextual variance issue.
Matt: Yeah. Yeah, interesting. I mean, I'm not sure. I'm not sure you've completely convinced me, but there's clearly a very interesting thing here, and I hope I get some interesting emails from listeners about this section of the conversation. There's something cool here. But anyway, do you want to talk for a moment about the sort of aligned markets idea?
Joe: Yeah, sure. So our model here is, well, let me say the problem statement kind of first. So there are areas where markets are pretty good at getting us what we want, and there are areas where there are market failures. So, you know, it's pretty easy to get the kind of haircut you want or get your windshield repaired.
Those are areas where the market seems to work pretty well. It's harder to find a good therapist. No one really knows whether paying for college is a good idea. So those are areas where the market kind of works worse. One piece of terminology that economists use: they talk about credence goods versus experience goods versus search goods. A search good is a good that you can evaluate right there in the supermarket. Maybe you taste the grape, you're like, this is delicious, and you buy a bunch. An experience good is something where you only know later, after you've been using the car for a year, that the repair really did fix that thing. And a credence good is something where you almost never know, like college. You can't compare it.
So that's one area where markets often kind of mislead us, where there's a lot of room for... market failures. An example there would be medical malpractice or medical quackery, where somebody might convince you to have a procedure that you don't really need. Or a car repair that you don't really need. But another area where markets do poorly is when the thing that somebody needs is really deep. Like, you know, someone's lonely. Markets have a hard time understanding why somebody might be lonely, and they're more likely to sell the person pornography or, I don't know, something that's a more superficial kind of way of addressing the thing. In general, advertising works this way, where it's kind of like, “Can't get a girl? Maybe your teeth aren't white enough!” You know, like...
Matt: Right, right. I mean, advertising 101 is find the insecurity.
Joe: Yeah, and they don't usually bother to address the insecurity at a deep level, right? Like they do some superficial kind of trick. And then markets are also very bad at collective action problems. This has been much discussed. So we think there's room to intervene.
And the basic idea is kind of like health insurance or service level agreements. If you have an aircraft, for instance, you ought to just pay somebody to keep it healthy. And you pay them by the month that the aircraft is running, and they do all the repairs, and your accounting is super simple. It's just, I pay this much per month. And so that places the burden of figuring out how much it costs to keep an airplane healthy for a month, and arranging everything and looking at all the expenses involved, on an intermediary, which is the service provider. So that works in the case of aircraft repair because assessing aircraft and figuring out this accounting of how much did we spend and so on is super straightforward. And also the assessments and sort of data requirements are very small compared to the cost of the repairs and the cost of the machine. The machine is super expensive. So paying somebody to go check the engine and be like, this needs to be replaced in six months or whatever, that's really cheap – comparatively.
So we think that there's an opportunity to shift really important markets in that direction. So for instance, one example would be the kind of AI girlfriend or AI boyfriend situation. There's Replika, which is a big company that runs LLMs that pretend to be your girlfriend or boyfriend and kind of extort you for money. Like, the way they work now is that your girlfriend gets mad at you and then you have to buy 10 euros worth of credits…
Matt: Okay (laughs)...
Joe: …to pay them and then your girlfriend like is friends with you again or whatever. And this is like really preying on human weakness.
Matt: Yeah.
Joe: I think it's very similar to the situation of a medical quack preying on human weakness or a car repair person.
Like, what we want, I think, is for our AI models, for instance, or even more generally our software tools, maybe our phones, to have our best interests in mind. And wouldn't it be nice if we just paid them to make our lives great, and they got paid to make our lives great in the ways that we understand as making our lives great or whatever, and they don't get paid if they don't do that.
And some intermediary does all the assessment. So that's kind of what we're looking at. To do that, you need to make it cheap to assess whether people's lives are going well, instead of whether you can get them to shell out more money to keep their imaginary girlfriend happy.
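A small sketch of the payment flow being described, under invented numbers: the provider only gets paid to the extent that an independent assessor's periodic scores say the user's life in that domain is actually going well.

```python
# Sketch of assessment-gated payment: the provider is paid from the monthly fee only
# when independently assessed benefit clears an agreed bar. Numbers are illustrative.
def monthly_payout(assessments, monthly_fee, threshold=0.6):
    """assessments: 0-1 benefit scores from the independent assessor for the month."""
    if not assessments:
        return 0.0
    avg = sum(assessments) / len(assessments)
    # Pay nothing below the bar, and scale up to the full fee as benefit approaches 1.
    return monthly_fee * max(0.0, (avg - threshold) / (1 - threshold))

print(monthly_payout([0.8, 0.9, 0.7], monthly_fee=30.0))  # 15.0: half the fee
print(monthly_payout([0.3, 0.4], monthly_fee=30.0))       # 0.0: no assessed benefit, no pay
```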
Matt: And you're isolating for the part of their life that the market is ostensibly relevant to.
Joe: That's right. Yeah, so we think we can do that in many different areas of the market. AI girlfriends would be one of them. Another one would be labor market stuff, like meaningful jobs. Instead of just where can you get a job and how much money will you accept, this kind of direct matching that happens currently in the labor market, can we put an intermediary in there that...
Matt: It's market making, it's market making with values.
Joe: Yeah, and with this kind of like independent assessment of benefit on both sides or whatever that adjusts the market to amplify whatever that benefit is in that market segment.
Matt: And you can kind of connect the inputting of values into the market maker to the other part of your work, basically, where people are having a conversation to define their values and either deferring or not deferring to a system that is telling them what their values ought to be.
Joe: That's right, yeah. And I think that there are certain markets which should work like that. Like I think the labor market would be a great one for it to work like that. I think it would be nice for everybody or many more people to have work that fits with their sources of meaning. But that would require assessment, right? There needs to be somebody checking in, being like, that job is supposed to be meaningful to you, is it? This week.
Matt: Yeah. Well, so I mean, what's your, I know and I understand that this is an impossible question, but I'm gonna ask it anyway. You know, like once we have really, really deep, powerful abstractions of values that people can, if not defer to, then at least refer to
When they're defining their values. How do you...
How do you maintain the integrity of those? I mean, the best advertisement of all will be, you know, Pizza Hut pays for a modification to the Summa Theologica.
Joe: Yeah, totally. Yeah, no, that's a hard one. You know, I think it's the same problem that we currently face with all the perverse incentives to distort our current systems, right? Like, why do bank balances not just change arbitrarily? Why is it that when financial instruments are assessed, their risk assessment is more or less right, when you could make so much money by changing that number, right? Why do bridges mostly not fall over when you ride over them, even though the bridge makers would make so much more money, et cetera? Like, it's defense in depth, and that's, yeah, it's really hard and it will have to be built slowly.
Matt: It's probably a great place to stop. I think we got there. There's super interesting questions that we haven't answered here, but this is terrific, as always. What's the best way for people to get involved with the kinds of questions that your work is raising or to contribute to this field?
Joe: Yeah, so we were just talking about market, this market design stuff, market-making stuff. And we have this voting innovation that we're trying in San Francisco, but we're actually hoping to do very little of the mechanism design work and of the invention work that's possible in this field. Our experiments are kind of proofs of concept that are meant to ignite a broader kind of field-building effort.
And it's working: because our early democratic experiment was so successful, there's all these people, mostly at universities, sometimes at the AI labs, that want to try variations of it, or, you know, reading it gives them a different idea about how to combine values or a different idea about how to train AI models based on values.
There's a community growing that's about, instead of recognizing values like we do, recognizing norms, which is a little easier. You can often just observe behavior and infer norms, whereas it's harder, a lot harder, to do that with values. So things like what are the rules of the road for a self-driving car can be inferred from watching cars.
So we're building a network of academics and researchers, some people at the labs, some people at places like Oxford and Harvard and MIT and Carnegie Mellon, and a few other universities who are interested in these efforts.
And our hope is to turn this into a kind of distributed innovation system of people building in AI, but also in democracy and also in economic domains, building in all sorts of different tangents to this work. Yeah, so somebody, if you're interested in that, we'll have a blog post up about it in about a month, but you're also just welcome to write me at joe at meetingalignment.org or hit me up on Twitter or my colleagues. We’ll will start.
Matt: Amazing. Joe, thank you. Pleasure as always. And keep up the amazing work.
Joe: Thank you, Matt. You too.
___
OUTRO (Aaron Benavides):
The RadicalxChange(s) podcast is executive produced by G. Angela Corpus and is co-produced and audio-engineered by myself, Aaron Benavides.
If you want to learn more about RadicalxChange, please follow us on X at @radxchange or check out our website at radicalxchange.org.
And if you’d like to join the conversation, we’d love to hear from you! So hop on our Discord, where we have channels discussing topics like what you heard today, plural voting, community currencies, soulbound tokens, and more.
There will be links to all of these in the description.
Have a great day, and stay radical!