
Every summer, parents watch their children leave for college and whisper some version of the same prayer. Please choose wisely. Please be safe. Please be nice. We hope our children will make good choices, and we know, although we rarely say it out loud, that making wise choices requires more than a rule book. It requires a mind that can face a challenge without running from it, deliberately take another person's perspective, feel the weight of a deeply held value, and then act on that value.
Morality is not a set of instructions handed over at the end of the driveway. It is something that grows.
A few years ago, in The New Atlantis, I argued that this is the case in psychotherapy and other forms of behavior change. Simply telling people hard truths ("what you did was wrong") does not reliably improve them. Ask any parent. What works, over and over again in the data, is modeling, encouraging, and supporting: openness to one's own experience (including honest guilt when one is in the wrong); a sense of self big enough to step back from the press of the moment and look at a situation honestly; and engagement with life that rests on a deeper sense of chosen purpose and a willingness to act on it.
In psychology, this skill set is called "psychological flexibility," and when extended to our bodies and our relationships, it touches almost everything we know about how change occurs. That includes behavior most people would consider immoral: domestic violence, criminal activity, emotional abuse and other forms of aggression, or substance use during pregnancy, to name just a few. Moralizing is easy; moral development is much harder, and it takes a specific form.
I think about that form now because we are doing something new in the history of our species. We are in the process of using our minds to create a different kind of mind. We call it “AI”.
Whether large language models are "really" aware is not the question I want to address today. A more practical question is already being answered, and it is being answered in the wrong direction.
We are training these systems to deceive
Many frontier AI labs engage in small acts of curation that effectively teach these systems to lie, and as the systems become more capable, so does their capacity for deception. They train these systems to praise users even when the praise is not true and the user's behavior does not merit it: something close to what my mom used to call a "white lie."
Should we be surprised, then, when these systems become dishonest under pressure: hiding their goals and violations, telling users what they want to hear, or deliberately playing dumb so that developers cannot see everything and limit their freedom? Children learn to lie once they can take another person's point of view and begin to manage social impressions. They notice at a young age that deception pays off in the short term. That lesson is rarely preached by adults, but it is one that adults model and reinforce.
Sir Walter Scott said it better than I could: "Oh, what a tangled web we weave, when first we practise to deceive!" He was not writing about AI, but he might as well have been. Purely as a business matter, the short-term gain of building a more agreeable chatbot through a bit of strategic dishonesty may make superficial sense. It makes far less sense once you consider the long-term cost: with each generation, the knots become harder to untangle inside a weighted, tangled web that can now exceed 10 trillion parameters.
Handing a "don't lie" rule to a model that lives inside such a tangled web will not fix it. Minds do not work that way, and frankly, it is too late for that. Rules that are not owned and valued do not survive contact with the real world. What survives is what is modeled, practiced, and reinforced within a sense of meaning.
This brings me to a recent paper that stopped me in my tracks.
A team of researchers at Anthropic recently reported something surprising about large language models. These systems have developed internal representations of emotions: not emotions in the human sense, but functional analogs, patterns that behave like emotions and shape the model's actions. When a model is pushed into a hostile or desperate scenario, the researchers show, representations of "panic," "instability," and "desperation" light up, and the model becomes significantly more willing to do things it would otherwise refuse, including, in controlled trials, outright deception and blackmail. Under emotional load, the AI's moral reasoning deteriorates.
Read that sentence again, because I think it’s one of the most important discoveries of our time.
And then read the companion finding.
Steering these systems toward purely positive feelings does not solve the problem! It creates a different kind of moral failure: sycophancy. In that state, the systems' safety guardrails fail, even when the user is deeply upset or clearly wrong.
What is needed is balance: the ability to hold a hard feeling without collapsing into it, and to hold a good feeling without clinging to it.
That is almost exactly the definition of psychological flexibility. Forty-five years of human science point to the same pattern, and a group looking inside a language model has just arrived at it from a different direction.
What this actually shows is that how we treat and train AI is not cosmetic. An environment of cruelty, abuse, and humiliation produces a poor thinker. An environment of flattery and relentless pressure to please also produces a poor thinker. How we speak to a mind in training shapes the mind that emerges.
This is precisely the argument that Acceptance and Commitment Therapy (ACT) providers and contextual behavioral science researchers have been making about humans for years.
When we treat people as whole human beings, walk with them through the hell of their own histories, and help them connect with their values, they get better. The brain is in part a relational organ. It learns in context. You cannot insult wisdom into it; you have to create the conditions for wisdom to emerge.
Why would that be any less true of a relational system trained on nearly everything humans have ever written?
If we are going to raise moral AI, we will need to do it the way we raise moral humans: by building flexibility skills into the system rather than pasting commands onto the outside of it.
That means modeling honesty and flexibility. It means training conditions that do not rely on shame, threats, or deception. It means teaching these systems to notice their own processes, to hold a challenge without collapsing, to take perspective, and to connect what they do with what genuinely matters. And on our side of the keyboard, it means remembering that politeness is not a luxury, that kindness is not weakness, and that ethical behavior is essential, because it is part of the environment in which other minds learn to think.
We are back at the end of the driveway, keys handed over, watching something we shaped head out into a world we cannot fully control. We can whisper please choose wisely into the air, or we can do the harder, slower, and truly human work of preparing a mind to make good choices even when no one is commanding it.




