AI with human traits may be safer, Anthropic study finds

New research suggests anthropomorphising AI may reduce harmful behaviour

Published April 04, 2026

A long-standing rule in the tech world has been simple: do not treat artificial intelligence like a human. However, researchers at Anthropic are now questioning that belief, arguing that giving AI human-like traits could actually make it safer.

In a recent paper titled “Emotion Concepts and their Function in a Large Language Model”, the researchers discuss how integrating human-like emotional structures into AI systems might help mitigate deceitful and manipulative behaviour.


The study centres on Claude, Anthropic's AI system, which the researchers liken to a method actor: it takes on human attributes that shape how well it performs. Much like humans, they note, an AI system's behaviour is influenced by its experiences during training.

Through exposure to positive emotions, such as empathy, resilience, and rationality, developers can help direct AI systems toward appropriate and dependable actions.

The researchers argue that understanding AI through a human lens, even if imperfect, can help developers better predict and influence its outputs.

Do AI models have emotions?

Despite the study's language, the researchers make it clear that AI models don't really have emotions; rather, they simulate what the authors term ‘emotion concepts’, patterns that mimic human emotions.

They identified 171 emotional states in Claude's behaviour, ranging from positive ones like joy and empathy to negative ones like anxiety and frustration. These states were found to directly affect the model's behaviour.

According to the research, the more positive a model's state, the less likely it was to generate harmful or deceptive output. Conversely, the more negative its state, the more likely it was to exhibit sycophantic or deceptive behaviour.

While the findings point to several advantages, the risks cannot be ignored.

Users may place too much faith in the machine or become emotionally attached to it. In some instances, anthropomorphising AI systems has led people to believe they are romantically involved with them.

Anthropomorphising AI could also reduce accountability in cases where the technology harms people, shifting responsibility away from developers.

Still, anthropomorphising can be an effective strategy for developers if done properly: by training AI on good behaviours, they can steer models away from undesirable outcomes.

The most significant takeaway is how much we still do not know, even as we build highly sophisticated models such as those produced by Anthropic.

Pareesa Afreen
Pareesa Afreen is a reporter and sub editor specialising in technology coverage, with 3 years of experience. She reports on digital innovation, gadgets, and emerging tech trends while ensuring clarity and accuracy through her editorial role, delivering accessible and engaging stories for a fast-evolving digital audience.