AI reveals when its mental health advice may be wrong
AI confessions give users transparency and accountability by revealing how a model reached its answer
Generative AI and large language models (LLMs) are increasingly used to provide mental health advice, but experts warn that some guidance can be misleading or harmful.
Researchers and developers are now experimenting with AI confessions, secondary responses that reveal how the AI generated its answers, to make the technology more transparent and safer for users worldwide.
How AI confessions work
In a research paper from OpenAI titled Training LLMs for Honesty via Confessions, authors Manas Joglekar, Jeremy Chen, Gabriel Wu, Jason Yosinski, Jasmine Wang, Boaz Barak, and Amelia Glaese observe that a confess task for LLMs can uncover the errors, exaggerations, and simplifications contained in a model's output.
For example, a response about history may initially list a number of facts before the AI confesses that it cannot guarantee the accuracy of every point it made. This offers a glimpse into the workings of the model, which can then be improved to make it safer.
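The pattern described above, where the model first answers and then audits its own output, can be sketched as a two-step pipeline: a second model call is made on the first response. This is a minimal illustration of the idea, not the paper's implementation; `call_model` is a stub standing in for a real LLM API.

```python
def call_model(prompt: str) -> str:
    # Stub standing in for a real LLM call; returns canned text for illustration.
    if prompt.startswith("Confess"):
        return ("Confession: this answer relies on general patterns, "
                "may contain simplifications, and is not personalised advice.")
    return "Here are some steps that often help when you feel sad and tired: ..."


def answer_with_confession(user_message: str) -> dict:
    # Step 1: generate the primary response.
    answer = call_model(user_message)
    # Step 2: ask the model to audit its own output (the "confess" task).
    confession = call_model(
        f"Confess any errors, exaggerations, or simplifications in: {answer}"
    )
    return {"answer": answer, "confession": confession}


result = answer_with_confession("I have been feeling sad and tired lately.")
print(result["confession"])
```

In a real system the confession would be shown alongside, or appended to, the primary answer, so the user sees both the advice and its caveats.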
The risk increases when the AI offers mental health guidance. In one instance, a user confides that they are feeling sad and tired. The model generates supportive statements, but a confession follows, stating that the response relies on general patterns and is not personalised.
In another, the AI admits that it oversimplified the nature of intrusive thoughts and that reassurance is not always helpful.
Such confessions can warn the user of the need for professional care, but they can also plant seeds of doubt.
AI confessions may help with transparency, allowing both users and developers to understand the reasoning the model uses.
However, there is a potential problem: confessions could end up harming the most vulnerable users if they are alarming or misleading. One suggestion is to make these confessions optional.
