Skip to content


OpenAI has announced a new safety approach to make AI models safer

OpenAI’s new safety approach

OpenAI has announced a new safety approach to make AI models safer

https://www.aitoolreport.com/articles/openais-new-safety-approach

Via AI Tool Report

“Our Report: OpenAI has developed a new way to improve the safety of its AI models—called Rule-Based Rewards (RBRs)—that uses AI to align model behavior with specific safety standards and policies, without human intervention.
🔑 Key Points:
  • Previously, humans would score the AI model’s responses to prompts based on how accurate they were or which they preferred: a method that was costly, time-consuming, and vulnerable to subjectivity.
  • With RBRs, safety teams can create rules for the model and AI will score its responses based on how closely they align with these rules, which is more efficient and non-subjective.
  • During testing, RBR-trained AI models showed improved adherence to safety standards and reduced instances of incorrectly refusing to answer a prompt, compared to those trained using human-led feedback.
🤔 Why you should care: While RBRs are a step forward in making sure AI models remain aligned with desired safety protocols—therefore creating safer models—OpenAI has acknowledged that while RBRs could reduce training time, cost, human oversight, and subjectivity, using AI to guide its models could potentially increase bias, so safety teams must design RBRs carefully “to ensure fairness and accuracy” and consider using them in conjunction with the traditional human-based feedback approach.”

 

  • Pro plugin deactivated or invalid

Posted on: July 25, 2024, 7:20 am Category: Uncategorized

0 Responses

Stay in touch with the conversation, subscribe to the RSS feed for comments on this post.