OpenAI's plan to make AI more accurate

ALSO: 'Restored footage' of the Roman Empire

Read time: under 4 minutes

Welcome back, Superhuman

On average, for every 100 questions you ask an AI model, it will fabricate at least three answers. OpenAI has come up with an idiosyncratic way to solve that problem: Using AI to catch its own model’s mistakes.

Today’s Insights

  • OpenAI’s plan to solve AI hallucinations

  • Tutorial: How to translate videos with accurate lip-syncing

  • 5 new AI tools to boost your productivity

  • Everything else you should know today

  • AI-Generated Images: The Capybara Blues


OpenAI’s approach to spotting hallucinations: Use AI

Source: AP

You know what they say: “You’re your own worst critic.” OpenAI took the phrase literally and built a model called CriticGPT that tries to find flaws in GPT-4’s responses. The strange part: The new model is powered by none other than GPT-4 itself.

How can an AI model catch its own mistakes? Because CriticGPT was specially trained to become an expert lie detector. Human researchers fed the model false information, then showed it how to respond with detailed critiques. For now, OpenAI is only using CriticGPT to assess GPT-4’s coding abilities, since the answers are cut-and-dried. More open-ended questions, meanwhile, can generate subjective responses that are harder to judge as strictly “right” or “wrong.”

Wouldn’t humans be better at flagging errors? Most AI companies, including OpenAI, still rely on humans to assess models’ responses. An LLM will generate several responses to the same question, then a human will pick which one is most relevant and accurate — which, in turn, helps refine the model’s future answers. But as LLMs get more sophisticated, it’s getting harder for testers to keep up. Besides, it’s inevitable that humans will sometimes introduce their own mistakes and biases into assessments. 

So, how did CriticGPT compare? It managed to spot 85% of coding bugs, while trained humans only found 25% of them. In the end, though, the best option turned out to be pairing humans with CriticGPT. When humans and the model worked together, they performed 60% better than humans alone.

OpenAI isn’t the only organization working on this: Researchers from the University of Oxford just unveiled an algorithm that they say can spot AI hallucinations 79% of the time. That’s about 10% better than today’s best methods. There’s still work to be done, though, since the approach also uses about 10 times more energy than a typical chatbot interaction.


Don’t lose the AI race, hire AE Studio

Trusted by leading startups and Fortune 500 companies

AE Studio is the key to industry dominance, securing AI talent from Harvard, Stanford, and MIT to streamline operations.

What AE Studio will do for you:

  • Turbocharge your business, save hundreds of hours, and WIN!

  • Have AE build your custom software and AI solutions

  • Hire AE to pinpoint where and why you should be building NOW!

Stop waiting and start winning. Schedule your FREE strategy session to get your business ahead in the AI race. Get in touch here


How to translate videos with accurate lip-syncing

  • Go to ElevenLabs and sign up

  • Select dubbing from the left tab and enter all the relevant details for the video you want to translate

  • Upload your video or insert a video link in the ‘Select a Source’ tab

  • Select source language, target language, and the number of speakers

  • Press Create and wait for the video to be processed

  • Voilà, you’re done! You can download your translated video if you wish.

For the example above, I used an English-language video I found on YouTube and converted it into French.
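
If you translate videos often, it can help to jot down the form details before you open the dubbing page. Here is a minimal Python sketch of that checklist. It does not call ElevenLabs at all; the `DubbingJob` class, its field names, and the `validate_job` helper are hypothetical and simply mirror the steps listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DubbingJob:
    """Hypothetical record of the details entered in the dubbing form."""
    name: str
    source_lang: str               # e.g. "en"
    target_lang: str               # e.g. "fr"
    num_speakers: int
    video_path: Optional[str] = None   # local file to upload
    video_url: Optional[str] = None    # or a link to the video


def validate_job(job: DubbingJob) -> List[str]:
    """Return a list of problems; an empty list means the details look complete."""
    problems = []
    # The source step takes either an uploaded file or a link, not both.
    if bool(job.video_path) == bool(job.video_url):
        problems.append("provide exactly one of video_path or video_url")
    # Translating a video into its own language is almost certainly a mistake.
    if job.source_lang == job.target_lang:
        problems.append("source and target language must differ")
    return problems


# Mirrors the example in the tutorial: an English video translated into French.
job = DubbingJob(
    name="demo",
    source_lang="en",
    target_lang="fr",
    num_speakers=1,
    video_url="https://example.com/video",  # placeholder link
)
print(validate_job(job))
```

An empty list means you have everything the form asks for and can go ahead and press Create.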


Literary Critic

Prompt: I want you to act as a literary critic. I will provide you with some excerpts from literary works. You should analyze each one under the given context, based on aspects including its genre, theme, plot structure, characterization, language and style, and historical and cultural context. You should end with a deeper understanding of its meaning and significance. My first request is "To be or not to be, that is the question."

You can adapt the prompt to your specific needs.

Source: @lemorage on GitHub
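
One way to adapt the Literary Critic prompt is to turn it into a reusable template and swap in a new excerpt each time. The Python sketch below is an illustration only and is not part of the original prompt or its source; the template wording is lightly condensed, and the names `LITERARY_CRITIC_TEMPLATE` and `build_prompt` are made up for this example.

```python
# Condensed version of the Literary Critic prompt with a slot for the excerpt.
LITERARY_CRITIC_TEMPLATE = (
    "I want you to act as a literary critic. "
    "I will provide you with some excerpts from literary works. "
    "You should analyze each one under the given context, based on aspects "
    "including its genre, theme, plot structure, characterization, "
    "language and style, and historical and cultural context. "
    "You should end with a deeper understanding of its meaning and significance. "
    'My first request is "{excerpt}"'
)


def build_prompt(excerpt: str) -> str:
    """Fill the template with a new excerpt, trimming stray whitespace."""
    return LITERARY_CRITIC_TEMPLATE.format(excerpt=excerpt.strip())


print(build_prompt("To be or not to be, that is the question."))
```

Paste the printed text into the chatbot of your choice, and change the excerpt whenever you want a new critique.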


If you don’t have an assistant, you are the assistant

Hire elite offshore talent at a fraction of US salaries, so you and your team can focus on the work that really matters. We evaluate 1,000+ candidates every month to help you find the best:

  • Years of experience across multiple job functions

  • College-educated and fluent in English

  • Vetted by our team of expert recruiters

What’s more? If you don’t like your hire in the first 6 months, we’ll find you a replacement. Find your next best hire today


Everything else you need to know today

Source: Character AI

  • Pocket Pals: Character AI will now let users call AI avatars on the phone. The AI characters can help you prep for an interview, learn a new language, or act as a role-playing companion.

  • Stacking the Deck: Rain AI — a startup that’s building more efficient AI semiconductors — has recruited former Apple hardware expert Jean-Didier Allegrucci, marking its second big-name hire in a month.

  • Bot or Not: Meta will begin rolling out a new Instagram feature that lets creators build chatbots modeled after themselves, although the avatars will be clearly labeled as AI.

  • Copycat Catastrophe: Amazon Web Services has launched an investigation into its cloud partner Perplexity after reports emerged that the AI-powered search engine was plagiarizing material from across the web.

😄 One Fun Thing: An AI-generated video depicting “restored footage” of the Roman Empire has generated nearly 6,000 upvotes on Reddit. The creator came up with the 48-second clip by first generating images of the Roman Empire in Midjourney, then feeding them into Luma AI’s Dream Machine.


5 AI Tools to Supercharge Your Productivity

 ElevenLabs Reader App: Choose a voice from an extensive library, upload any type of text content, and listen on the go.

 Question Base: An AI-powered autoresponder for Slack. It answers the most repetitive questions, so you don’t have to.

 Jellypod: Convert your emails and newsletters into a personal podcast.

 AITerm: An AI assistant that helps developers and command-line users directly within their terminal via natural language.

 Scoopika: An open-source developer platform to build personalized AI agents that can see, talk, listen, and more.

PS: Want more? Check out our Top 100 AI Tools.

* indicates a promoted tool, if any


Capybara Blues

Source: @unicorn0908 on Midjourney

Prompt: A realistic photo, capybara playing [insert musical instrument, like saxophone] in a [insert location here, like pub], the capybara's expression one of pure enjoyment, the saxophone's golden tones shining brightly, the pub's warm, ambient lighting casting a glow on the scene, filled with vintage decor and a cheering crowd, creating an atmosphere of joy and entertainment, Photography, shot with a Nikon D850 and a 35mm f/1.8 lens --ar 16:9

Acquire new customers and drive revenue by partnering with us

Superhuman is the world’s biggest AI newsletter for businesses and professionals with 600,000+ readers working at the world’s leading startups and enterprises. Companies like Amazon, HubSpot, and Salesforce feature their products in Superhuman. You can learn more about partnering with us here.

🧞 Your wish is my command 

What did you think of today's email?

Your feedback helps me create better emails for you!

Thanks for reading.

Until next time!

Zain & the Superhuman AI team

p.s. If you liked this newsletter, share it with your friends and colleagues here.