It turns out that your friendly neighborhood AI assistant might be getting too confident for its own good. A new study reveals that as language models like OpenAI’s GPT and Meta’s LLaMA become more powerful, they’re also becoming…well, bigger fibbers. The research, published in Nature, shows that these beefed-up AIs are more likely to churn out inaccurate answers—even when they don’t have a clue. Why? Because they’re getting better at pretending they do.

The issue isn’t just limited to rare, brain-busting questions; even the simplest queries can trip them up. But because they can tackle tougher topics convincingly, we might be overlooking their obvious mistakes. The solution? Maybe these chatbots should learn to just say, “I don’t know.” But for companies keen to show off their high-tech toys, admitting ignorance isn’t exactly a selling point.

The pattern holds for large language models, which keep getting more capable with each new version. Fresh research finds that these smarter AI chatbots are actually becoming less reliable, because they tend to make up facts instead of dodging or refusing to answer questions they can't handle.

AI assistants. Image: Pexels

In the Search for Smarter AI Chatbots, We’re Left With Increasingly Unreliable Ones

The study, published in the journal Nature, looked at some of the field's leading LLMs: OpenAI’s GPT, Meta’s LLaMA, and BLOOM, an open-source model from the research group BigScience.

It found that their answers are often more accurate now, but that the models are less reliable overall, giving a greater share of wrong answers than older models did.

“They try to answer pretty much everything these days. This means more right, but also more wrong [answers],” study co-author José Hernández-Orallo, who works at the Valencian Research Institute for Artificial Intelligence in Spain, told Nature.

Mike Hicks, a philosopher of science and technology at the University of Glasgow, took a tougher stance.

“That looks to me like what we would call bullshitting,” Hicks, who didn’t take part in the study, told Nature. “It’s getting better at acting like it knows stuff.”

The researchers tested the models on subjects ranging from math to geography, and also asked them to perform tasks such as putting information in a specific order. The larger, more capable models gave the most correct answers overall but struggled with tougher questions, where their accuracy dropped.

The study found that OpenAI’s GPT-4 and o1 were some of the biggest bullshitters, answering every question thrown their way. The trend appears across all the LLMs examined: in the LLaMA family of models, none scored above 60 percent accuracy even on the simplest questions, according to the research.

In a nutshell, as AI models grew larger (in parameters, training data, and other elements), they gave a higher percentage of incorrect answers.

AI models are getting better at answering harder questions. The issue, besides their tendency to make things up, is that they still get the simple ones wrong. In theory, these mistakes should raise more red flags, but we might overlook their clear flaws because we’re amazed at how these large language models handle complex problems, according to the researchers.

The study also had some worrying findings about how people judge AI responses. When asked to determine whether the chatbots’ answers were right or wrong, a group of participants misjudged them 10 to 40 percent of the time.

The easiest way to fix these problems, the researchers say, is to program the LLMs to be less keen on answering everything.

“You can set a limit, and when the question is tough, [make the chatbot] say, ‘no, I don’t know,'” Hernández-Orallo told Nature.
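As a rough illustration of that idea (not code from the study or any real LLM API), the sketch below wraps a model call in a confidence check and falls back to "I don't know" when the estimate is too low. The `generate_answer` and `estimate_confidence` functions are hypothetical placeholders; in practice the confidence signal might come from token probabilities or a separate calibration model, and the threshold would need tuning.

```python
# Hypothetical sketch of a refusal threshold: answer only when the estimated
# confidence clears a bar, otherwise admit ignorance. The helper functions
# below are illustrative stand-ins, not part of any real LLM API.

REFUSAL_THRESHOLD = 0.7  # assumed cut-off; a real system would tune this


def generate_answer(question: str) -> str:
    """Placeholder for a call to a language model."""
    return "Paris"  # canned answer for the example below


def estimate_confidence(question: str, answer: str) -> float:
    """Placeholder for a confidence estimate (e.g. from token probabilities)."""
    return 0.9


def answer_or_refuse(question: str) -> str:
    """Return the model's answer, or 'I don't know' below the threshold."""
    answer = generate_answer(question)
    if estimate_confidence(question, answer) < REFUSAL_THRESHOLD:
        return "I don't know."
    return answer


if __name__ == "__main__":
    print(answer_or_refuse("What is the capital of France?"))
```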

But being truthful might not help AI companies trying to impress people with their cool technology. If these smarter AI chatbots were limited to answering only what they actually know, it could expose the boundaries of the tech.