Is Your AI Chatbot Hallucinating on You? What a Tool!

Are you getting strange search results? Switching to a different tool can make all the difference.

Knowing my enthusiasm for AI, an attorney friend emailed me a link to a May 2025 New York Times article reporting on the tendency of a new artificial intelligence tool to fabricate—to simply make things up.

Experts found that this tool, one of the new "reasoning" models that work through problems step by step (an approach called chain-of-thought reasoning), had hallucination rates as high as 79 percent on some tasks, even as its accuracy improved on others. Also, back in the early days of chatbots, ChatGPT (ChatGPT.com) made national news when an attorney reportedly submitted a legal brief that contained six citations to cases that didn't exist.

My friend wrote, “It’s disappointing that I can’t place much reliance on AI for academic research due to the significant risk it will provide wrong answers—and that even the sources it cites may not actually be the correct citations. And I don’t plan to let it do my writing. So it’s currently of limited use to me.”

As your tech enthusiast for the past 30 years, I am undaunted. I’m here to tell you that it’s all just a misunderstanding. Seriously.

First of all, it’s true that the large language models (LLMs) at the heart of chatbots are not themselves able to establish veracity. They simply hoover up a gazillion words from the internet, train themselves to identify patterns by using “neural network” software, and then output plausible responses when prompted with a query. When ChatGPT first arrived in November of 2022, that’s all it could do.
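If you're curious what that pattern matching looks like, here's a deliberately toy sketch in Python. It's my own illustration, not anyone's actual code; real models use neural networks trained on billions of words, while this one just counts which word tends to follow which in a single sentence:

    import random
    from collections import defaultdict

    # A deliberately tiny stand-in for "learning patterns from text."
    sample = "the court ruled that the court may review the brief".split()

    # Record which words have followed each word.
    follows = defaultdict(list)
    for current_word, next_word in zip(sample, sample[1:]):
        follows[current_word].append(next_word)

    # "Generate" text by repeatedly picking a word that has followed
    # the current word before. Plausible-sounding, but nothing here
    # checks whether the output is true.
    word = "the"
    output = [word]
    for _ in range(6):
        word = random.choice(follows[word]) if follows[word] else random.choice(sample)
        output.append(word)
    print(" ".join(output))

Notice that nothing in this little generator ever checks facts. The output only has to sound like the text it learned from, and that, in a nutshell, is why a bare LLM can hallucinate.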

In addition, since it takes months for a large language model to hoover up the words on the internet and learn the patterns, its knowledge stops at a training cutoff, so it doesn't typically have up-to-date information.

But then the tech overlords had the idea to also give LLM chatbots real-time access to the internet. That way they could supplement the pattern-based output of LLMs with specific information found online, and even give you links to the sources. This is referred to as "retrieval-augmented generation."
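For the technically curious, here's a minimal sketch of the idea in Python. The helper functions are stand-ins I've made up for illustration, not any vendor's actual API:

    # Minimal sketch of retrieval-augmented generation (RAG).
    # web_search and llm_generate are made-up stand-ins, not real APIs.

    def web_search(query):
        # A real system would call a live search API and return
        # current pages along with their URLs.
        return [
            {"url": "https://example.com/a", "text": "A snippet relevant to the query."},
            {"url": "https://example.com/b", "text": "Another relevant snippet."},
        ]

    def llm_generate(prompt):
        # A real system would send the prompt to a large language model.
        return "An answer grounded in the numbered sources, with citations."

    def answer_with_rag(question):
        sources = web_search(question)
        # Paste the retrieved text into the prompt so the model can
        # ground its answer in fresh material and link to its sources.
        context = "\n\n".join(
            f"[{i + 1}] {s['url']}\n{s['text']}" for i, s in enumerate(sources)
        )
        prompt = (
            "Using only the numbered sources below, answer the question "
            f"and cite sources by number.\n\n{context}\n\nQuestion: {question}"
        )
        return llm_generate(prompt)

    print(answer_with_rag("What did the court rule in May 2025?"))

The point is simply that the model answers from material it just retrieved, which is also how it can hand you links to check.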

All the major chatbots now have this capability and typically use it in response to your queries. ChatGPT, however, occasionally doesn't recognize that a particular prompt would benefit from internet access and may give an outdated or otherwise problematic response. When that happens, I ask it to reply again using the internet, or I simply choose the "Search the Web" option when entering my prompt.

Claude AI (Claude.ai) was the last major chatbot to integrate internet access, adding that feature in April.

Retrieval-augmented generation makes it less likely that a chatbot will produce a legal document with fabricated sources, since it can quote and link to material it actually retrieved rather than inventing plausible-sounding citations.

Also, my attorney friend could feel more confident using a chatbot such as Perplexity (Perplexity.ai), which includes citations with every answer and may be better than ChatGPT for academic-type queries where source traceability is key.

When I asked ChatGPT about this issue, it also suggested Semantic Scholar (SemanticScholar.org), an AI-enhanced academic search engine focused on science and policy papers.

In addition, the newer AI models with “deep research” capability, which I have previously written about, typically do a better job of citing sources that one can then check. They can link to current articles, legal opinions, and journal papers. Rather than doing a quickie search, they do multiple iterations, continually probing deeper to give a well-researched, comprehensive report.
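For those who like to see the gears, a deep research run boils down to a loop along these lines. As before, this is my own sketch, and every function in it is a made-up stand-in:

    # Toy sketch of a "deep research" loop: rather than one quick search,
    # the tool searches, asks itself follow-up questions, searches again,
    # and only then writes a report.

    def web_search(query):
        return [f"Source found for: {query}"]

    def propose_followups(question, notes):
        # A real system would ask the model what gaps remain in the notes.
        return [f"{question} -- follow-up round {len(notes)}"]

    def write_report(question, notes):
        # A real system would hand the notes to an LLM to draft a
        # comprehensive report with citations.
        return f"Report on '{question}' citing {len(notes)} sources."

    def deep_research(question, rounds=3):
        notes, queries = [], [question]
        for _ in range(rounds):
            for q in queries:
                notes.extend(web_search(q))  # gather sources each round
            queries = propose_followups(question, notes)  # probe deeper
        return write_report(question, notes)

    print(deep_research("hallucination rates of reasoning models"))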

As we've discussed, Perplexity gives five free deep research queries per day, while Google's Gemini (Gemini.Google.com) and ChatGPT give a limited number of free queries per month.

Experts aren't quite sure why the new chain-of-thought reasoning models are having issues with hallucinating, but it's not likely that my attorney friend would have reason to use them. These tools are for users who need step-by-step problem solving in disciplines such as math, logic puzzles, and computer programming.

There continues to be a plethora of new AI tools to choose from, depending on your needs. ChatGPT, for example, now has something like 10 different options for ChatGPT Plus users.

I think you get the picture. AI has evolved, different options have different strengths, and choosing the right tool helps you get what you want.

Someone recently asked me which AI he might use to get information that could be helpful to a friend who has long-term nerve damage from a shingles infection. Since that’s a straightforward issue that’s been around a long time, it seemed like a simple ChatGPT search would meet his needs.

AI is a tool, and like any tool, it's good at some things and less capable at others. The key is to become familiar with it so you can learn how best to use it. You begin to get a sense of when it might be hallucinating and needs to be double-checked, and of which tools are more apt to give you reliable, useful information.

Perhaps my lawyer friend will give AI a try and find it has some use. A deep research query on Perplexity would be a good place to start.