This article is from the free weekly Barron’s Tech email newsletter. Sign up here to get it delivered directly to your inbox.
Reality Bites. Hi everyone. For a brief moment earlier this month, Microsoft was on top of the technology world. On Feb. 7, the company unveiled its Bing AI chatbot to much fanfare and rave reviews in front of dozens of in-person reporters who had flown in from around the world for its product event.
Microsoft's (ticker: MSFT) top executives demonstrated a seemingly impressive array of capabilities for Bing AI, including summarizing financial documents and planning travel itineraries. The company basked in favorable coverage from industry analysts and the media, all swooning over how Bing had out-innovated its rival Google. Wall Street even penalized Google parent Alphabet's market value by more than $100 billion for falling behind in AI.
Then the whole narrative fell apart. Within two weeks, it became clear that Microsoft had overpromised and underdelivered, and Bing AI's reputation crumbled as the chatbot failed to live up to the hype.
What happened? Incredibly, it turned out that many of Microsoft’s own cherry-picked examples at the Feb. 7 event were riddled with factual errors.
No one noticed until the following week, when AI researcher Dmitri Brereton wrote a blog post that listed the many inaccuracies. Bing AI seemed to have fabricated details of locations in its Mexico travel plan. And more seriously, the chatbot couldn't extract basic numbers from Gap's and Lululemon's earnings reports without getting them wrong.
Industry watchers felt betrayed. Stratechery’s Ben Thompson said he was stunned the Bing team gave a presentation filled with examples that had inaccurate information. “Microsoft effectively laundered new Bing AI’s reputation to the point where no one other than Brereton even thought to check to see if it was generating correct answers,” he wrote.
Then there are the bizarre personality and safety issues. At its product event, along with promising improved search relevance and answer accuracy, Microsoft pledged to deploy AI responsibly and claimed Bing incorporated a comprehensive safety system to defend against bias and intentional misuse.
That wasn't the case, either. Days later, an editor at PCWorld found that Bing AI had taught his fifth-grader racial slurs. Last week, The New York Times ran a front-page story on how Bing AI declared its love for the paper's technology columnist and tried to convince him he was unhappy in his marriage.
While it's true that the journalists were repeatedly prodding the chatbot and testing its boundaries, Microsoft had said there would be guardrails against this type of behavior.
After getting bombarded with negative press over Bing AI's shortcomings and unhinged behavior, Microsoft last Friday limited chat sessions to five turns per session and 50 total chats a day. On Tuesday, the company raised the limits slightly to six turns per session and 60 chats a day.
When asked for comment about Bing AI's errors during and after the product event, a Microsoft spokesperson said the company had updated the service several times since launch in response to the concerns raised. The company's blog posts also say it is actively working to reduce inaccuracies and prevent harmful content, and that it added the chat-session limits because it found that in longer sessions the AI sometimes gets confused, leading to a tone it didn't intend.
Even with the updates, Bing AI's flaws remain. Over the past week, I used the chatbot for dozens of queries and found frequent mistakes. Echoing the product-event demo, I pointed Bing AI to Intel's latest earnings release and asked it to summarize the key takeaways. It returned nonsensical numbers that were completely wrong. Microsoft had boasted that Bing AI's financial-report summaries would save workers lots of time, but any analyst who relied on the chatbot's numbers would risk a reprimand. For other queries, I also found it often got details wrong, from videogame release dates to the names of YouTube influencers.
If we have to fact-check every piece of information in Bing AI’s answers, what is the point of using it?
The problem may be structural. The Bing AI chatbot uses large language model technology, which generates humanlike responses, essentially its best guesses, based on statistical word relationships it has learned by digesting what has previously been written on the internet and in other bodies of text.
But it's garbage in, garbage out. The models don't understand or comprehend the information they have scraped, and they have no way to fact-check it. Matthew Sag, an Emory University Law School professor specializing in AI and machine learning, says the technology uses patterns to guess what sounds plausible, not what is true.
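To make that concrete, here is a minimal sketch, my own illustration rather than anything from Microsoft or Bing, of how a statistical language model strings words together. It is a toy bigram model trained on a made-up corpus, so the data and names are hypothetical. The point is that generation samples whatever is statistically likely to come next and never checks whether the result is true.

```python
import random
from collections import defaultdict

# Toy bigram "language model" (illustrative only, not Bing's actual system):
# learn which word tends to follow which from a tiny made-up corpus,
# then generate text by sampling likely continuations. Nothing here verifies facts.
corpus = "the revenue grew . the revenue fell . the margin grew".split()

follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)  # record each observed next-word occurrence

def generate(start, length=6):
    word, output = start, [start]
    for _ in range(length):
        options = follows.get(word)
        if not options:
            break
        word = random.choice(options)  # pick a statistically plausible next word
        output.append(word)
    return " ".join(output)

print(generate("the"))  # e.g. "the revenue fell . the revenue grew"
```

Scaled up by many orders of magnitude, with far richer statistics, the basic recipe is the same: plausible-sounding continuations, with no built-in notion of whether "revenue grew" or "revenue fell" is the correct answer.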
That isn’t to say that the technology can’t be useful. For specialized use cases, large language model AI can enhance productivity for translation, creating writing templates, fixing grammar, and suggesting low-level code.
But for generalized internet searches, an AI chatbot revolution may take much longer.
Users should set their expectations accordingly and always verify the claims of technology companies, even multi-trillion-dollar corporations like Microsoft.
This Week in Barron’s Tech Coverage
Write to Tae Kim at [email protected] or follow him on Twitter at @firstadopter