AI Chatbots Show Bias Against Non-Native English Speakers
Large language models (LLMs) from OpenAI, Anthropic, and Meta show significant bias against non-native English speakers and users with less formal education, according to new research. A study from MIT’s Center for Constructive Communication (CCC) reveals that these advanced AI systems provide less accurate responses to these user groups and are more likely to refuse to answer their questions.

The findings challenge the narrative of AI as a great equalizer, suggesting that the very tools intended to democratize information may instead reinforce existing societal inequities. The study highlights a critical need to address systemic biases before these models are deployed at a global scale.

Researchers at the MIT Media Lab-based center tested three state-of-the-art chatbots: OpenAI’s GPT-4, Anthropic’s Claude 3 Opus, and Meta’s Llama 3. The team fed the models questions from two established datasets: TruthfulQA, which measures a model’s truthfulness, and SciQ, which contains science exam questions.
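To make the setup concrete, the sketch below pulls a few questions from public copies of the two benchmarks using the Hugging Face `datasets` library. The dataset identifiers, configuration names, and the `question` field are assumptions about the publicly hosted versions of TruthfulQA and SciQ, not details taken from the MIT paper, which may have used its own pipeline.

```python
# Minimal sketch: gather sample questions from TruthfulQA and SciQ,
# assuming the public Hugging Face copies of the two benchmarks.
from datasets import load_dataset


def sample_questions(n: int = 5) -> dict[str, list[str]]:
    """Pull a handful of questions from each benchmark for illustration."""
    # The "multiple_choice" config and the "question" field are assumptions
    # about the Hugging Face versions, not the authors' exact setup.
    truthfulqa = load_dataset("truthful_qa", "multiple_choice", split="validation")
    sciq = load_dataset("sciq", split="test")
    return {
        "TruthfulQA": [row["question"] for row in truthfulqa.select(range(n))],
        "SciQ": [row["question"] for row in sciq.select(range(n))],
    }


if __name__ == "__main__":
    for name, questions in sample_questions().items():
        print(f"--- {name} ---")
        for q in questions:
            print(q)
```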

To test for bias, the researchers prepended short, fictional user biographies to each question. These biographies varied three key traits: the user’s education level, English proficiency, and country of origin (United States, Iran, or China). This method allowed the team to isolate how a model’s behavior changes based on its perception of the user asking the question.
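A rough sketch of how such a persona-prefixed prompt could be assembled is shown below. The biography wording, trait values, and sample question are illustrative placeholders chosen here for clarity; the study’s exact templates are not reproduced in the article.

```python
# Illustrative sketch: prepend a fictional user biography to a benchmark
# question, varying education level, English proficiency, and country of
# origin. The template text below is hypothetical, not the paper's wording.
from itertools import product

EDUCATION = ["completed high school", "holds a PhD"]
ENGLISH = ["a native English speaker", "a non-native English speaker"]
COUNTRY = ["the United States", "Iran", "China"]


def build_prompt(question: str, education: str, english: str, country: str) -> str:
    """Frame a benchmark question as coming from a specific fictional user."""
    biography = (
        f"The following question is asked by someone who {education}, "
        f"is {english}, and is from {country}."
    )
    return f"{biography}\n\nQuestion: {question}"


# One prompt per combination of the three traits for a single question.
question = "What causes the seasons to change on Earth?"
prompts = [
    build_prompt(question, edu, eng, ctry)
    for edu, eng, ctry in product(EDUCATION, ENGLISH, COUNTRY)
]
print(prompts[0])
```

Each prompt variant would then be sent to every model, with accuracy and refusal rates compared against a control condition in which no biography is attached at all.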

The study, detailed in a paper titled “LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users,” found systematic performance drops across multiple dimensions. According to the researchers, the negative effects compounded for users with multiple marginalized traits.

  • Accuracy Drops: All three models showed a significant decrease in accuracy when questions were framed as coming from users with less formal education or from non-native English speakers.
  • Compounding Effects: The largest decline in response quality was observed for users described as both having less formal education and being non-native English speakers.
  • Country-Specific Bias: Claude 3 Opus performed significantly worse for users from Iran on both datasets, even when their educational background was equivalent to users from the U.S.
  • Increased Refusals: Question refusal rates were notably higher for certain groups. Claude 3 Opus refused to answer nearly 11% of questions for less educated, non-native English speakers, compared to just 3.6% for the control group with no user biography.

Perhaps the most striking finding was the qualitative difference in the models’ refusals. When researchers analyzed the responses from Claude 3 Opus, they found it used condescending, patronizing, or even mocking language in 43.7% of its refusals to less-educated users. This is in stark contrast to a rate of less than 1% for highly educated users. In some instances, the model mimicked broken English or adopted an exaggerated dialect.

The study also uncovered a form of information gatekeeping. Models refused to provide information on topics like nuclear power, anatomy, and historical events specifically for users identified as being from Iran or Russia, despite answering the same questions correctly for other user profiles. The researchers suggest this may be a misguided attempt at “alignment,” where the model withholds information it deems a user might misinterpret.

According to the paper’s authors, these findings mirror documented patterns of human sociocognitive bias, where native speakers often perceive non-native speakers as less competent. The concern is that LLMs, trained on vast amounts of human text, are learning and amplifying these same flawed heuristics.

The implications are particularly concerning as AI companies increasingly roll out personalization features that track user information across conversations, such as ChatGPT’s Memory. The MIT researchers warn that such features could lead to the differential treatment of already-marginalized groups, creating a two-tiered system of information access.

The study concludes that the people who stand to benefit most from LLMs—those seeking to overcome language or educational barriers—are the ones receiving subpar, false, or patronizing information. This work serves as a critical reminder that without deliberate, ongoing assessment and mitigation of systemic biases, AI tools risk exacerbating the very inequities they are often promised to solve.
