
AI Detectors Suck

One reason why many AI experts are unconcerned about AI exterminating humanity is that AI doesn’t actually work very well.  The big risk seems to be that people believe in it way more than they should.

Which makes the burgeoning field of “AI Detectors” even more dubious.  The idea seems to be, “if we are going to be flooded with AIbot-generated spam, why not use AIbots to detect what is machine generated and what is human generated?”

This “bot vs. bot” battle is an adversarial game, a bit trickier than playing tic-tac-toe.   And the stakes can be enormous:  if a school assignment or job application is flagged as “fake”, or even “suspect”, it can do real damage to the reputation and career of a real person.  The results had better be right.

Getting things right is not really the strong suit of today’s machine learning, which is known for “hallucinations” and just plain making things up.  It’s not cool to be thrown out of school on the basis of garbage results from an AIbot.

False positives are bad enough, but this summer Stanford researchers reported that machine learning “detectors” are biased in their errors [1].   Specifically, GPT-based scanners flag English text written by non-native speakers as “AI-generated” far more often than similar text written by native speakers.

This is not just unfair; it tends to privilege the privileged and call everyone else a “cheater”. Not cool.

My own suspicion is that this is a problem with the training sets.  What, exactly, is the right set of examples to train on?  If the machine learning is taught to recognize “good examples” of human writing, then it won’t know what to make of all the rest of the goop generated by us mortal Carbon-based units, which is not necessarily all that great.  (Run that sentence by an AI, see what it thinks.) So the AI will learn to flag “not-good-examples”, and most of us are NGEs a lot of the time.

Anyway, whatever the detector is detecting, it isn’t “AI” versus “human”.   The research suggests that the detectors are really sensitive to vocabulary size and the diversity of linguistic patterns, roughly, how predictable the text looks to a language model: indicators, perhaps, of fluency, but hardly ironclad markers of human vs. AI.
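To make that concrete, the Stanford paper traces the bias to text “perplexity”: detectors tend to flag writing that a language model finds too predictable. Below is a minimal sketch of that signal, scoring text with an off-the-shelf GPT-2 model via Hugging Face transformers. This illustrates the general perplexity technique, not the inner workings of any particular commercial detector, and the sample sentences are invented for demonstration.

```python
# A minimal sketch of the "perplexity" signal many GPT detectors lean on:
# score a passage with a small language model (GPT-2 here) and report how
# "surprising" the model finds it. Lower perplexity (more predictable
# wording, smaller effective vocabulary) is what tends to get flagged as
# "AI-generated". Requires: pip install torch transformers

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average per-token perplexity of `text` under GPT-2."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean
        # cross-entropy loss over the sequence; exp(loss) is perplexity.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

# Hypothetical samples: plain, repetitive prose vs. more varied prose.
simple = "The cat sat on the mat. The cat was happy. The mat was warm."
florid = "Perched regally upon the threadbare mat, the tabby radiated smug contentment."

for sample in (simple, florid):
    print(f"{perplexity(sample):8.1f}  {sample}")
```

Run this and the plainer, more repetitive passage typically scores lower perplexity, which is exactly the trait the study found in non-native writing samples. A detector keying on that number is measuring predictability, not authorship.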

Even more ironic, the same study showed that using ChatGPT to “improve” your text boosted the likelihood of being rated “human generated”!  That’s right, folks: AI Detectors are biased against actual, unaided, human-generated text!

As Sensei Janelle Shane puts it, “Don’t use AI detectors for anything important” [2].  Sensei Janelle notes that these AI “detectors” have panned her own book, flagging her own deathless prose as suspected machine generation.  She has also shown that AI-manipulated versions of her text (AKA, “cheating”) are more likely to be flagged as “human” than the human-generated original.

All I can say is, if the program can’t play tic-tac-toe as well as me, then I wouldn’t trust it to do much of anything.


  1. Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, and James Zou, GPT detectors are biased against non-native English writers. arXiv:2304.02819, 2023. https://arxiv.org/abs/2304.02819
  2. Janelle Shane, Don’t use AI detectors for anything important, in AI Weirdness: the strange side of machine learning, June 30, 2023. https://www.aiweirdness.com/dont-use-ai-detectors-for-anything-important/