AIbots Are Opaque

ChatGPT and friends have generated a lot of hype this year—despite and because of how poorly they work. 

It’s reasonable to ask, “How do they come up with their answers, right or wrong?” 

It’s actually hard to answer that question because these ML models are totally opaque. 

Yes, even the ones that say they are “open” aren’t open.  (No one is surprised that “OpenAI” isn’t “open” at all.  In 2019, they changed from ‘non-profit’ to ‘mercilessly mercenary’, but decided to keep the fluffy, comfortable-sounding organizational name.)

This summer, researchers in the Netherlands evaluated the openness of contemporary large language models, most of which claim to be “open source” [1].  They found that none of them are open enough for third parties to evaluate them or replicate their results.  Since publication, the team has extended its analysis to 21 models [2].

“We find that while there is a fast-growing list of projects billing themselves as ‘open source’, many inherit undocumented data of dubious legality, few share the all-important instruction-tuning (a key site where human annotation labour is involved), and careful scientific documentation is exceedingly rare.”

([1], p.1)

Let’s review.

These models work in mysterious ways.  They produce a lot of wrong results, along with preposterously overconfident self-assessments.  They are trained on unknown and undocumented data sets, whose legal status is itself unknown.  They rely on undocumented human tuning.  And each new version may give different results.

These critters are opaque, unreliable, inaccurate, and undocumented, and they have never been peer reviewed.

Does this sound like something that you would want to base your business on?

Does this sound like something that should even be legal to sell?

My own view—not that anyone asked—is that these companies should submit their technology to peer review.  And if they claim to be “open source”, then they should open their source. 

Otherwise, we shouldn’t take them seriously.  And we definitely shouldn’t give them any money.


  1. Andreas Liesenfeld, Alianda Lopez, and Mark Dingemanse, “Opening up ChatGPT: Tracking openness, transparency, and accountability in instruction-tuned text generators,” in Proceedings of the 5th International Conference on Conversational User Interfaces (CUI ’23). Eindhoven, Netherlands: Association for Computing Machinery, 2023, Article 47. https://doi.org/10.1145/3571884.3604316
  2. Michael Nolan, “Llama and ChatGPT Are Not Open-Source,” IEEE Spectrum – Artificial Intelligence, July 27, 2023. https://spectrum.ieee.org/open-source-llm-not-open