ChatGPT Doesn’t Know Software Engineering

By now it’s hardly news: “Don’t use AI detectors for anything important.”

For every story that worries about ChatGPT and friends “coming for” some job or another, there is another story reporting that ChatGPT and friends are laughably incompetent at that job.

Yes, you may be replaced by AI.  No, it won’t actually be able to do your job.

Next year we could be reading about people being hired to mop up the mess made by the AI that was hired to replace them.

As a retired software engineer, I’ve been watching the “ChatGPT will replace coders” chatter with interest.  Software engineers can use all the help we can get, so AI-based tools are really neat—when they work.

That last part is kind of important.  Unless your goal is merely to generate text that is statistically like computer code, you need to worry that the code works and, to use the technical term, is correct.

Beyond generating code, software engineering involves a lot of strategic and tactical decisions about what code to build and, often, which of several possible alternative approaches to choose.  These decisions are informed by experience (what other code has done, best practices), context (goals and constraints, budgets and schedules, etc.), and everything else, including aesthetics.

Chatbot enthusiasts have imagined that large language models can be used to make these kinds of decisions—for example, designing a robot arm.  In the case of software, this would include answering design questions and explaining the answers to humans.

This summer, researchers at Purdue explored how well ChatGPT compares to human “experts” at answering questions about software problems [1].  The study uses a sample of answered questions found at Stack Overflow, a widely used Q&A forum on the Internet.  (Heck, I’ve even used it, and I never ask for help. :-) )

The SO archive has zillions of questions, along with answers from (as far as we know) human experts.  The answers have been rated to identify good answers, and in many cases, there is one “best answer” clearly identified.

The researchers sampled thousands of these questions and posed them to ChatGPT (3 and 3.5).  It should be noted that they used the generic models, which are trained on the whole Internet, not specifically on software engineering Q&A.  While enthusiasts boast about how well these models do on, say, professional qualification tests, there really isn’t any reason to expect that they are any more expert at software engineering than the general Internet is.  Which it isn’t.

Anyway.

The results are completely unsurprising.  52% of the answers were “incorrect” [2].  And 77% of the explanations were “verbose.”  (We can be sure, though, that ChatGPT was insanely confident that his answers were correct.)  (We can be equally sure that ChatGPT’s pronouns definitely are “he/him/his.”)

Basically, the chatbots know nothing about software engineering, but are happy to whip up an answer for you based on whatever they’ve found on the Internet.  And what they’ve found on the Internet is words, words, and more words.  So their answers have a lot of verbiage.

The verbosity and statistically based plausibility of the AI-generated answers are actually a significant problem, because human readers were snowed by all the plausible words and missed some of the errors.

“Users overlook incorrect information in ChatGPT answers 39.34% of the time due to the comprehensive, well-articulated, and humanoid insights in ChatGPT answers.“

([1], p. 9)

As Sabrina Ortiz puts it, “You may want to stick to Stack Overflow for your software engineering assistance.” [2]

It is pretty clear that, long before AI rises up and wipes us out, we may well destroy our civilization by relying on hallucinating AI bots that fill the world with wrong answers.

(By the way, here’s a pro tip for spotting an actual expert. A real expert will sometimes say, “I don’t know”, or “I’m not sure”, or, even, “I’ll have to think about that”. These phrases do not seem to be in ChatGPT’s playbook.)


  1. Samia Kabir, David N. Udo-Imeh, Bonan Kou, and Tianyi Zhang, Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions. arXiv:2308.02312, 2023. https://arxiv.org/abs/2308.02312
  2. Sabrina Ortiz, ChatGPT answers more than half of software engineering questions incorrectly, in ZDNET, August 9, 2023. https://www.zdnet.com/article/chatgpt-answers-more-than-half-of-software-engineering-questions-incorrectly/
