Yet More ChatGPT Doing Software Engineering

ChatGPT and similar ML bots have made a splash in many domains, producing plausible BS where and when asked.  I’ve been particularly interested in how this bot does at computer programming.  Obviously, this will not be “The End of Programming”, though it may raise the quality of script-kiddie cut-and-paste.

Strikingly, ChatGPT might also improve software documentation. I think documentation is an area where plausible BS is not only better than nothing, it may be better than current practice!  :-(

Overall, ChatGPT and friends are mediocre at generating code from specifications.  I guess it’s amazing they work as well as they do.

But what about the stuff that really soaks up human programmers’ time—debugging?

This winter, researchers at UCL and Mainz reported a study that asked ChatGPT to find and repair bugs [2].

The task is based on a set of benchmark problems.  The bot is presented with a short piece of code and asked to say whether it contains a bug, and if so, how to fix it.   Again—it’s amazing that this works at all!  The performance of ChatGPT was compared to some other automated debuggers based on deep learning.
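To give a flavor of the task, here is an illustration of my own (not a problem from the paper; if I read [2] right, the benchmark is the QuixBugs suite of small buggy programs).  The prompt pairs a short snippet like this with the question “Does this code contain a bug, and if so, how would you fix it?”:

    # Hypothetical snippet in the spirit of the benchmark problems:
    # a short function with one seeded defect.
    def find_max(xs):
        """Return the largest element of a non-empty list."""
        best = 0              # BUG: wrong for all-negative lists; should start at xs[0]
        for x in xs:
            if x > best:
                best = x
        return best

    # The expected answer: yes, there is a bug.  Initialize best = xs[0]
    # (or float("-inf")) so that find_max([-3, -1]) returns -1, not 0.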

All of the ML-based bots did fairly well, getting about half of the cases right.  As in the case of code generation, this is probably at the level of a novice programmer.  ML-based problem solvers presumably will improve with practice, of course.

The most interesting thing, though, is one condition in which ChatGPT operated via a dialog with a human.  The follow-up questions and answers allowed the bot to ask for more information (e.g., what is the expected behavior of the code snippet) and to receive relevant hints.  This dialog boosted performance to over 75%, entering “B” territory.
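For what it’s worth, here is a rough sketch (mine, not the paper’s; as I understand it, the follow-ups in [2] were given by hand through the ChatGPT interface) of how such a repair dialog might be automated with a chat API.  The model name, prompts, and helper function are placeholders.

    # Sketch of a two-turn repair dialog, assuming the `openai` Python client.
    # Everything here (model name, prompts, helper name) is illustrative, not from [2].
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def repair_dialog(snippet: str, hint: str) -> str:
        """First ask for a diagnosis, then follow up with a hint about expected behavior."""
        messages = [{
            "role": "user",
            "content": "Does this code contain a bug? If so, how would you fix it?\n\n" + snippet,
        }]
        first = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
        messages.append({"role": "assistant", "content": first.choices[0].message.content})

        # The follow-up turn supplies the kind of hint the study describes,
        # e.g. "The function does not return the expected value for input X."
        messages.append({"role": "user", "content": hint})
        second = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
        return second.choices[0].message.content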

This is kind of neat. 

Combine this with an ability to plausibly explain software, and we’re starting to get somewhere.   A tool that can tell you what a program is supposed to do (e.g., from specs and other info) and can identify potential bugs and fixes sure sounds handy to me.

Is this the “end of programming”?  Of course not.  There’s more to programming than messing with code, such as understanding the real requirements and how the code fits into the real world.

The worst problem with this plausible bug fixer is that whatever the bot does has to be checked.  It may be harder, or take more effort, to understand what the bot did than to just fix the bug in the first place.   :-(

OK, to be fair, humans need to be checked, too. But the bot ranks as “a novice programmer we have never met”, so we really will need to check its work until we gain confidence. Sigh.

“Despite its great performance, the question arises whether the mental cost required to verify ChatGPT answers outweighs the advantages that ChatGPT brings.”

([2], p. 8)

And I can see it now: when a big problem is discovered later, fingers pointing in all directions.

Human programmers repeating and overriding all the work done by bots because they don’t trust them.

Ignorant managers firing humans because they do trust the bots, and don’t want to pay for everything to be done twice. And so on.

Definitely more efficient than unassisted human programming!


  1. Emily Dreibelbis, Watch Out, Software Engineers: ChatGPT Is Now Finding, Fixing Bugs in Code, in PCMag, January 27, 2023. https://www.pcmag.com/news/watch-out-software-engineers-chatgpt-is-now-finding-fixing-bugs-in-code
  2. Dominik Sobania, Martin Briesch, Carol Hanna, and Justyna Petke, An Analysis of the Automatic Bug Fixing Performance of ChatGPT. arXiv, 2023. https://arxiv.org/abs/2301.08653
