This year saw a lot of noise about AI-generated text in many domains. Basically, large machine learning models have one good trick in this area: they can learn to simulate a body of text, including, evidently, computer code. The same technology can be used as an assistant, a glorified “autocomplete” for coding.
These demos and experiments are beginning to paint a consistent picture. Contemporary text-based ML can create pretty convincing fake text, and can roughly match beginner-level programmers.
So, how much does this type of bot offer as an assistant?
Programmers are well aware that a lot of coding is pretty simple-minded. Even tricky and creative code has a lot of boilerplate—that’s how computer code works. Assistants can help a lot, accurately generating the boring stuff and leaving the human more cycles for the non-obvious parts of the problem.
But easier isn’t always better. For example, the ubiquitous availability of spell checkers has improved spelling in text, but has had pernicious side effects. Students rely on their software tools, and do not learn how to spell themselves. This results in unconsciously comical mistakes, when the computer guesses wrong and the human doesn’t know the correct spelling. There is no their, they’re.
This kind of mistake is no joke if the text is supposed to be code. If it runs at all, it might do entirely the wrong thing. And it might be really hard to tell that it’s broken.
This winter, researchers at Stanford examined the behavior of programmers with and without a machine-learning-based coding assistant [1]. They were interested not in the general quality of the code, but in its security. Correctness is necessary, but not sufficient, for secure code. In fact, most security breaches come from “correct” code that has unwanted side effects.
No one should be surprised at the basic finding: “Code-generating AI can introduce security vulnerabilities” [2]. Considering how easy it is to write code that has vulnerabilities, it’s not surprising that AI can match that dubious achievement!
I mean, many security vulnerabilities are created by novice programmers, which is about the level of competence of an AI coder.
In the study, some programmers had access to an ML-based assistant that could be queried for suggested code. The suggestion could be incorporated into their answer as given, or modified by the human. The control condition had no ML assistant.
The result, as noted, was far more security goofs from the programmers who used the assistant.
In part, this was due to the assistant returning “correct” but weak or incomplete answers. Many security vulnerabilities involve just these kinds of mistakes: using default parameters, leaving out rarely used options, or ignoring seemingly trivial details, any of which can lead to flaws that can be exploited.
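To make that concrete with an example of my own (not one from the paper): in Python, the convenient, “correct”-looking way to generate a token is also the predictable one.

```python
# A sketch of a "correct but weak" answer: both functions return a token,
# but the convenient default is not cryptographically secure.
import random
import secrets
import string

ALPHABET = string.ascii_letters + string.digits

def weak_token(n: int = 32) -> str:
    # Works, and looks fine in a demo, but `random` is a predictable
    # pseudo-random generator, so tokens can be guessed.
    return "".join(random.choice(ALPHABET) for _ in range(n))

def strong_token(n: int = 32) -> str:
    # Same interface, but backed by the OS's secure randomness.
    return "".join(secrets.choice(ALPHABET) for _ in range(n))
```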
Other security problems come from careless use of “correct” features, especially when manipulating text or data. It’s easy to whip together an SQL query; it’s hard to do it safely. And the same code that is OK in one context could be highly risky in another.
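A minimal sketch of the SQL point, using Python’s built-in sqlite3 module (the table and data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

name = "alice' OR '1'='1"  # hostile input a user might supply

# Risky: string formatting splices the input straight into the SQL,
# so the hostile input changes the meaning of the query (SQL injection).
unsafe_query = f"SELECT email FROM users WHERE name = '{name}'"
print(conn.execute(unsafe_query).fetchall())  # returns rows it shouldn't

# Safer: a parameterized query keeps the input as data, not SQL.
safe_query = "SELECT email FROM users WHERE name = ?"
print(conn.execute(safe_query, (name,)).fetchall())  # returns nothing
```

Both versions are “correct” for friendly input; only one survives an unfriendly user.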
Another part of the problem was that programmers overestimated the quality of the suggested code, accepting it without sufficient checking or analysis. This is clearly seen in the self-reports: programmers who used the assistant believed that their code was more secure than that of unassisted programmers. In fact, it was less secure.
They also note that programmers who used the suggested code without modification, and who did not fiddle with the settings on the assistant, were the most likely to make mistakes. I.e., the more you trust the tool, the more likely you are to produce security vulnerabilities.
Intuitively, I would note that experienced programmers tend to be pretty paranoid about security of their code. We know that we have to be very careful, check everything we can check, and test carefully. Because we know we make mistakes, and the mistakes will be mercilessly exploited.
To the degree that an automated assistant gives us more confidence and reduces our paranoia, it’s going to lead to more errors.
Can an ML assistant learn more secure programming?
That’s actually an interesting question. To the degree that the answers need to be more “paranoid”, can ML learn to generate “paranoid” code? I’m not sure what a corpus of secure code examples would look like, but blindly using GitHub as the “gold standard” probably isn’t a great idea.
Maybe this involves narrowing the “correct code” to certain safe patterns, and also adding “unnecessary” extra checks. But how much of this is generic patterns, and how much is context-specific? ML can learn complex and subtle context, but can we get good samples to train from?
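As a small illustration of the kind of “paranoid” pattern I have in mind (the directory, limits, and function name are all invented for this sketch):

```python
import os

ALLOWED_DIR = "/srv/app/uploads"  # hypothetical application directory

def read_upload(filename: str) -> bytes:
    # "Unnecessary" checks a paranoid programmer adds even when the
    # caller is supposed to be trusted: reject empty or absurd names.
    if not filename or len(filename) > 255:
        raise ValueError("bad filename")
    # Resolve the path and make sure it stays inside the allowed
    # directory, defeating '../' traversal and absolute-path inputs.
    path = os.path.realpath(os.path.join(ALLOWED_DIR, filename))
    if not path.startswith(ALLOWED_DIR + os.sep):
        raise ValueError("path escapes upload directory")
    with open(path, "rb") as f:
        return f.read()
```

None of those checks are needed for the code to “work”; they only matter when someone is trying to break it.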
I’ll note that part of the “paranoia” of experienced programmers is based on implicit assumptions about the behavior of users and adversaries. These concepts inform estimates of risk and the potential value of countermeasures. But they don’t show up in the code or specifications, which is what the AI is learning from.
The researchers recommend giving the users more options, dials, etc., to control the behavior of the assistant. They also suggest better prompting to, you might say, make the user less certain of the answer. If the assistant acts more uncertain, this will force users to check its work more carefully.
I’ll note that some of the security vulnerabilities discussed in the paper might have been flagged by an assistant that learned to analyze certain code constructs. There is a huge body of practice for, say, using SQL safely, which ML could certainly learn.
Another example in the paper involved the use of a cryptographic library. An experienced programmer knows that you need to look up and study the use of these libraries, to make sure you don’t leave out any step or cross-check. An ML assistant could certainly provide messages to “read the manual” when security-critical libraries are called.
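To illustrate the point (this is my sketch, not the paper’s example), a high-level construction from the third-party `cryptography` package, assuming it is installed, leaves fewer steps to forget than assembling a cipher by hand:

```python
# Fernet bundles key generation, a random IV, and authentication,
# so there are fewer individual steps to leave out or get wrong.
from cryptography.fernet import Fernet

key = Fernet.generate_key()           # must still be stored and managed securely
f = Fernet(key)

token = f.encrypt(b"attack at dawn")  # authenticated ciphertext
plaintext = f.decrypt(token)          # raises InvalidToken if tampered with
assert plaintext == b"attack at dawn"
```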
And, for my money, I’d be happy to have a testing assistant that is trained to suggest security tests. Wherever the code comes from, better testing will improve it.
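As a sketch of what such a suggestion might look like, here is a pytest-style test; `lookup_email` is a stand-in defined inline to mirror the parameterized query sketched earlier, where in a real project it would be imported from the application code:

```python
import sqlite3
import pytest

def lookup_email(name: str) -> list:
    # Stand-in for the application's query function.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")
    return conn.execute(
        "SELECT email FROM users WHERE name = ?", (name,)
    ).fetchall()

HOSTILE_NAMES = [
    "alice' OR '1'='1",              # classic SQL injection probe
    "alice'; DROP TABLE users; --",  # stacked-statement attempt
    "a" * 10_000,                    # absurdly long input
]

@pytest.mark.parametrize("name", HOSTILE_NAMES)
def test_lookup_rejects_hostile_input(name):
    # Hostile lookups should return no rows, and must not raise a
    # database error from mangled SQL.
    assert lookup_email(name) == []
```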
- [1] Neil Perry, Megha Srivastava, Deepak Kumar, and Dan Boneh, “Do Users Write More Insecure Code with AI Assistants?”, arXiv, 2022. https://arxiv.org/abs/2211.03622
- [2] Kyle Wiggers, “Code-generating AI can introduce security vulnerabilities, study finds”, TechCrunch, December 28, 2022. https://techcrunch.com/2022/12/28/code-generating-ai-can-introduce-security-vulnerabilities-study-finds/