Speaking of AI Doing Math

Keeping with the unplanned sort of theme for the week, let's look at some more AI Doing Math.

It seems that our Silicon overlords share at least one cognitive trait with us puny Carbon-based units: they find math hard.  Specifically, computers find some math easy and some math hard.  Just like humans.

Of course, different people find different math concepts harder and easier, and AIs have trouble with yet other concepts.

So…yay humans??

Case in point: this fall Dan Garisto discussed some recent research that tries to use the current generation of language models to do quantitative reasoning, including word problems [1].

These models are gangbusters at learning to imitate human performance based on enormous training sets.  They have achieved unbelievable results, including effective translation between human languages. (They are also the source of endless unintentional comedy.)

One glaring exception is that this technology has not worked well on quantitative reasoning.  I.e., given a corpus of word problems, ML doesn’t learn how to solve word problems very well.  There is something about quantitative reasoning that general-purpose language models don’t “get”.

So, let’s fiddle.

This summer, researchers at Google reported a model that was augmented with specialized training [2]. A model trained on general language was given a specialized training set of math problems.  The model also works by breaking each problem up into smaller steps.  They also provided examples where the model got the wrong answer.  The experiment also used a voting technique, solving each problem 100 times and selecting the most common answer.
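
As a rough sketch of what that voting step looks like (my illustration, not code from the paper; `sample_solution` is a hypothetical stand-in for one call to the model), the idea is to sample many independent worked solutions and keep whichever final answer shows up most often:

```python
from collections import Counter

def extract_final_answer(solution_text):
    """Toy parser: take whatever follows 'Answer:' on the last such line."""
    for line in reversed(solution_text.splitlines()):
        if line.strip().lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return None

def majority_vote(problem, sample_solution, k=100):
    """Sample k step-by-step solutions and return the most common final answer.

    sample_solution(problem) is a hypothetical function that returns one
    model-generated worked solution as a string.
    """
    answers = []
    for _ in range(k):
        answer = extract_final_answer(sample_solution(problem))
        if answer is not None:
            answers.append(answer)
    return Counter(answers).most_common(1)[0][0] if answers else None
```

The intuition is that wrong solutions tend to disagree with one another, while correct chains of steps tend to converge on the same number.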

These variations were surprisingly effective.  The AI improved from “worse than me” to “as good as CS grad students”.  (I was once a CS grad student, but my word problem capabilities have rusted a bit since.)

It is striking that, except for the explicit voting scheme, these are all techniques favored by Carbon-based units.  So…yay for math teachers, even for AIs?

The accuracy is somewhat robust, but the models don’t do very well on other math problems, i.e., problems from different datasets.  So what are they learning?

Of course, language processing models don’t do reasoning, at least not in any way that they can explain to puny monkeys.  On the other hand, breaking up the problem into pieces is a useful form of showing your work.
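
For instance, on a toy problem of my own devising, “pencils are 3 for $1; how much do 12 pencils cost?”, a step-by-step answer spells out the intermediate reasoning (12 ÷ 3 = 4 groups, 4 × $1 = $4) rather than just emitting “$4”, so each step can at least be checked.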

But overall, it’s kind of hard to determine just what such a model knows or understands about math.

Still, it is notable that the magic of general-purpose ML seems to fail for quantitative reasoning, but improves with specialized pedagogy.  I note that expert systems are generally built with more procedural AI techniques, not general-purpose learning, and not with this kind of specialized training, either.

So, from this perspective, these math problem tasks seem to be somewhere between general language and domain expertise.  I.e., it seems the AI is learning something other than “how to do math problems”, but less than how to mimic “what people know about doing math problems”. 

This seems to confirm our intuition that there really is an abstract “quantitative reasoning” that lies between the words of the problem and the words of the solution.  But we don’t know what this quantitative reasoning is, or whether the machine is learning the same stuff as humans think they learn.


  1. Dan Garisto, AI Language Models Are Struggling to “Get” Math. IEEE Spectrum – Artificial Intelligence, October 19, 2022. https://spectrum.ieee.org/large-language-models-math
  2. Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, and Vedant Misra, Solving Quantitative Reasoning Problems with Language Models. arXiv, 2022. https://arxiv.org/abs/2206.14858
