With all the yick-yack about AI “exterminating” us, there has been little explanation of just how this might happen. What, exactly, are machine learning models supposed to do that will wipe out humans?
Classically, in the Clarke-Asimov days, it was imagined that computer intelligences would be turned to optimizing themselves, initiating a feedback loop that accelerated their capabilities. Soon enough, Carbon-based units would be left behind, unable even to understand what their Silicon-based offspring were up to.
If not extinction exactly, humans might face irrelevance. Ouch!
While ChatGPT and friends have been putting on their comedy shows, this summer the folks at DeepMind reported actual progress on this front [2].
In earlier research, DeepMind found improved algorithms for matrix multiplication, which is huge. Among other things, faster matrix multiplies can potentially make deep learning models themselves cheaper to train and run: precisely the kind of feedback loop we have been expecting.
The new work has found faster ways to sort numbers, which is pretty much “the other thing” computers do, besides multiply matrices. In a way, these results are even more astonishing than the matrix multiplies, because so much Carbon-based brain power has gone into thinking about sorting algorithms.
The really super, massively cool thing, though, is that they made the system general, so it can search for better algorithms for other tasks as well. How did they do it? They did it the way AlphaGo conquered Go: they gamified it [1].
Basically, the task is to generate a program that solves a problem, with each assembly-language statement considered a “move”. The generated program is run, and points are awarded for correct (and fast) answers. The machine learning system plays this game over and over, learning which moves lead to winning programs and using that experience to guide its search.
(As we used to quip, “Programming is easy: just generate every possible program, and throw away the ones that don’t work.” :-) )
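The old quip can actually be written down. As a toy illustration only (AlphaDev searches real x86 assembly with a learned policy guiding the moves, not brute force), here is a sketch in Python: a hypothetical three-register machine with three instructions, a scoring function that plays “judge” for the game, and an exhaustive search for the shortest program that sorts two values.

```python
import itertools

# Toy "program synthesis as a game": every instruction is a move,
# and the score is how many test cases the finished program gets right.
REGS = 3
OPS = ("mov", "min", "max")  # each instruction is (op, dst, src)

def run(program, inputs):
    """Execute a program on a fresh register file loaded with the inputs."""
    r = list(inputs) + [0] * (REGS - len(inputs))
    for op, d, s in program:
        if op == "mov":
            r[d] = r[s]
        elif op == "min":
            r[d] = min(r[d], r[s])
        else:  # "max"
            r[d] = max(r[d], r[s])
    return r

TESTS = [(0, 1), (1, 0), (2, 2), (5, 3), (3, 5)]

def score(program):
    """One point per test where (r0, r1) ends up sorted ascending."""
    return sum(run(program, t)[:2] == sorted(t) for t in TESTS)

def search(max_len=3):
    """Generate every possible program, shortest first, and keep the
    first one that doesn't get thrown away (i.e., scores perfectly)."""
    instrs = [(op, d, s) for op in OPS
              for d in range(REGS) for s in range(REGS)]
    for length in range(1, max_len + 1):
        for prog in itertools.product(instrs, repeat=length):
            if score(prog) == len(TESTS):
                return prog
    return None
```

On this tiny instruction set the search finds a three-instruction winner, essentially a compare-exchange: save a copy, take the min, take the max. The whole point of AlphaDev is that learned search makes this tractable where brute force explodes.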
Wow!
Not only does the dog dance, it is a champion dancer!
As many commentators noted, these improved sort fragments have been dropped into a widely used open-source library (libc++), which is used by, well, everything [3]. A 1% improvement in these inner loops will speed up, well, every damn program in the world!
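The fragments in question are the tiny fixed-size sorts (sort3, sort4, sort5) that the big sorting routines call on short runs. As an illustration only (the actual libc++ change is a branch-free x86 instruction sequence, not Python), here is the underlying idea: a three-element sorting network built from min/max compare-exchanges, with no data-dependent branches.

```python
def sort3(a, b, c):
    """Sort three values with a fixed sequence of compare-exchanges.

    Each step is a min/max pair, which maps to conditional-move
    instructions in assembly -- no branch mispredictions. (Illustrative
    only; the AlphaDev-discovered libc++ sequence is hand-counted x86.)
    """
    b, c = min(b, c), max(b, c)  # compare-exchange (b, c)
    a, c = min(a, c), max(a, c)  # compare-exchange (a, c)
    a, b = min(a, b), max(a, b)  # compare-exchange (a, b)
    return a, b, c
```

Because the sequence of operations is fixed regardless of the data, shaving even one instruction from it pays off on every single call.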
Put that on your resume!
Very, very cool.
And, like AlphaGo’s championship games, these results are fascinating reading for us puny Carbon-based programmers.
One interesting thing about this methodology is that it rings true to the psychology of the best human programmers. I’ve always held that messing about with software is the greatest game ever invented, and we even get paid to do it! Good programmers intuitively tackle their problems as a strategic game. We are making moves, looking for a winning combination.
In this case, the reward schedule plays “the adversary”: it captures the specification and, implicitly, the constraints of time, space, etc. The AlphaDev system systematically searches the “game space” of possible programs, which humans navigate intuitively.
- [1] Matthew Hutson, DeepMind AI creates algorithms that sort data faster than those built by people. Nature 618, June 7, 2023. https://www.nature.com/articles/d41586-023-01883-4
- [2] Daniel J. Mankowitz, Andrea Michi, Anton Zhernov, Marco Gelmi, Marco Selvi, Cosmin Paduraru, Edouard Leurent, Shariq Iqbal, Jean-Baptiste Lespiau, Alex Ahern, Thomas Köppe, Kevin Millikin, Stephen Gaffney, Sophie Elster, Jackson Broshear, Chris Gamble, Kieran Milan, Robert Tung, Minjae Hwang, Taylan Cemgil, Mohammadamin Barekatain, Yujia Li, Amol Mandhane, Thomas Hubert, Julian Schrittwieser, Demis Hassabis, Pushmeet Kohli, Martin Riedmiller, Oriol Vinyals, and David Silver, Faster sorting algorithms discovered using deep reinforcement learning. Nature 618(7964):257-263, June 2023. https://doi.org/10.1038/s41586-023-06004-9
- [3] Armando Solar-Lezama, AI learns to write sorting software on its own. Nature 618:240-241, June 7, 2023. https://www.nature.com/articles/d41586-023-01812-5