First, I would like to apologize for posting so infrequently these past few months. I have been working hard to flesh out a book proposal closely related to the perspective of this blog, and I will be focused on this project for a bit longer.
However, a TED talk filmed in Paris in May came to my attention today. The talk was given by Blaise Agüera y Arcas, who works on machine learning at Google, and it centered on illustrating the intimate connection between perception and creativity. Agüera y Arcas briefly surveyed the history of neuroscience, the birth of machine learning, and the significance of neural networks. It was the link between perception and creativity that caught my attention.
In this captivating demo, he shows how neural nets trained to recognize images can be run in reverse, to generate them.
There must be a significant insight here. The images produced when neural nets are run in reverse are full of unexpected yet familiar abstractions. What I found particularly interesting, however, was how Agüera y Arcas described the reversal of the recognition process. He first drew us a picture of a neural network involved in recognizing or naming an image: a first layer of neurons (pixels in an image or neurons in the retina) feeds forward to subsequent layers, connected by synapses of varying strengths that govern the computations ending in the identification of, or the word for, the image. He then suggested representing those three things (the input pixels, the synapses, and the final identification) with the variables x, w, and y respectively. He reminded us that there could be a million x values, billions or trillions of w values, and only a small number of y values. Put in relationship, however, they resemble a simple equation, something like x × w = y. In recognition, x and w are known and the one unknown is y, the name of the object to be found. If instead x and y are known, finding w is a learning process:
So this process of learning, of solving for w, if we were doing this with the simple equation in which we think about these as numbers, we know exactly how to do that: 6 = 2 x w, well, we divide by two and we’re done. The problem is with this operator. So, division — we’ve used division because it’s the inverse to multiplication, but as I’ve just said, the multiplication is a bit of a lie here. This is a very, very complicated, very non-linear operation; it has no inverse. So we have to figure out a way to solve the equation without a division operator. And the way to do that is fairly straightforward. You just say, let’s play a little algebra trick, and move the six over to the right-hand side of the equation. Now, we’re still using multiplication. And that zero — let’s think about it as an error. In other words, if we’ve solved for w the right way, then the error will be zero. And if we haven’t gotten it quite right, the error will be greater than zero.
So now we can just take guesses to minimize the error, and that’s the sort of thing computers are very good at. So you’ve taken an initial guess: what if w = 0? Well, then the error is 6. What if w = 1? The error is 4. And then the computer can sort of play Marco Polo, and drive down the error close to zero. As it does that, it’s getting successive approximations to w. Typically, it never quite gets there, but after about a dozen steps, we’re up to w = 2.999, which is close enough. And this is the learning process.
…It’s exactly the same way that we do our own learning. We have many, many images as babies and we get told, “This is a bird; this is not a bird.” And over time, through iteration, we solve for w, we solve for those neural connections.
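As an aside, the guessing game described in the passage above can be written out in a few lines of Python. This is only a minimal sketch, not code from the talk. It assumes the toy equation 6 = 2 × w, measures the error as the distance of 2 × w from 6, and nudges w by gradient descent on the squared error. The function names, the learning rate, and the number of steps are illustrative choices.

```python
def error(w, x=2.0, y=6.0):
    """Distance of x * w from y in the toy equation y = x * w (0 when w is exact)."""
    return abs(x * w - y)


def solve_for_w(x=2.0, y=6.0, w=0.0, learning_rate=0.05, steps=20):
    """Approach w by gradient descent on the squared error (x*w - y)**2.

    The learning rate and step count are assumptions chosen for this toy case.
    """
    history = [(w, error(w, x, y))]
    for _ in range(steps):
        gradient = 2.0 * x * (x * w - y)   # derivative of (x*w - y)**2 with respect to w
        w -= learning_rate * gradient      # step "downhill", toward a smaller error
        history.append((w, error(w, x, y)))
    return w, history


w, history = solve_for_w()
for guess, err in history:
    print(f"w = {guess:.4f}   error = {err:.4f}")
```

Here error(0.0) and error(1.0) give 6 and 4, matching the first two guesses in the talk, and the successive values of w creep toward 3 without ever landing on it exactly.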
The interesting thing happens when you solve for x. Agüera y Arcas continues:
And about a year ago, Alex Mordvintsev, on our team, decided to experiment with what happens if we try solving for x, given a known w and a known y. In other words, you know that it’s a bird, and you already have your neural network that you’ve trained on birds, but what is the picture of a bird? It turns out that by using exactly the same error-minimization procedure, one can do that with the network trained to recognize birds, and the result turns out to be … a picture of birds. So this is a picture of birds generated entirely by a neural network that was trained to recognize birds, just by solving for x rather than solving for y, and doing that iteratively.
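The same error-minimization idea can be sketched for x. The example below is a toy illustration under stated assumptions, not the method Mordvintsev's team actually used. The "network" is a single tanh unit, the weights W and the target output TARGET_Y are made-up values, and forward and solve_for_x are hypothetical names. It holds w and y fixed and adjusts the input x until the network's output matches the desired y.

```python
import math

W = [0.5, -1.0, 2.0]   # hypothetical fixed "synapse" strengths (treated as already learned)
TARGET_Y = 0.9         # hypothetical desired output, e.g. a strong "bird" score


def forward(x, w=W):
    """Toy non-linear 'network': one tanh unit over a weighted sum of the inputs."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def solve_for_x(y=TARGET_Y, w=W, learning_rate=0.1, steps=500):
    """Hold w and y fixed; adjust the input x to shrink (forward(x) - y)**2."""
    x = [0.0] * len(w)                      # start from a blank "image"
    for _ in range(steps):
        out = forward(x, w)
        # Chain rule: d/dx_i of (out - y)**2 = 2 * (out - y) * (1 - out**2) * w_i
        scale = 2.0 * (out - y) * (1.0 - out * out)
        x = [xi - learning_rate * scale * wi for xi, wi in zip(x, w)]
    return x


x = solve_for_x()
print("generated input:", x)
print("network output: ", forward(x))
```

Even in this tiny case, many different inputs x produce the same output y, so the result depends on where the search starts. That may be part of why the generated images can look both familiar and unexpected.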
The images displayed in the talk are striking, and many of them are Escher-like. There are also multilayered references to mathematics here: in the design of neural networks, in conceptualizing and illustrating what it means to ‘run in reverse,’ and even in the form the abstractions take in the images produced. The talk is definitely worth a look.
In the end, Agüera y Arcas makes the point that computing, fundamentally, has always involved modeling our minds in some way. And the extraordinary progress that has been made in computing power and machine intelligence “gives us both the ability to understand our own minds better and to extend them.” In this effort we get a fairly specific view of what seems to be one of the elements of creativity, and work like this will continue to highlight the significance of mathematics in our ongoing quest to understand ourselves.