Gregory Mone wrote a short piece on “Sensing Emotions” in the September 2015 Communications of the ACM, sketching the main technologies in “affective computing”: inferring human emotions from sensor data, mainly visual imagery. As he reports, this is being used to develop advertising, but it has potential uses in education, home care, and any situation where software needs to know what humans are thinking.
The basic technique is essentially machine classification: learning from (manually) labeled images (e.g., 100,000 images of people “feeling joy”). Using techniques no doubt pioneered for intelligence systems, this “deep learning” does pattern recognition at multiple levels of grouping (pixels, areas, features). Lavishing vast but mindless computing on these images, it is possible to discover highly robust classification patterns. The “deep” in “deep learning” means that it does a ton of shallow computation on a pretty deep pile of data.
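In miniature, the idea looks like this. Below is a toy nearest-centroid classifier, not a real deep network; the feature vectors and labels are invented for illustration. The point is that the classifier faithfully reproduces whatever labeling scheme it is trained on:

```python
# Toy sketch of "learning from labeled examples" (NOT real deep learning):
# a nearest-centroid classifier that reproduces the labels it was given.
from statistics import mean

def train(examples):
    """examples: list of (feature_vector, label). Returns label -> centroid."""
    by_label = {}
    for vec, label in examples:
        by_label.setdefault(label, []).append(vec)
    return {label: tuple(mean(dim) for dim in zip(*vecs))
            for label, vecs in by_label.items()}

def classify(centroids, vec):
    """Assign the label whose centroid is closest (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Invented training data: the system "learns" whatever the labelers said,
# whether or not the labels track anyone's actual inner state.
training = [((0.9, 0.1), "joy"), ((0.8, 0.2), "joy"),
            ((0.1, 0.9), "sad"), ((0.2, 0.8), "sad")]
centroids = train(training)
print(classify(centroids, (0.85, 0.15)))  # -> joy
```

Real systems learn far richer intermediate features, but the dependence on the human-supplied labels is the same.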
In addition to advertising, this is potentially of interest for, say, carebots and medical systems, where we’d like the system to notice if someone is in pain, or unresponsive, or otherwise needs attention but hasn’t asked for it. Of course, it will also be used in “lie detection”, and, I’ll bet, in digital dating. Sigh. Digital voodoo.
I’m sure that this machine learning “works” pretty well. But what is it really doing?
As far as I can tell, success depends on the training set. If you have a large enough set of images that are “correctly” classified, then the computer can learn to reproduce these human classifications. (The humans and the computers are almost certainly using different algorithms to classify the imagery, so this agreement is kind of interesting philosophically.)
So the deep learning is absolutely not learning to recognize human emotions; it is learning to match a particular human labeling system that is meant to describe the emotions of the people in the images. This is useful only to the extent that the labeling is accurate. Caution is certainly in order, because, without context, humans aren’t especially great at labeling the emotion of another person from an image. Such a classification needs to be cross-validated with other measures.
One point skated over in Mone’s article is the question of cultural and other variation. It’s one thing to develop a classification system for a single culture (perhaps united by language and mass media), but even if the “inner emotions” are universal, that doesn’t mean that the labels or the visible traces are the same across cultures, or even for everyone within a cultural group. For that matter, how fast do they evolve over time? I’ll bet that individuals adapt to their context, picking up a “local accent” from their current situation. Do these effects show up in the classifiers?
Basically, we need to ask just what these “deep learning” systems are really learning, who they are learning from, and who defines the normative group. In the case of advertising, odds are that it is tuned to the advertisers’ preferred audience: young, affluent, white. Other applications will use other datasets as appropriate, e.g., populations of medical patients for a clinical system.
This makes me wonder just how universal these recognizers turn out to be, and how we might combine them. Also, if computers are using these recognizers to make unsupervised decisions, e.g., about someone looking “dishonest” or “not paying attention”, there are going to be big problems, especially if the mistakes are systematically biased against certain demographic groups.
Is this going to be a new form of cultural imperialism, like marketing and movies, imposing the emotional expressions of the dominant group upon all users of the computer system?
By the way, in the same issue of CACM there is a nice piece by Philip Maddox, “Testing a Distributed System”. In case you are in any doubt why this is important: pretty much everything is tied to a distributed system these days.
This is familiar territory to me; I worked on these systems for more than two decades, and I assure you that this is truly hard.
As Maddox says, software and system testing is hard in any instance, but a distributed system (think: “the cloud”) involves “multiple processes, written in multiple languages, running on multiple boxes…on different operating systems” (p. 54). And I would add: across multiple networks operated by independent organizations, sometimes with a massive number of users doing different things.
In fact, as he notes, you may not have access to (or even be able to delineate) the whole system. But we must do the best we can!
Maddox mentions some especially difficult points.
One nasty problem is asynchronous communication and timing in general. Distributed systems tend to be loosely coupled, and asynchronous communication is common. E.g., event notifications are multicast, and arrive “eventually” to an undetermined set of recipients. It isn’t possible to test “everything”, but testing needs to try to cover various delays and other variations, to try to uncover potential problems.
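One cheap way to exercise delay and ordering variations is to deliver the same set of events in every possible order and assert that the end state comes out identical. A minimal sketch, with an invented last-writer-wins register standing in for the system under test (small event sets make exhaustive enumeration feasible):

```python
# Sketch of ordering/delay testing: feed the same multicast events to the
# system in every delivery order and check the final state is order-independent.
import itertools

def apply_events(events):
    """Apply (timestamp, value) events; last-writer-wins by timestamp."""
    state = (float("-inf"), None)
    for ts, value in events:
        if ts > state[0]:
            state = (ts, value)
    return state[1]

events = [(1, "a"), (2, "b"), (3, "c")]
# Every permutation models a different pattern of network delays.
finals = {apply_events(order) for order in itertools.permutations(events)}
print(finals)  # -> {'c'}
```

For larger systems you can’t enumerate all orderings, so testers fall back on randomized schedules and injected delays, which is exactly the kind of coverage problem Maddox describes.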
A second challenge is “node failure”: a complex distributed system has many components, some of which may fail temporarily or permanently. (One of “Bob’s Laws” of computing is, “If you have enough CPUs, some of them are broken.”) The system must continue despite some failures. And it must continue if a node drops out and then comes back online. Testing this requires ways to simulate failures and to check that the remaining system works correctly.
(Note that the difference between a node that is merely very slow or out of sync and a node that is “broken” may very well be moot!)
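A minimal sketch of this kind of failure injection, using an invented majority-replicated register (none of this comes from Maddox’s article): write a value, knock out a minority of the nodes, and check that the survivors still serve it.

```python
# Sketch of failure-injection testing against a toy replicated register.
import random

class Node:
    def __init__(self):
        self.up = True
        self.value = None

def write(nodes, value):
    """Write to all reachable nodes; succeed if a majority acknowledged."""
    acks = 0
    for n in nodes:
        if n.up:
            n.value = value
            acks += 1
    return acks > len(nodes) // 2

def read(nodes):
    """Read from reachable nodes; the most common value wins."""
    values = [n.value for n in nodes if n.up]
    return max(set(values), key=values.count) if values else None

random.seed(0)
nodes = [Node() for _ in range(5)]
assert write(nodes, "v1")
for n in random.sample(nodes, 2):   # inject failure: crash two of five nodes
    n.up = False
assert read(nodes) == "v1"          # minority failure tolerated
```

A real test harness would also flip failed nodes back on mid-operation, since recovery (a node rejoining with stale state) is where many of the nastiest bugs hide.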
In the end, he notes that no two distributed systems are the same, and testing must be adapted to the specific system and its requirements.
Good article. Everyone should read it.
- Maddox, Philip, Testing a distributed system. Commun. ACM, 58 (9):54-58, 2015. https://doi.org/10.1145/2776756
- Mone, Gregory, Sensing emotions. Commun. ACM, 58 (9):15-16, 2015. https://doi.org/10.1145/2800498