Not Another Sherlock Holmes Reference, Watson

I’ve been following the Jeopardy! IBM Challenge with interest over the past few days. Pitting man against machine in the arena of language comprehension and general knowledge is just good television. I can’t claim to be an expert on Artificial Intelligence, but I am a programmer, interface designer, and trivia buff.

Let’s get this out of the way immediately: the technology that went into Watson is phenomenal. It is definitely a step forward from anything I, at least, have ever seen with regard to a computer’s comprehension of language. English in particular is a very nuanced language, rife with puns and innuendo. The imagination runs wild with the possibilities, and of course IBM is right there stoking that fire with promises of real-world applications in medicine and other disciplines that make us humans more apt to agree that the endeavor is for the benefit of humanity.

However, this is an incremental step, and I don’t really view it as anything revolutionary in its own right. It should also be obvious to anyone who watched that we still have quite a ways to go before we put matters of life and death into the hands of a decision-making supercomputer. Computers are good at doing things we tell them to do. I view Artificial Intelligence as differing in degree, but not in kind, from any run-of-the-mill set of software algorithms. In the end, even a “learning” robot or computer can never do anything that it was not programmed to do, even if in an abstract sense. More on that later.

It all seems so magical until you break down the problem into parts. The machine’s understanding of English grammar is by far the most complex issue that must be overcome. Yet it too follows strict rules, and if the algorithm can parse the clauses correctly, the actual fact-finding process should be relatively trivial.

For any given question or clue, the machine must deduce what kind of answer is expected. Is it a place? A person? An abstract word or concept? There are obvious word cues in many cases, but other times it is less clear. More fundamentally, the machine is being asked what is expected of it. In the context of Jeopardy!, the answer is almost always a noun or the infinitive form of a verb. Watson usually got this part right, but at times was way off. We take it for granted that our brains do this part with such ease.
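To make that concrete, here is a toy sketch of the idea (my own illustration, nothing like IBM’s actual pipeline) that uses simple word cues to guess the expected answer type:

```python
import re

# Toy answer-type classifier: map surface cues in a clue to the kind of
# answer expected. Watson's real pipeline is vastly more sophisticated;
# this only illustrates the question "what kind of answer is expected?"
TYPE_CUES = [
    (r"\bthis (man|woman|author|president|composer)\b", "PERSON"),
    (r"\bthis (city|country|state|river|island)\b", "PLACE"),
    (r"\bthis (year|decade|century)\b", "DATE"),
    (r"\bthis (word|term|phrase)\b", "WORD/CONCEPT"),
]

def expected_answer_type(clue: str) -> str:
    for pattern, answer_type in TYPE_CUES:
        if re.search(pattern, clue, re.IGNORECASE):
            return answer_type
    return "UNKNOWN"  # the hard cases: no obvious cue in the clue

print(expected_answer_type("This city is home to two major airports."))
# -> PLACE
```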

For any given clue, there are numerous keywords that restrict the possible answer set. Not to trivialize the process, but this part seems much simpler programmatically to me than parsing the grammar and figuring out what kind of answer is being asked for. Every bit of knowledge has related concepts, people, and places. If we know we’re talking about a given place, and we know the answer is a person, in most cases we’ve already narrowed our possible answers to the tens or hundreds. It’s really just a matter of figuring out how strongly the keywords in the clue correlate to one another. Again, when I make light of this process, I am not saying that this is some weekend project a kid could do; I’m merely saying that the means by which it is done is not hard to comprehend or explain. It’s all a game of probability, and Watson is the most advanced probability engine for parsing the English language that anyone (I think) has ever seen.
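Here is a similarly hand-waved sketch of the narrowing-and-ranking step. The tiny knowledge base and keyword sets are invented for the example; the point is the shape of the computation, not the data:

```python
# Toy illustration of narrowing, then ranking, candidate answers.
KNOWLEDGE = [
    {"name": "Arthur Conan Doyle", "type": "person", "place": "England",
     "keywords": {"detective", "author", "Sherlock"}},
    {"name": "Agatha Christie", "type": "person", "place": "England",
     "keywords": {"detective", "author", "Poirot"}},
    {"name": "London", "type": "place", "place": "England",
     "keywords": {"city", "Thames"}},
]

def rank(answer_type: str, place: str, clue_keywords: set[str]):
    # Step 1: restrict the answer set by expected type and place.
    candidates = [e for e in KNOWLEDGE
                  if e["type"] == answer_type and e["place"] == place]
    # Step 2: rank what's left by keyword overlap with the clue.
    return sorted(candidates,
                  key=lambda e: len(e["keywords"] & clue_keywords),
                  reverse=True)

best = rank("person", "England", {"Sherlock", "author", "detective"})
print([e["name"] for e in best])  # Arthur Conan Doyle comes out on top
```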

Back to the concept of AI. It took IBM millions of dollars, four years, and some very smart people to make Watson. In his current form, Watson was programmed to do one basic function, and he does it quite well. He is not some walking android; he is a network of a LOT of computers in a large facility. If you asked Watson, “What does a circle look like?” he would not show you a circle. He would calculate the highest-probability answer of related keywords and spit it back at you. This of course seems silly, but this is what I mean when I say that a machine cannot do what it has not been programmed to do. Even if it is capable of “learning” new things, those things are still within its programmed parameters. It’s pretty darn cool, but let’s not get carried away. We are not on the verge of self-aware automatons that adapt to their environment and make split-second intuitive decisions. But the more varied those parameters are, the closer we come. Whether we ever get to the point of a small, mobile robot capable of having a general enough set of parameters to function on a quasi-human level, I’ll leave to the experts. I just know that we’re nowhere near that right now.

The interesting thing about Jeopardy! is that it’s not really all about facts per se; it’s a combination of comprehension, thumb reflexes (for the buzzer), and knowledge. Most people who are on Jeopardy! are pretty smart from a sheer knowledge perspective; otherwise they wouldn’t be allowed on the show. So the lesson isn’t really that Watson is “smarter” or knows more facts than the human contestants. As hugely successful past Jeopardy! contestants, both Ken and Brad probably knew most of the answers. Therefore, either Watson was able to comprehend the question and compute the answer more quickly, or the sheer speed with which he was able to activate his buzzer (or a combination of both) was what made him succeed over his human rivals. I’d hate for it to have been primarily the latter, because all we’d be saying then is that IBM built a machine that had better reflexes than a human.

Let’s break down the question and answer process further. Once a clue is revealed on the screen, Trebek reads it aloud as it is simultaneously sent to Watson in plain text. Contestants are not allowed to buzz in until Trebek finishes reading the clue, which can take a few seconds depending on how long it is and how much he embellishes its reading. If the contestant comprehends the question and comes up with the answer in those few seconds, it comes down to reflexes. That is, who can push the buzzer faster after the signal is sent that contestants can now buzz in? I’m pretty sure that Watson would have the humans beat every time if he had comprehended and come up with an answer to the clue prior to the time when he was able to buzz in.
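A back-of-the-envelope simulation shows why. The latency figures below are my own guesses (a machine actuator firing with near-constant millisecond latency versus a human reaction time of roughly 150 ms that varies trial to trial):

```python
import random

WATSON_LATENCY = 0.010  # seconds after "buzzers open"; assumed constant

def human_latency() -> float:
    # Assumed human reaction to the enable signal: ~150 ms, noisy.
    return random.gauss(0.150, 0.040)

def watson_win_rate(trials: int = 10_000) -> float:
    wins = sum(WATSON_LATENCY < human_latency() for _ in range(trials))
    return wins / trials

print(f"Watson wins the buzz about {watson_win_rate():.0%} of the time")
# If Watson already has his answer when buzzing opens, the race is
# essentially unlosable; a human's only hope is anticipating the signal.
```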

So I’m guessing the humans either had some very good (read: lucky) timing with the buzzer or Watson took longer to comprehend the clues for which a human buzzed in first. I noticed that in several cases, Ken Jennings concentrated on buzzing in even before he knew the answer — probably because he had reasonable confidence that he knew the answer and also that he’d have a couple of seconds after he buzzed in to actually figure it out. Specifically, it was in the “decades” category, for which the answer space is tiny (no more than 20 for sure). The probability, therefore, of someone with decent general knowledge of history figuring out the answer in a short span of time is very high. That was a hugely successful adaptive strategy while it lasted, and it is an example of something that Watson would not have done because he was not pre-programmed to do so. Watson had some general strategic principles built in, like not wagering a lot of money when you’re really far in the lead, and not buzzing in unless there was a pretty good chance of success. But beyond that, there was no “outside the box” (literally) strategizing going on.
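A quick expected-value check (with numbers I am assuming, not measuring) shows why that gamble made sense in a category with such a small answer space:

```python
def buzz_ev(p_recall: float, clue_value: int) -> float:
    # A right answer wins clue_value; a wrong answer loses it.
    return p_recall * clue_value - (1 - p_recall) * clue_value

# In a "decades" category the answer space is maybe 20 items, so assume
# a strong player dredges up the answer in time 90% of the time.
print(buzz_ev(0.90, 800))  # +640.0: well worth buzzing blind
print(buzz_ev(0.50, 800))  # 0.0: break-even at 50% confidence
```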

In the end, of course, Watson crushed the humans. There are a few things I can think of that would have made Watson a better Jeopardy! contestant, though few of them would have improved the PR stunt. After all, it’s in nobody’s interest for Watson to absolutely obliterate the human competition. That wouldn’t be interesting to watch at all. Still, the humans put up a decent fight, all things considered. So here’s my list of improvements. They have more to do with an understanding of how the game works than with technical improvements to the actual comprehension system.

1) Watson should have buzzed in all the time, right away, no matter what. Period. He would have missed 3 or 4 questions the entire match, but would have gotten the lion’s share of questions correct. He also would have had a couple precious seconds to complete his computations if he still needed the time. This of course would have made the whole event boring, but nonetheless, there it is.

2) Watson had no mechanism for taking in his opponents’ answers. When Ken answered incorrectly, Watson buzzed in and answered the same thing. Epic fail, IBM. This would have been so simple to build in compared to the rest of the system. All you need is an audio input and some generic voice recognition capability. There is no ambient noise to speak of, and anyway everyone is on a mic. Most operating systems nowadays have this built in and can do a half-decent job of understanding your voice even without training. If your top answer has a zero probability of being correct, eliminate it (see the sketch after this list). IBM got so caught up in the technology of producing good answers to questions that they forgot to build in simple inputs based on the way the game is actually played. They couldn’t see the forest for the trees on that one. As an interface designer, I see this all the time. A programmer (myself included) gets so wrapped up in solving a given problem that he/she ignores the more obvious problems whose solutions would have a much greater positive impact.

3) The “Toronto” debacle was also pretty funny, and one that IBM felt it needed to explicitly address. For Final Jeopardy!, the category was “U.S. Cities,” and the question was about a city with two airports, one named after a World War II hero and the other after a World War II battle. Watson guessed “Toronto.” The humans both got it right (it was “Chicago”). To IBM’s credit, Watson was very unsure of his answer. Nonetheless, his answer was so ridiculous that I think IBM was just a little embarrassed. Their fumbling explanation is even funnier than the initial error. Essentially, Watson was programmed to put less emphasis on the category name because category names in Jeopardy! are particularly prone to puns and other language constructs not easily parsed by a machine. So my 20/20 hindsight recommendation: treat the category as pretty important and just program Watson to better understand the complexities of Jeopardy! categories — or at least to understand when a category is likely to be straightforward and when it is likely to be tricky, and weight it accordingly. There are only a couple of ways that the category “U.S. Cities” could be misleading, and even then “US” probably would have been in quotes or something. So really, Watson should at least be able to tell to what degree the category name is ambiguous instead of writing it off almost entirely as a rule (a sketch of what I mean follows below).
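For the category-weighting idea in #3, here is the kind of adjustment I have in mind. The trickiness heuristics are entirely my own invention, not IBM’s scoring model:

```python
# Toy category-weighting tweak for improvement #3. Instead of nearly
# ignoring the category, estimate how "tricky" its name looks and scale
# its influence accordingly. The trickiness signals are invented here.
PUN_SIGNALS = ('"', "&", "...", "POTPOURRI")

def category_weight(category: str) -> float:
    tricky = any(sig in category.upper() for sig in PUN_SIGNALS)
    # Plain names like "U.S. CITIES" should count heavily; punny or
    # gimmicky ones should count for much less.
    return 0.2 if tricky else 0.9

def adjusted_confidence(answer_conf: float, fits_category: bool,
                        category: str) -> float:
    weight = category_weight(category)
    return answer_conf if fits_category else answer_conf * (1.0 - weight)

# "Toronto" is not a U.S. city, so its confidence should collapse:
print(round(adjusted_confidence(0.30, False, "U.S. CITIES"), 2))     # 0.03
print(round(adjusted_confidence(0.30, False, "BEFORE & AFTER"), 2))  # 0.24
```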
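And for #2, once you bolt on any off-the-shelf speech-to-text, the fix itself is just bookkeeping. A minimal sketch, with the transcription step left as a hypothetical stub:

```python
# Toy fix for improvement #2: strike any answer an opponent already gave
# and got wrong. Everything here is hypothetical; transcribe() stands in
# for whatever generic voice recognition you plug in.
def transcribe(audio) -> str:
    raise NotImplementedError("plug in any speech-to-text engine")

def prune_candidates(candidates: dict[str, float],
                     wrong_answers: set[str]) -> dict[str, float]:
    # An answer already judged wrong has zero probability of being
    # correct, so drop it before picking the top answer.
    return {a: p for a, p in candidates.items()
            if a.lower() not in wrong_answers}

candidates = {"the 1920s": 0.55, "the 1930s": 0.40}
wrong = {"the 1920s"}  # an opponent buzzed in with this and missed
print(prune_candidates(candidates, wrong))  # {'the 1930s': 0.4}
```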

All in all, this has been very fun to watch and talk about. So kudos to Jeopardy! and IBM for taking it on. I don’t really think I’ll be letting a machine make any substantial decisions for me any time soon, though.
