The computer systems that increasingly mediate our communication can either facilitate or hinder the flow of information. As professional communicators, we need to understand how well the content in the system keeps the conversation flowing, and how these communications are taking place.
First, a bit of linguistics
Written words are made up of symbols which—in many writing systems—represent sounds. The symbols combine to form words that represent concepts. The words combine to form sentences and paragraphs that describe more complex concepts, including some that are kind of meta, like this one.
As babies we experience this in reverse order: we form basic concepts (hunger, discomfort, desire), then learn that certain sounds correspond to those concepts. We learn to mimic the sounds so we can communicate to others, and then we learn the symbols. Meanwhile, we’re developing more complex concepts, and we study spelling and grammar to help us express those ideas clearly. In a few short years, we develop from burbling infants to speaking toddlers, and from there to learning a complex, nuanced communication system.
Contrast this learning process with the way we program computers: we use a limited language with rigid syntax to designate a set of commands and responses. The magic of human communication is that it isn’t merely a conduit for information. We often want to amuse, inspire, surprise, or persuade. Abstraction and ambiguity—we have many ways of describing a single concept—give us the flexibility to communicate artfully. This enables a rich culture of literature, art, philosophy, and politics. Ambiguity is a critical aspect of that which makes us human.
Interacting with machines
And we also make tools. As our tools have become more complex, we’ve made them mechanical, then automated them, then computerized them. Along the way, they’ve become increasingly “intelligent,” by which we mean they can take in anticipated bits of information and make preprogrammed decisions. A quick tour of science fiction makes clear our mixed longing for and fear of the day when machines are intelligent enough to learn new information and take unanticipated actions.
Speculative fiction aside, no machines yet parse ambiguity as well as we do. But we’re no longer just communicating through machines, we’re increasingly communicating to machines. Often that means adopting new modes of behavior.
For example, most of us have gotten used to finding everything we can imagine through a simple web search. But let’s say you love text adventure games and you want to find out if the very first one, Adventure, is available for iPhone. You search Google for “original adventure game for iphone” and it returns a whole lot of links that touch on one or more of those ideas, but not what you were actually seeking. So you reword your request in search engine terms, adding more specific words in the hopes that it will make the results more relevant. In our dealings with machines, we make numerous minute adjustments in communication so that they’ll give us the response we need. But we’re always looking for ways to design computerized systems so that they understand us better.
Miscommunication is part of communication
Communication between people is often more complex than “message sent” and “message received.” Take this IM my favorite nerd sent me about an online video:
“Kal Penn is in a webvertisment for Rayman Origins where he is playing a kid in Rayman Origins.”
Our brains fill in missing bits of information and decipher made-up words like “webvertisement,” even if we’ve never seen them before. Here’s how I interpreted that statement:
[The actor] Kal Penn is … playing [the role of] a kid in [some entertainment product called] Rayman Origins.
A reasonable interpretation, though a little strange since Kal Penn is a grown man. But the next message was completely unintelligible in that context:
“The stakes? The kid’s mom. Kal’s going on a date with the kid’s mom.”
If I had realized that Rayman Origins was a video game, I might have understood that this ambiguous description actually meant:
Kal Penn is … playing [against] a kid in [a video game called] Rayman Origins.
As the conversation developed, I was able to get more information, ask questions, confirm the context, and revise my interpretation of the original statement until I was confident that I accurately understood it. This process of disambiguation—so natural in human communication—is just as important in our interactions with computers, but they have to be designed to communicate that way.
Computers haven’t traditionally operated well in the realm of ambiguity and miscommunication. But now we have machines like IBM’s Watson, the computer that competed in Jeopardy! against two of the show’s most successful contestants, and beat them. It was able to answer some pretty complex—and yes, ambiguous—questions in order to win the game. So does that mean the age of intelligent machines is here? Not quite yet.
Ambiguity is the major obstacle to communicating with machines as easily as we communicate with other people. Let’s consider a few examples to understand how and why ambiguity is so challenging for machines—and how we’re working to overcome these individual challenges. After that, we’ll take a closer look at major developments in the attempt to disambiguate our communication so that machines can understand us (and we them) in elegant, gratifying ways.
Lack of specificity
As with the example of the IM conversation, sometimes someone just doesn’t give us enough information to understand what they’re saying. In a conversation, we can ask for more details or for clarification for as long as we need to, limited only by our patience.
In the same situation, a computer system generally does one of two things. It makes a guess with the information it has, as in the Google example; or it may just break down, as when the extension is missing from a file name and your PC can’t open it.
Good usability design will avoid sending people down a dead end, so these days, your computer is more likely to open a dialogue box asking you what program you’d like to use to open an unidentifiable file type—essentially, it is asking for clarification. If a system fails to process ambiguous information, it should fail elegantly and give the user a way to continue forward.
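To make "asking for clarification" concrete, here is a minimal Python sketch of the file-opening case. The handler table and the `ask_user` callback are hypothetical names invented for illustration; a real operating system does this through its own dialogs.

```python
import os

# Hypothetical mapping from file extension to the program that opens it.
KNOWN_HANDLERS = {
    ".txt": "text editor",
    ".png": "image viewer",
    ".mp3": "music player",
}


def choose_handler(filename, ask_user):
    """Pick a program for filename, asking for clarification instead of failing."""
    extension = os.path.splitext(filename)[1].lower()
    handler = KNOWN_HANDLERS.get(extension)
    if handler is not None:
        return handler
    # Ambiguous input: offer choices rather than sending the user down a dead end.
    options = sorted(set(KNOWN_HANDLERS.values()))
    return ask_user(f"How would you like to open '{filename}'?", options)
```

In a real interface `ask_user` would show a dialog box; in a test it can be any function that takes a prompt and a list of options and returns one of them.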
This is something we need to keep in mind as we create interactive sites and products. A witty 404 page, for example, may keep people from feeling frustrated with your brand, but it would be more satisfying if you offered them some possible reasons for the misstep and some corrective suggestions to set them back on the path towards their goal.
Homonyms are sets of words that are spelled the same, but have different meanings, like blackberry, the fruit and Blackberry, the mobile device. If you start a sentence with “She shot him…” it’s going to mean something different if you finish it with “a glance” or with “in the knee.”
Wikipedia has developed a very manual solution to this type of ambiguity: disambiguation pages. On these pages, people can list every possible usage of a word or name, with some context that helps people distinguish between them. There are currently over 200,000 disambiguation pages on Wikipedia. I admit, I’ve created a few myself. The intelligence in these pages is only limited by the contributions of the Wikipedia community, since anyone can add a new page, a new word usage, or clarifying information at any time.
If homonyms are a serious concern in the body of content you’re working with, it’s best to work with a content tagging system that assigns each tag a unique ID. That way, behind the scenes, blackberry-the-fruit is identified to your system as “4865,” Blackberry-the-device is identified as “8943,” and the content they’re each associated with will remain distinct. When a user does a search for “blackberry” on your site, you can give them a raw set of results that match the string of text, but you can also give them the option to filter their results by the particular blackberry they’re looking for.
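Here is a rough Python sketch of that idea. The two IDs come from the example above; the article titles, the `kind` field, and the function name are made up for illustration and are not taken from any particular tagging product.

```python
from collections import defaultdict

# Each tag has its own ID, even when two labels look the same to a reader.
TAGS = {
    4865: {"label": "blackberry", "kind": "fruit"},
    8943: {"label": "Blackberry", "kind": "mobile device"},
}

# Hypothetical content items, tagged by ID rather than by label text.
ARTICLES = [
    {"title": "Summer cobbler recipes", "tag_ids": [4865]},
    {"title": "Smartphone keyboard roundup", "tag_ids": [8943]},
    {"title": "Foraging for beginners", "tag_ids": [4865]},
]


def search_by_label(query):
    """Return matching titles grouped by tag ID, so the UI can offer a filter."""
    results = defaultdict(list)
    wanted = query.casefold()
    for article in ARTICLES:
        for tag_id in article["tag_ids"]:
            if TAGS[tag_id]["label"].casefold() == wanted:
                results[tag_id].append(article["title"])
    return dict(results)
```

A search for "blackberry" returns both groups, and the `kind` stored with each ID gives you the wording for the filter ("fruit" or "mobile device").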
Just as some words represent many meanings, we also have many ways of saying the same thing. This is fine for people, but machines have a harder time recognizing when two non-identical sets of symbols represent the same entity.
At one point in my career, I worked on an entertainment website. We set out to normalize the free-form keywords in several years’ worth of articles and replace them with a controlled vocabulary of people, movies, albums, etc. In one extreme example, we discovered that the name of the movie “Star Wars: Episode I—The Phantom Menace” had been expressed in the following ways in the keyword field:
- Episode 1
- Episode I
- Phantom Menace
- Star Wars Episode I The Phantom Menace
- Star Wars Episode I: The Phantom Menace
- Star Wars prequel
- Star Wars: Episode 1 — The Phantom Menace
- Star Wars: Episode i — the Phantom Menace
- Star Wars: Episode I: The Phantom Menace
- Star Wars: Episode I — The Phantom Menace
- Star Wars: Episode I–The Phantom Menance
- Star Wars: Episode One — The Phantom Menace
- Star Wars: The Phantom Menace
- Star Wars: The Phantom Menace — Episode I
- The Phantom Menace
- The Phanton Menace
Some of those are misspellings, but many are perfectly legitimate. And while it might be annoying that the editors used so many variations of the name, a reader could also have used any one of these variations to search for content. And if they had, they would have gotten different results each time. In this case, disambiguation is needed to accurately search across variations in terms and provide a consistent experience.
Many controlled vocabulary systems and search engines allow for the inclusion of synonyms, alternate spellings, and misspellings. Take a look at your search logs to see what terms people are searching for and not finding—and if they match something you do cover on your site, add them into the index as alternate versions of the appropriate term.
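As a sketch of how that might work, the Python below maps a handful of the variants listed earlier onto one preferred term, then counts the searches that still find nothing. The choice of preferred form, the normalization rule, and the function names are my own assumptions; a real controlled vocabulary tool will have its own way of storing alternates.

```python
import re
from collections import Counter

# Assumed preferred form; any one of the legitimate variants could play this role.
CANONICAL = "Star Wars: Episode I: The Phantom Menace"

# A few of the variants from the keyword field, misspelling included.
VARIANTS = [
    CANONICAL,
    "Episode 1",
    "Episode I",
    "Phantom Menace",
    "Star Wars prequel",
    "Star Wars: The Phantom Menace",
    "The Phantom Menace",
    "The Phanton Menace",
]


def normalize(term):
    """Lowercase and drop punctuation so near-identical variants compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", term.lower()))


ALIASES = {normalize(variant): CANONICAL for variant in VARIANTS}


def resolve(term):
    """Return the preferred term for a query, or None if it is not in the index."""
    return ALIASES.get(normalize(term))


def unmatched_terms(search_log):
    """Count the queries that found nothing: candidates for new alternates."""
    return Counter(normalize(q) for q in search_log if resolve(q) is None)
```

Running `unmatched_terms` over a search log gives you a ranked list of terms to review. The ones that match something you do cover can be added to `VARIANTS`.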
When you give your friend some news and she responds, “You’ve got to be kidding me!” you have a pretty good idea if she’s pleasantly surprised or unpleasantly shocked, based on the tone of her voice. We demand a lot of our words and often use the same phrases to express a whole range of circumstances. In person, we have tone of voice, body language, and context to help us interpret the meaning of the words.
We don’t have all of those cues on the internet. So, when sending an email or instant message, or posting something to a social media site, we sometimes attempt to clarify by using emoticons, extra punctuation (!!!), ALL CAPS, or (in extreme cases) <sarcasm></sarcasm> tags around a potentially ambiguous phrase.
Automated sentiment analysis is a growing industry in our communication landscape, where everyone who has an opinion also has a way to express it online. We have access to a huge amount of data, but we need accurate tools that can interpret the intention of what people are saying. Keep an eye on the convergence of semantic technologies and analytics for more developments in this area.
If you’re of a certain age, you may remember the scene in My So-Called Life when teen protagonist Angela Chase told us in voice-over how her dreamy crush, Jordan Catalano, was “always closing his eyes, like it hurts to look at things.” A quick shot of Jordan leaning on his locker, whipping out the eye drops, gave us a different perspective on the source of his eye discomfort. Angela’s failure to accurately portray the situation tells us more about her than it does about Jordan.
Machines have the potential to be more accurate narrators than humans. After all, they have access to (and reliable recollection of) a lot more data than we do, and with the right framework they can identify connections that we would never have thought of. It’s hard to make the case that you’re sticking to your workout regimen when you check in to your gym and Foursquare says “Welcome back! You haven’t been here in 3 months!”
So why aren’t sites more effectively personalized? We have the ability to know what the users of our site like—sometimes better than they do themselves. The question we should always be asking ourselves is: what information have we gathered that we can use to create a better experience and make our content more useful?
Now let’s take a look at some of the approaches we’re developing to help disambiguate communication between people and machines.
Where we’re headed
If we want to communicate better with our machines, we have two choices. We can learn to think and express ourselves more like machines do. Or we can design technology systems that think more like us. One way of doing the latter is to make the content smarter: to pack it with rich metadata that will help automated systems interpret it better. The other way is to make the systems themselves smarter: to program them to look for increasingly subtle cues about the meaning of any content they receive. Some interesting developments are already taking place in both of these approaches.
Making the content smarter
In order to make the content itself more easily interpretable, there’s a growing need for content creators to add metadata to their content. This is information that isn’t necessarily visible to the public, but adds structure and meaning to the content so that it can be delivered in a variety of flexible, dynamic, and contextually meaningful ways. Metadata is very useful for text-based content, but indispensable for images, video, and other rich media content, much of which is not directly searchable. When we search flickr.com for interesting photos, we rely on the titles, descriptions, and tags that people have provided. Sometimes we may also use system-provided metadata like the photo date or camera type.
As content strategists (or co-conspirators from other disciplines), we need to help our content creators by identifying and implementing the most effective content structures for their needs. There are many useful standards to choose from (last year I gave a talk called “Make Your Content Nimble” which referenced many of them), and new ones are emerging all the time. Then we need to point our content creators towards viable sources for adding metadata to their content, whether that’s in-house resources, hired guns, automation, or crowdsourcing. In-house resources and hired guns are manual, time-consuming, and expensive approaches, and just may not be viable for large volumes of content. So let’s take a look at the other two alternatives.
Some semantic technology tools take in unstructured content and identify key names and concepts in the text. This process is called “entity extraction,” and it uses some pretty sophisticated natural language processing to determine the meaning and classification of the words it identifies. As a result, it doesn’t just tell you “related terms,” it tells you whether the terms are people, places, businesses, products, dates, concepts, etc.
Some services take it a step further and automatically find other useful resources that can be incorporated into the content. Zemanta, for example. I put a draft of this article into the live demo on their site. It analyzed the text and suggested a bunch of tags (Rayman Origins, Kal Penn, Video game, iPhone, Ubisoft, Xbox 360, Wii, Games). Some of those terms weren’t mentioned in the draft, and while they may not be appropriate in this article, they would probably be appropriate if I were actually writing an article about Rayman Origins. In addition to tags, Zemanta’s demo suggested some inline links to a variety of sites (for named entities like Kal Penn, but also for concepts like “ambiguity” and “miscommunication”), a bunch of usable images that were either public domain or creative commons licensed, and a handful of related articles, with the option to identify my preferred sources.
Zemanta is a tool that’s mainly aimed at bloggers, but there are others designed to be incorporated into major publishing platforms, and even to perform entity extraction on huge archives of existing content. A service called Calais, owned by Reuters, also has a web demo available (viewer.opencalais.com). The metadata this demo generates isn’t as publish-ready, but it provides more insight into how it’s structuring the extracted entities.
If you’re working with an organization with massive amounts of existing content, automation may be the best option to get a baseline of metadata pretty quickly. It may not be cheap, and it may not be as accurate or complete as if it were manually tagged by experts, but it will add a level of contextual information that makes the content much more valuable.
Individual content creators are getting used to tagging their own content, and some larger organizations have employed crowdsourcing to enlist their audience to help out with the work. On flickr.com, for example, depending on how permissions are set, you can tag other people’s photos. The Commons is a huge public archive of images, submitted to flickr.com by libraries and other institutions from all over the world. The public is invited to browse the photos and add tags to them, making them more findable and useful to future visitors to the collection.
Crowdsourcing approaches can also be used in targeted ways to augment automated processes. A company called Metaweb has built Freebase, a database of knowledge similar to Wikipedia but much more structured. It takes sources of information, including Wikipedia, and uses a mix of automated and manual means to transform them into consistently structured data. They developed some “data games” to allow the public to help them fill in the gaps when the automated process isn’t able to extract all of the data it needs. One such game, the “Genderizer,” asks readers to look at short passages about real people, fictional people, or biological organisms and indicate whether the entity mentioned is male, female, or other. For example:
Soanya Ahmad (born October 5, 1983) is a photographer and sailor, holding the current women’s world record for the longest time spent non-stop at sea.
While an automated system may not necessarily be able to make the connection between “holding [a] women’s world record” and being female, a person can read that sentence and quickly come to the correct conclusion. This kind of game, though not quite as fun as Angry Birds, can be strangely addictive—one user has racked up nearly 40,000 responses. And it spreads out the work of generating a lot of useful disambiguation data.
Crowdsourcing, while being an inexpensive source of human mind power, has a number of drawbacks that may make it impractical for many projects. Take these into account when considering this approach:
- Quality control. The data provided by users will not be as accurate, complete, or consistent as it would be if created by trained experts. Even if your domain doesn’t require specific expertise, people think differently and will apply different lenses to the content. To reduce the variance, you can limit the exercise, as in the Freebase example: you’ll get more consistent data by offering pre-defined choices than by allowing free-form data entry. You can also have multiple people evaluate the same set of data and take only the best or most common responses.
- Speed. Since you won’t have a dedicated set of contributors, it could take a long time for the data you need to trickle in. Most people who actively contribute at first will eventually drop off. This means that if you have a very large body of content, you’ll have to keep getting the word out and recruiting new contributors to the project.
- Incentive. Just because people can contribute doesn’t mean that they will. I’ve been in brainstorming meetings where someone suggested adding user tagging capabilities and I was the only one to ask “Why would people want to do that?” Sometimes there’s an inherent incentive. For example, the writer of a popular webcomic called Diesel Sweeties asked readers to help transcribe each comic so they could be text searchable. His dedicated fans made quick work of the 1500+ comics that were already in the archive. But would casual readers of a major publication have the passion to do the same?
If these factors don’t pose a problem for your project, then crowdsourcing might be an excellent way to get some badly needed metadata for your content.
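The quality-control tactic of taking only the most common responses can be sketched in a few lines of Python. The 60% agreement threshold and the function name are arbitrary choices for illustration, not anything Freebase has published.

```python
from collections import Counter


def consensus(responses, min_agreement=0.6):
    """Return the most common response if enough contributors agree, else None."""
    if not responses:
        return None
    answer, votes = Counter(responses).most_common(1)[0]
    # Require a clear majority so a split vote is flagged rather than guessed.
    if votes / len(responses) >= min_agreement:
        return answer
    return None
```

Items that come back as `None` are the ones worth showing to more contributors or handing to an expert.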
Making our tools smarter
In a utopian world of human-machine communications, we would continue making content as we always have, and make requests as we naturally would, and computerized systems would just understand us. We may never get to that ideal state, but we’re working to make significant improvements. There have been notable developments in the areas of recommendations, searching, voice recognition, and machine intelligence.
In person-to-person interactions, our friends and acquaintances sometimes suggest things they think we would like. They base these recommendations on things they’ve enjoyed and filter them through their accumulated knowledge of what we like. It’s a guess, and the recommendations are not always correct, but it’s a pretty sophisticated activity that takes into account a lot of nuanced knowledge about our personalities and preferences.
As we conduct more of our purchasing and content consumption activities online, sites and systems have collected a lot of information about our tastes and habits, and many of them attempt to make us into repeat customers by suggesting other things we might want to purchase or consume. For many people, the first time they encountered a recommendation engine was probably on Amazon.com. When you’re looking at a page about a product and it tells you “People who bought this also bought…” that’s a pretty blunt instrument, but sometimes it leads you to discover things that are also in your area of interest.
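To see why "People who bought this also bought…" is such a blunt instrument, here is roughly the simplest version you could write in Python: count which items appear in the same orders. The order data is invented, and this is an illustrative sketch, not Amazon's actual algorithm.

```python
from collections import Counter

# Hypothetical order history: each order is the set of items bought together.
ORDERS = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"tent", "camp stove"},
    {"novel", "lantern"},
]


def also_bought(item, orders, limit=3):
    """Rank other items by how often they share an order with item."""
    counts = Counter()
    for basket in orders:
        if item in basket:
            counts.update(basket - {item})
    # Sort by count, then alphabetically, so ties come out in a stable order.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:limit]]
```

Notice that nothing here knows anything about you. It only knows what other shoppers put in the same basket, which is exactly why the suggestions can miss.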
Netflix famously offered a $1 million prize to the person or group that could help improve their recommendation algorithm by 10%. The algorithm not only suggests movies that you might like based on previous choices and ratings, but indicates how likely you are to enjoy any given movie. After nearly three years, two teams combined and won the prize in a photo finish against another combined team. A second round of the contest was cancelled in March 2010 due to legal concerns over privacy. (Some people just don’t want the world to know how much they’re going to like “My Little Pony: Friendship Is Magic.”)
Other sites that aren’t directly selling something, but want to keep you engaged and build page views—many online newspapers, for example—also try to capture your attention by offering other content you might like. But they often simply offer more content of the same type, more content from the same section, or the most popular items on the site. If that’s not good enough for Amazon or Netflix, why should it be good enough for sites like the New York Times or the Huffington Post?
It may be beneficial to incorporate more targeted recommendations into publishing sites, but I haven’t seen anyone do it yet. Traditionally, one obstacle has been that sites like the New York Times didn’t have access to as much data about a person’s consumption habits as sites like Amazon or Netflix. But that’s changing with deeper social integration. Now, if you sign in with Facebook, the Times will have a better idea what articles you like, share, and comment on. So why aren’t they making use of that data to provide their readers a more engaging experience?
Most people use Google to search for things online. I do too, and have done for a long time. But a few years ago I started noticing that the results had become, well, kind of crappy. A lot of the results I was getting for any given query would be link farms, sketchy rip-offs of original content, or otherwise not-very-useful pages. I thought, “Well, they’ve done it. The rampant proliferation of junk content has overwhelmed Google. And now it’s broken.” But I adjusted my search behavior and kept using it anyway, and then… it got better again. I don’t know exactly when. I don’t even remember noticing that it was better until I was writing this piece and I harkened back to those dark times.
According to a Wired article called “How Google’s Algorithm Rules the Web,” Google’s primary competitive advantage is that their search algorithm is engineered in such a way that they can constantly tweak it. They collect massive amounts of data about what people search for and what they do when they get the results: not only what users click on, but whether they change a word in the query and try again, or scrap it and start over. Collecting data on the results that didn’t work for people—and what they did to remedy their query—points Google towards areas where they could potentially improve the algorithm.
In the two years since that article was written, one can assume that Google has made thousands of minor adjustments to the algorithm, in addition to the major initiatives we’ve heard about. While Google tends to keep their detailed plans pretty close to the vest, here are a few examples of recent activities and the possible impact they could have:
The “content farm killer”
- A new version of the algorithm, dubbed Google Panda, dealt head-on with the crappy content problem I mentioned earlier. This upset the apple cart for a lot of content farms (and perhaps a few innocent bystanders caught in the crossfire), but there was a significant improvement in the quality of search results.
- Google bought Freebase, the database of knowledge mentioned earlier. This massive collection of relationship-rich data about people, places and things will most likely be used to improve the accuracy of search results as well as display useful information directly in the results.
- A year later Google, Bing, and Yahoo! announced that they would be supporting a structured data standard called schema.org. This move could inspire many content publishers to add semantic markup to their web pages, making all of that content that much easier for machines to find and interpret.
- Just last fall they purchased Zagat, which will likely be used to feed local review content directly into search results. Ratings, reviews, and other Zagat data could show up instantly when you search for the name of a restaurant.
- At the same time, they launched Google Flight Search, and soon included that data in search results. If you search for “NYC to London” you’ll see actual flight information along with the usual list of links. [At the time of this writing, they’ve gotten some pushback on this feature, so they’re currently only displaying flight schedules, not prices or links to purchase tickets].
Human validation of relevance
- When a 125-page “content quality” rating guide—detailed guidelines Google employs to test and evaluate the search algorithm—was leaked to the Web, SEO specialists immediately tried to mine the document for exploitable tips. But there were no secrets there. In fact, it’s probably more useful to writers, editors, and content strategists who just want to create quality content. Also, it’s interesting to note that, with all the data available to them, Google also manually evaluates the effectiveness of their algorithm, because people bring a nuance to judging relevance that machines just can’t match.
- As I write this, Google has just announced “Search plus Your World,” their latest foray into Social Search. It allows users to toggle between global results and results from within your social network (your own content as well as that shared by your contacts). This could be useful for certain kinds of queries, since it incorporates the intelligence of people who, ostensibly, share your interests.
Even though Google is making constant improvements in an effort to become all things to all people, for some, it never will be. Around 2008, interest in “semantic search engines” intensified—perhaps in reaction to the Crappy Age of mainstream search engines, when Google and its direct competitors just weren’t performing the way we needed them to.
Semantic search engines vary, but the general premise is that they use semantic technology to provide more accurate results through a better understanding of context and meaning. Some use natural language processing to better understand the intention of the queries. Others unlock relationships in the data to provide more nuanced results. Many have a lot of promise, but also major shortcomings: Often they’re designed to work with a certain subset of data, not the entire web. Some are designed to be used for a particular vertical—for example, legal, medical, or financial research. On top of that, there’s almost no way for them to wrest market share from the major players, though they’re sometimes acquired, as when semantic search engine Powerset was acquired by Microsoft before they relaunched their search as Bing and branded it a “decision engine.”
As content creators, or creators of digital experiences that serve up content, it’s important for us to keep track of these developments in search. SEO is one area of immediate impact, and the community of SEO professionals tends to be the quickest to analyze the implications of any and every movement in the search world. But the rest of us (content strategists, copywriters, developers, and designers) are probably lagging behind. Google has shown that they will continue to lead the way when it comes to connecting people with content, and we need to make sure that our content is formed, formatted, and delivered in a way that allows Google and others to tap into what it has to offer.
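One way to format content so that search engines can tap into it is to describe each page with the schema.org vocabulary. The Python sketch below builds such a description and serializes it as JSON-LD, which is one of the syntaxes schema.org supports. `Article`, `Person`, `Thing`, and the property names are real schema.org terms; the function name and the values you pass in are placeholders.

```python
import json


def article_markup(headline, author_name, date_published, topics):
    """Describe an article with schema.org terms and return it as JSON-LD text."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,  # ISO 8601 date string
        "about": [{"@type": "Thing", "name": topic} for topic in topics],
    }
    return json.dumps(data, indent=2)
```

The resulting text is typically embedded in the page inside a script element of type `application/ld+json`. How much any given search engine does with it is up to that search engine.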
Devices you can talk to
The iPhone 4S, released late last year, included a highly anticipated new feature called Siri that allows users to speak directly to their smartphones. Give Siri a command, or ask her a question, and “she” will respond. This functionality incorporates two major accomplishments in the realm of computerized disambiguation. First, it uses voice recognition to discern the words you’re saying to it. Then it interprets the meaning of those words so that it can take action or provide you information. I don’t have an iPhone, so I set out to talk to some colleagues who do and find out what they think of the experience. Most of the people I spoke to who have the feature don’t use it because, after trying it, they don’t really like it, which is not a good sign. Eventually I found a colleague, Razorfish’s Jake Keyes, who does use Siri, and I sat down with him to test it out and hear about his impressions.
First he told me that he had to learn to speak very clearly to it, and he quickly stopped asking it the kinds of things it’s not good at handling. It can give you directions to Rhode Island, but isn’t so good at giving directions to a downtown restaurant called “Corton” (because it didn’t recognize the name). It’s good at conversions and measurements—like how many pints are in a gallon. It’s not as good at answering cultural questions, especially those involving names.
As a test, we asked it “What’s Tom Cruise’s middle name?” It repeatedly interpreted this question as “What time cruises middle name?” After half a dozen tries I finally over-enunciated the word “TOM” to a comical degree and, to our surprise, it displayed the text “Thomas Cruise Mapother IV.” Finally an answer! But we weren’t even sure if Siri had interpreted our question accurately, so we had no way to evaluate the credibility of this answer. We had to check IMDb.com before we believed it. A person responding to this question might have said, “Tom Cruise the actor? Did you know that Cruise actually is his middle name? His real name is Thomas Cruise Mapother IV.” But Siri didn’t give us enough context to feel confident that we were actually talking about the same topic.
Jake felt that Siri works best if you talk to it using natural language (“When is Hugo playing?” as opposed to “Movie times Hugo”). Even so, in the brief time we spent talking and testing it out, communication regularly broke down in three areas:
- Sometimes it doesn’t properly identify the words you said. I tried asking it “When was Matt Damon born?” and it alternately interpreted the word “born” as “by” or “boren” (!?). On top of that, its general inability to understand non-US English speakers is already becoming a well-known issue. The voice recognition on my Android’s Google search was much more accurate, at least with phrases involving celebrity names.
- It may not interpret your question properly. Sometimes Siri understood the words I was saying, but still didn’t seem to understand the nature of the question.
- It may not have the information you seek, or be able to act on your query. Its data sources seem to be a lot more limited than I had originally anticipated. I was surprised how often an interaction ended with Siri offering to do a web search.
Jake summed it up like this: “When people first pick it up, they think they can ask anything they want and it will give them an answer. The reality is it only really becomes useful and satisfying when you start to ask it things that it’s capable of doing.” What does he use Siri for most? Setting a timer when he’s cooking and his hands are messy.
While this kind of functionality might not be quite ready for broad adoption, engineers will continue to improve it, and people will want to use it. But let’s not put voice recognition in every electronic device we own and have homes full of jabbering gadgets. Let’s think about the situations when it would be most useful for people to communicate with their devices verbally and focus on those experiences.
Machines that understand you
And that brings us back to Watson, the astonishingly successful Jeopardy!-playing computer. Let’s be clear: Watson’s amazing performance did not include voice recognition. It didn’t see or hear the clues; they were sent to the machine as text. What Watson did do was interpret a natural language query, sift through a tremendous amount of information to come up with a range of responses, and evaluate its confidence in the possible responses. All in a matter of roughly three seconds.
The confidence-evaluation part is a fascinating development. After coming up with many possible interpretations of a clue, Watson uses thousands of algorithms to find possible solutions. Then it assesses the likely accuracy of those responses based on how many algorithms pointed to each one. And it only presses the buzzer if the top answer surpasses a certain threshold of surety. But this isn’t the same as reasoning. Going back to the Kal Penn video example, Watson might have simultaneously come up with all the possible interpretations of the statement, but it wouldn’t have been able to progressively absorb information and adjust its understanding. This shortcoming yielded some humorously incorrect responses from Watson during the game, one of which resulted in Alex Trebek scolding, “No, Ken said that.” Turns out that Watson wasn’t programmed to “listen” to and consider its competitors’ answers.
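To make the confidence idea concrete, here is a minimal sketch of the “count how many algorithms agree, then compare against a threshold” step described above. It is an illustration only, not IBM’s actual scoring method: the function name `pick_answer`, the plain vote counting, and the 0.5 threshold are all assumptions made for the example.

```python
from collections import Counter


def pick_answer(candidates, threshold=0.5):
    """Pick the most-supported answer and decide whether to "buzz in".

    `candidates` holds one proposed answer per algorithm (None when an
    algorithm found nothing). All names here are hypothetical.
    """
    votes = Counter(c for c in candidates if c is not None)
    if not votes:
        return None, 0.0, False

    answer, count = votes.most_common(1)[0]
    # Confidence is simply the share of algorithms that agreed on the answer.
    confidence = count / len(candidates)
    return answer, confidence, confidence >= threshold


# Five hypothetical algorithms: three agree, one disagrees, one abstains.
proposals = ["Chicago", "Toronto", "Chicago", "Chicago", None]
print(pick_answer(proposals))                 # clears the default threshold
print(pick_answer(proposals, threshold=0.8))  # same answer, but stays silent
```

Note what the sketch leaves out: nothing in it updates the candidates as new information arrives, which is exactly the gap between confidence scoring and reasoning.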
Since Watson was designed to succeed at playing a specific game, it has some skills that don’t necessarily carry over to other pursuits, such as betting strategically, determining whether to buzz in, guessing where to find the Daily Double, and giving the answer in the form of a question. But its ability to analyze unstructured information and assign a confidence level to the answer, even when the question involved complex logic and enigmatic references, could be applied to many areas of knowledge and problem solving. And that’s what IBM seems to be planning. Add progressive reasoning, and Watson might really start to function like a human mind. But currently Watson’s hardware and software “brain” resides in 90 servers, processing massive amounts of data in parallel. So it’s going to be a while before Watson is available on your smartphone, telling you Tom Cruise’s middle name.
In the meantime, the lesson of credibility is an important one. As we learned from Siri, it’s not enough to just give users a free-floating response or a piece of content. If you conduct a query and receive a link to an untitled file that says “Download this,” would you do it? In order for people to trust the information that they get from machines, we need to give them enough information to feel confident that what they’re getting is accurate and appropriate. It’s all part of the give and take of a conversation.
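One way to picture “enough information to feel confident” is a response that carries its own context: what the system heard, what it thinks the question is about, and where the answer came from. The sketch below is a hypothetical illustration, not how Siri or any real assistant structures its replies; the `Answer` fields, the `render` function, and the placeholder source name are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    heard: str    # the query as the system understood it
    subject: str  # what the system believes the question is about
    text: str     # the answer itself
    source: str   # where the answer came from (placeholder value below)


def render(answer: Answer) -> str:
    """Format an answer so the user can check it against what they asked."""
    return (
        f'You asked: "{answer.heard}"\n'
        f"About: {answer.subject}\n"
        f"Answer: {answer.text}\n"
        f"Source: {answer.source}"
    )


reply = Answer(
    heard="What's Tom Cruise's middle name?",
    subject="Tom Cruise, the actor",
    text="Cruise. His full name is Thomas Cruise Mapother IV.",
    source="(hypothetical reference source)",
)
print(render(reply))
```

Echoing the interpreted question back is the machine equivalent of a person asking, “Tom Cruise the actor?” before answering.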
Areas for further inquiry
We need a combination of approaches if we want our machines to understand us better—to make the content smarter at the same time that we make our tools smarter. When we’re trying to communicate with machines, we can’t let the interaction end with “Let’s agree to disagree.” We need to at least get to “Let’s agree not to let communication break down even though I don’t understand you yet.”
As content strategists and content-first designers, we need to expand our horizons and look to other fields for inspiration and direction. We have a lot to learn from the study of cognitive psychology, developmental psychology, and linguistics (especially logic and semantics). We should also be attentive to developments in the practices of Human-Computer Interaction (HCI), Natural Language Processing (NLP), and Semantic Technology. There are a lot of different people tackling these problems from different angles, and if we want our content to be a dynamic part of the conversation, we need to get involved.
Content strategists think a lot about messaging from brands to people, people to people, or even people to brands, with digital devices as a conduit. But as people move into an increasingly “connected” lifestyle, perhaps the lesson of disambiguation is that we need to shift our focus from content to communication. Not just communication between people, through machines, but also from person to machine and from machine back to person.
Sometimes we’ll need to think about augmenting the content itself—data structure, data quality, and data sourcing. Other times we’ll need to think about the systems that touch our content and how it moves through them. We have to take the capabilities of these systems into account when employing content-first design. Or, perhaps, communication-first design: designing experiences that allow us to communicate with our devices as seamlessly as we communicate with each other.