How Siri on iPhone 4S works and why it’s a big deal. Apple’s AI tech details in 230 pages of patent app
Apple’s famous “one more thing” during iPhone 4S presentation last week came in the form of Siri.
It’s an “Intelligent Personal Assistant” that understands what you are telling it to do and can perform certain tasks. E.g. reserve a table at your favorite Italian restaurant, reply to SMS, set a calendar appointment, tell you whether it will rain tomorrow, or figure out the distance to the moon.
But opinion about Siri remains divided. There is a majority who see just nice voice-control and speech-recognition gimmicks in Siri and think "Meh." "Seen that already, many times. Maybe Apple's stuff is nicer, neater, does a bit more and is interesting in some limited cases. But still, meh." And then there are those who know a bit more about the origins, history and insides of Siri, who think that it is a world-changing technology, on par with the mouse and the GUI.
So who is right?
The problem is that, beyond some hints, nobody wants to share how Siri works, so the rest of us can make up our minds.
There is talk about interfacing with APIs from various web services, and some badass AI engine with DARPA/Pentagon origins that ties things together. There is a cool voice recognition technology from Nuance powering it. And that's about it. For all that Apple chose to reveal about Siri, they were very tight-lipped about the actual technology underneath, and about what makes Siri different from the failed speech/computing interfaces of the past.
Let's fix that right now, with the help of a 230-page patent application coming straight from Apple's R&D labs, which thoroughly describes the technology behind the Siri Intelligent Personal Assistant and shows us why it is such a big deal.
The key difference between Siri and other general AI efforts is that Siri is humble. It knows that general human understanding/intelligence is very hard for computers, and it does not try to do that. What Siri does is narrow the needed understanding to very specific areas, e.g. dining/restaurants, sports events, movies/entertainment, travel, weather, etc. It then fills each of these areas with a special vocabulary, databases, web services, rules of interaction and a machine-readable description of how all the parts fit and interact together. They call these specialized areas Active Ontologies.
E.g. a restaurant/dining Active Ontology can include one or several restaurant databases; a number of restaurant review services like Yelp and Zagat, accessed via API; a special dining-related vocabulary database; a model of the actions people usually perform when they decide on their next dinner; access to a reservation service like OpenTable, with rules for automatically making a reservation through it and entering the reservation into the user's calendar; specially formatted dialogs related to the restaurant choosing and reservation process; etc.
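To make the idea concrete, here is a minimal sketch of what such a domain bundle might look like in code. This is purely illustrative: the names, structure and vocabulary are assumptions for the example, not taken from Apple's patent.

```python
# Hypothetical sketch of a "dining" Active Ontology: a bundle of
# vocabulary, external services and an action model for one domain.
# All names here are illustrative, not from Apple's patent.

DINING_ONTOLOGY = {
    "domain": "dining",
    # Domain vocabulary helps the language interpreter spot intent.
    "vocabulary": {"reserve", "table", "dinner", "cuisine", "restaurant"},
    # External services reachable via API within this domain.
    "services": {
        "reviews": ["Yelp", "Zagat"],
        "reservations": ["OpenTable"],
        "directory": ["local business database"],
    },
    # A simple model of the actions a user typically performs.
    "action_model": ["find_restaurants", "compare_reviews",
                     "make_reservation", "add_calendar_event"],
}

def matches_domain(utterance: str, ontology: dict) -> bool:
    """Crude intent routing: does the utterance use domain vocabulary?"""
    words = set(utterance.lower().split())
    return bool(words & ontology["vocabulary"])
```

A request like "reserve a table at an Italian restaurant" would match this domain, while "will it rain tomorrow" would fall through to a weather ontology instead.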
After the user's request passes through the language recognition/interpretation module, Siri tries to figure out the user's intent with the help of the relevant Active Ontology. Once it does, the intent is routed to the "Service Orchestration Component" (SOC). This component figures out what external services can be used to fulfill the request, translates it into commands that these services can understand, collects the information, sorts it out for the user and performs the required actions.
E.g. to answer a question about good Italian restaurants nearby, the SOC can get a list of Italian restaurants with addresses in the city from a business directory, pick those within 2 miles of the current location with the help of a geolocation service, check reviews/ratings on Yelp and Zagat, and present the user with a list of the best-rated Italian restaurants nearby. Then, after the user makes a selection and asks for a reservation, the SOC will call the OpenTable service API and reserve the table, enter the dinner appointment into the calendar via the calendar service API, reformat the relevant data for a taxi service API and order a taxi.
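The filter-and-rank part of that flow can be sketched in a few lines. The data and function names below are invented stand-ins; a real orchestrator would call out to directory, geolocation and review APIs rather than consult in-memory lists.

```python
# Hypothetical sketch of the SOC answering "good Italian restaurants
# nearby". Service calls are stubbed with in-memory data standing in
# for a business directory and Yelp/Zagat-style ratings.

DIRECTORY = [
    {"name": "Trattoria Roma", "cuisine": "italian",  "miles": 1.2},
    {"name": "Luigi's",        "cuisine": "italian",  "miles": 4.8},
    {"name": "Sushi Bar",      "cuisine": "japanese", "miles": 0.5},
]
RATINGS = {"Trattoria Roma": 4.5, "Luigi's": 4.0, "Sushi Bar": 4.8}

def orchestrate(cuisine: str, max_miles: float = 2.0) -> list:
    """Filter by cuisine and distance, then rank by review rating."""
    candidates = [r for r in DIRECTORY
                  if r["cuisine"] == cuisine and r["miles"] <= max_miles]
    candidates.sort(key=lambda r: RATINGS[r["name"]], reverse=True)
    return [r["name"] for r in candidates]
```

The point of the sketch is the orchestration pattern: each step (directory lookup, distance filter, rating sort) maps onto a separate external service that the SOC queries and combines.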
The Service Orchestration Component has its own tools for mapping various external service APIs to the actions Siri performs. Active Ontologies likewise have their own set of tools for mapping different domains and areas of human activity. Both allow additional relevant modules, services and databases to be plugged in as they become available, making Siri almost infinitely expandable and flexible.
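That plug-in idea can be sketched as a simple registry: new domains and service adapters register themselves, so the assistant grows without changes to its core. Again, this is an assumed illustration of the pattern, not Apple's actual mechanism.

```python
# Hypothetical sketch of pluggable service mapping: a registry keyed
# by domain, into which new API adapters can be added at any time.
# All names are illustrative.

SERVICE_REGISTRY = {}

def register_service(domain, name, adapter):
    """Map an external API adapter into a domain's toolbox."""
    SERVICE_REGISTRY.setdefault(domain, {})[name] = adapter

def call_service(domain, name, *args):
    """Look up and invoke a registered adapter."""
    return SERVICE_REGISTRY[domain][name](*args)

# A later plug-in adds a taxi service without touching existing code.
register_service("travel", "taxi", lambda pickup: "taxi ordered to " + pickup)
register_service("dining", "reservations", lambda place: place + ": table booked")
```

The design choice matters: because the core only knows the registry interface, adding a new service (or a whole new domain) is an additive change rather than a rewrite.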
While Siri is currently available only on the iPhone 4S, this figure from the patent application shows that Apple has much bigger ambitions for it:
Apple sees its AI technology not just as a thing to play with on your phone. In a few years we may be talking to Siri in our cars, Macs, Web browsers, home appliances, TVs, stereos and many other things.
As for the topics that Siri may cover in the future – here are just a few examples that Apple lists:
Local Services (including location- and time-specific services such as restaurants, movies, automated teller machines (ATMs), events, and places to meet); Personal and Social Memory Services (including action items, notes, calendar events, shared links, and the like); E-commerce (including online purchases of items such as books, DVDs, music, and the like); Travel Services (including flights, hotels, attractions, and the like); navigation (maps and directions); database lookup (such as finding businesses or people by name or other properties); getting weather conditions and forecasts; checking the price of market items or status of financial transactions; monitoring traffic or the status of flights; accessing and updating calendars and schedules; managing reminders, alerts, tasks and projects; communicating over email or other messaging platforms; and operating devices locally or remotely (e.g., dialing telephones, controlling light and temperature, controlling home security devices, playing music or video, etc.)
For all the nice stuff that we've seen Siri perform on stage, for now it is extremely limited. No wonder Apple decided to slap a beta label on it. But the original iPhone was also a very limited device, and, arguably, the iPhone did not really catch up to all the capabilities of modern smartphones until the iPhone 4 came out, even if the revolutionary elements of its graphical/multi-touch UI were easier to grasp at once.
Just as with the first iPhone, each successive iOS and OS X release will have a better, more encompassing version of Siri, making inroads into an ever wider collection of devices.
Also, just like the first iPhone, Siri is a closed system for now, with all domain mappings and service additions done inside Apple, or in close cooperation with select companies. But it does not have to stay this way forever. As soon as the infrastructure technology behind Siri matures enough, Apple can open its APIs to outside developers, ushering in an explosion of a new kind of Siri-based apps.
So, yes, from the looks of it, Siri is a really big deal. And the speech recognition we are so (not) impressed with now is only one of its smaller, interchangeable parts.
Plus, Apple is not the only one trying to add some intelligence to our mobile devices. And now that Apple has shown the way once again, Google, Microsoft and Nokia can get busy doing the same for their devices too.
Looks like the era of a new breed of intelligent computing is almost upon us.
I was able to give only a very general description of the Siri technology here, omitting a lot of important parts. If you want to learn more about Siri, you can download the full Apple patent application here (*.pdf, 10MB).