Every time I ask Siri to play some music and it thinks I want it to switch on the lights, I realize that applied artificial intelligence (AI) is nothing without some understanding of context. Apple’s AI dev teams know this, too. And they have a plan.
You see, the latest slice of the ever-rolling deluge of AI-related news is that Apple researchers have published a paper describing an AI tool that is more contextually aware than OpenAI’s ChatGPT. It’s all about Reference Resolution As Language Modeling (ReALM).
What does it do?
Among other things, this contextual understanding seems to have been built to facilitate even better human-device communications.
The paper specifically talks about the machine being able to tell which app a user is referring to — you might be checking an important PDF in Mail but want the music changed in your Music app. The research suggests a contextual understanding that can figure out what you are doing, even if it isn’t the main thing you are doing at any moment.
There are obvious uses to boost accessibility: You should be able to point at something on the screen to find out more about that object, for example.
The team also looked at conversational understanding and comprehension of ongoing background tasks. For example, ReALM should be able to understand when you respond to a notification you just received, even if you are doing something else at the time.
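As I read the paper, the things a user might refer to fall into a few candidate pools: entities visible on screen, entities mentioned earlier in the conversation, and background entities such as an alarm or a notification that just fired. Here is a minimal Python sketch of how such candidates might be gathered; all names here are hypothetical illustrations, not Apple’s code.

```python
from dataclasses import dataclass
from enum import Enum, auto


class EntitySource(Enum):
    """Where a candidate entity came from (loosely following the paper's categories)."""
    ON_SCREEN = auto()       # visible UI elements: buttons, phone numbers, list rows
    CONVERSATIONAL = auto()  # things mentioned earlier in the dialogue
    BACKGROUND = auto()      # e.g. an alarm or notification that fired mid-task


@dataclass
class CandidateEntity:
    """One thing the user could plausibly be referring to."""
    label: str               # human-readable description, e.g. "Message from Alice"
    source: EntitySource


def gather_candidates(screen: list[str], dialogue: list[str], events: list[str]) -> list[CandidateEntity]:
    """Pool candidates from every source so a resolver can rank them together."""
    return (
        [CandidateEntity(s, EntitySource.ON_SCREEN) for s in screen]
        + [CandidateEntity(d, EntitySource.CONVERSATIONAL) for d in dialogue]
        + [CandidateEntity(e, EntitySource.BACKGROUND) for e in events]
    )
```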
What is Reference Resolution?
Apple’s research is included in a paper published on arXiv.org that looks at “Reference Resolution.” According to one respected guide, reference resolution describes the problem a computer (AI) must solve: “Find out which object is referred to by an expression, thus gradually building a representation of the objects with their features and evolution.”
In other words, the computer must aim to resolve references as effectively as humans do, such as when we use words like “they” or “those” and the person we are speaking with understands from context what we mean.
The paper offers an example in which someone might ask ReALM to show nearby pharmacies. The tech presents the list, and the person could say something vague such as “Call the bottom one,” or “Call this number” (if the number is on screen). Existing virtual assistants would struggle with this, but the researchers’ own tech handles these tasks. They even claim their invention can “substantially outperform” GPT-4 in some ways, while matching its performance in others.
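To make that pharmacy example concrete, here is a hedged sketch of the general flow: number the on-screen candidates, frame the user’s vague request as a question for a language model, and act on the answer. The `query_llm` call is a hypothetical stand-in for whatever model does the resolving; none of this reflects the paper’s actual implementation.

```python
def build_resolution_prompt(utterance: str, entities: list[str]) -> str:
    """Frame reference resolution as a multiple-choice question for an LLM."""
    numbered = "\n".join(f"{i}. {e}" for i, e in enumerate(entities, start=1))
    return (
        "Entities currently visible on screen:\n"
        f"{numbered}\n\n"
        f'User said: "{utterance}"\n'
        "Which entity number is the user referring to? Answer with the number only."
    )


# Hypothetical usage with the article's pharmacy example:
pharmacies = [
    "Rite Aid Pharmacy, 0.4 mi, (555) 010-1000",
    "Walgreens, 0.9 mi, (555) 010-2000",
    "CVS Pharmacy, 1.2 mi, (555) 010-3000",
]
prompt = build_resolution_prompt("Call the bottom one", pharmacies)
# choice = query_llm(prompt)  # stand-in for the actual model call
# "the bottom one" should resolve to entry 3, the CVS listing
```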
When the going gets tough, Siri goes pro
“Critically, we demonstrate how entities that are present on the screen can be passed into an LLM [large language model] using a novel textual representation that effectively summarizes the user’s screen while retaining relative spatial positions of these entities,” they wrote.
In other words, you can anticipate highly effective spoken-word control of what’s on screen, perhaps augmented by Apple’s existing VoiceOver UI, with obvious implications for its visionOS product line.
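The key idea in that quote is that a screen full of UI elements can be flattened into plain text without losing the layout that an instruction like “call the bottom one” depends on. Below is a minimal sketch of one plausible encoding, sorting elements by their bounding-box positions and grouping near-equal vertical positions into rows; the specifics are my assumption, not the paper’s exact scheme.

```python
from dataclasses import dataclass


@dataclass
class ScreenElement:
    """A UI element with the center point of its on-screen bounding box (origin top-left)."""
    text: str
    x: float
    y: float


def screen_to_text(elements: list[ScreenElement], row_tolerance: float = 10.0) -> str:
    """Flatten screen elements into text, preserving relative spatial order.

    Elements whose vertical centers fall within row_tolerance pixels are
    treated as one row (tab-separated); rows are emitted top to bottom.
    """
    ordered = sorted(elements, key=lambda e: (e.y, e.x))
    rows: list[list[ScreenElement]] = []
    for el in ordered:
        if rows and abs(el.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(el)
        else:
            rows.append([el])
    return "\n".join(
        "\t".join(el.text for el in sorted(row, key=lambda e: e.x))
        for row in rows
    )


# A pharmacy results screen might serialize to:
# "Rite Aid\t0.4 mi\nWalgreens\t0.9 mi\nCVS\t1.2 mi"
# so "the bottom one" maps to the last line of the representation.
```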
This is just one of the many nuggets of information to emerge from Apple’s AI development teams as the company prepares to (hopefully) wow developers at WWDC 24. These many clues also point to technology to support task-focused AI at the edge; superior image intelligence; partnerships with LLM providers such as Google (Gemini); augmentations to the company’s existing apps, including Xcode; and more.
Apple has also reportedly acquired interesting AI companies recently, including DarwinAI and Brighter AI.
When can we expect it?
While we can’t be sure to what extent all these promises will translate into shipping products in such a short time frame, we can expect the first chapters of this part of Apple’s AI story to open later this year.
Apple CEO Tim Cook promised as much when he said he does “look forward to sharing with you the ways we will break new ground in generative AI, another technology we believe can redefine the future.”
The accumulation of evidence, published reports, and promises coming from within Apple suggests just how seriously the company takes this push into advanced artificial intelligence.
Please follow me on Mastodon, or join me in the AppleHolic’s bar & grill and Apple Discussions groups on MeWe.