Apple’s AI ambitions have painted a confusing picture of what the company has planned for its AI debut, and the most recent update on Siri’s AI capabilities adds some pizzazz to the mix. For months, the Cupertino company has slowly revealed its AI work through research papers hinting at various applications of machine learning across its product line. Now, internal tests suggest that the upcoming Apple AI outperforms GPT-4, the current reigning champion of AI technology. Recent reports have pointed at Apple outsourcing its AI burden to Google and Baidu, but perhaps there is still hope for a real contest between Apple’s on-device AI and GPT-4.

Image from WWDC 2018 Keynote

Are Apple’s Siri AI Capabilities Going to Be Powered In-House?

Early in February, we saw Apple release an open-source AI imaging tool that lets users edit their photos through natural-language instructions. The research paper was published at the International Conference on Learning Representations (ICLR) 2024. In March, another research paper, on building Multimodal Large Language Models (MLLMs), appeared on arXiv, a platform that provides open access to research papers.

The paper introduced MM1, “a family of multimodal models up to 30B parameters, including both dense models and mixture-of-experts (MoE) variants, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks.” It was clear that Apple was investing heavily in building its own AI systems, a belief reaffirmed by CEO Tim Cook. Despite these clear indications that Apple was cooking up a digital storm, we also saw reports of the company approaching Google about integrating Gemini AI into Apple devices, followed by reports of conversations with Baidu about AI integration for the Chinese market.

This could be cause for confusion, but perhaps the iPhone maker is working on separate segments of AI for its upcoming devices. While external AI platforms power additional features of the iPhone and its adjacent products, Apple’s Siri AI capabilities might be powered in-house. The recent paper on the ReALM model suggests the company has made big strides toward ensuring its AI outperforms GPT-4.

Apple On-device AI vs GPT-4: What’s New With Apple?

As an introduction to Apple’s Siri AI capabilities, the company released another research paper introducing the Reference Resolution As Language Modeling (ReALM) system. The model focuses on improving Siri’s understanding of conversations, prompts, and real-world cues so it can respond to your needs more intuitively. Apple’s on-device AI model emphasizes contextual understanding and non-conversational entities, picking up on a wider range of references than text prompts alone. More opportunities to listen in on your conversations and read your data? We love to see it.

The entities that ReALM resolves are divided into three categories: on-screen entities, conversational entities, and background entities. In other words, the model draws on information displayed on your device’s screen, references the user’s previous queries, and consolidates data from processes running in the background. Contextual understanding lies at the center of the Apple on-device AI model’s capabilities: it parses human language at the level of a casual communicator, not one with a deep understanding of prompt engineering.
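To make the idea concrete, here is a minimal, purely illustrative sketch of how entities from those three categories could be flattened into a text prompt so that a language model can resolve a reference like “call that number.” The function names, entity strings, and prompt format below are hypothetical assumptions for illustration, not Apple’s actual code or API.

```python
# Illustrative sketch: reference resolution framed as language modeling.
# All names and formats here are invented for demonstration purposes.

def encode_entities(on_screen, conversational, background):
    """Serialize candidate entities from the three categories into numbered text lines."""
    tagged = (
        [("screen", e) for e in on_screen]
        + [("conversation", e) for e in conversational]
        + [("background", e) for e in background]
    )
    return "\n".join(
        f"{idx}. [{source}] {entity}"
        for idx, (source, entity) in enumerate(tagged, start=1)
    )

def build_prompt(user_query, entity_text):
    """Frame the resolution task as a question a language model can answer."""
    return (
        "Entities visible to the assistant:\n"
        f"{entity_text}\n\n"
        f'User request: "{user_query}"\n'
        "Which entity number does the request refer to?"
    )

entities = encode_entities(
    on_screen=["phone number: 415-555-0100", "business: Joe's Pizza"],
    conversational=["contact: Mom"],
    background=["now playing: a podcast episode"],
)
print(build_prompt("Call that number", entities))
```

The point of the sketch is the reduction itself: once on-screen, conversational, and background context are all serialized as text, picking the right referent becomes an ordinary language-modeling task.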

The iOS assistant is already quite adept at understanding what a user is looking for, but Apple Siri’s AI capabilities could be substantially enhanced by a system like this supporting its functioning.

Apple AI Outperforms GPT-4

In the process of developing ReALM, internal testing of Apple’s on-device AI against GPT-4 suggested the new model could outperform it. The company benchmarked its AI against GPT-3.5, which powers the free version of ChatGPT, and GPT-4, which powers the more advanced ChatGPT Plus. Apple’s smallest models performed comparably to GPT-4, while the larger models exceeded OpenAI’s model.

The sizes mentioned refer to the different versions of ReALM, each supported by a different number of parameters. The paper describes four sizes: ReALM-80M, ReALM-250M, ReALM-1B, and ReALM-3B, where “M” stands for millions and “B” for billions. GPT-4 is rumored to have around 1.5 trillion parameters (OpenAI has not disclosed the figure), so it is a point of pride for the researchers to see the Apple AI outperform GPT-4 with far fewer parameters.
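A quick back-of-envelope calculation puts that gap in perspective. The ReALM sizes come from the paper; the GPT-4 figure is an unconfirmed rumor, so treat the ratios as rough illustrations only.

```python
# Rough size comparison. ReALM variant sizes are from the paper's naming;
# the GPT-4 parameter count is a rumor, not an official figure.
SIZES = {
    "ReALM-80M": 80_000_000,
    "ReALM-250M": 250_000_000,
    "ReALM-1B": 1_000_000_000,
    "ReALM-3B": 3_000_000_000,
}
GPT4_RUMORED = 1_500_000_000_000  # ~1.5 trillion parameters, unconfirmed

for name, params in SIZES.items():
    print(f"{name}: GPT-4 would be roughly {GPT4_RUMORED // params:,}x larger")
```

Even the largest ReALM variant would be around 500 times smaller than the rumored GPT-4, which is what makes the benchmark claim notable for an on-device model.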

The Apple on-device AI vs GPT-4 comparison has only been tested internally, and there is no evidence that the paper has been peer reviewed, so for now we have to rely on the company’s word. It seems unlikely, though, that Apple would make the claim without being able to back it up with proof. At least we’d like to think so. Siri’s AI ability to take on ChatGPT will have to be put to the test in a real-world setting before we fully understand how the assistant benefits from being backed by AI, and whether there is truly any point in Apple embracing artificial intelligence.

For the everyday user who will ultimately test out Apple’s Siri and its AI capabilities, the research metrics will mean little until they translate into a better assistant and features that enhance the user experience. To understand the full scope of its functionality, we will likely have to wait until Apple’s WWDC 2024 event, scheduled for June.