Apple Announces New Ferret-UI LLM - This AI Can Read iPhone Screens

Apple researchers have created an AI model that can understand what is happening on a cell phone screen. It is the latest in a growing line of such models.

Called Ferret-UI, this multimodal large language model (MLLM) can perform a wide variety of tasks based on what appears on your phone's screen. Apple's new model can, for example, identify the type of an icon, find specific text, or tell you exactly what to do to accomplish a particular task.

These capabilities are documented in a recently published paper that details how this particular MLLM was designed to understand and interact with mobile user interface (UI) screens.

What is not yet known is whether this will become part of the rumored Siri 2.0 or remain an Apple AI research project that never makes it past the paper.

We currently use our cell phones to accomplish a variety of tasks, such as looking up information and making appointments. To do so, we look at our phones and tap the buttons that lead us to our goal.

Apple believes that automating this process would make interacting with cell phones even easier. The researchers also hope that models like Ferret-UI will help with accessibility, app testing, and usability testing.

For such a model to be useful, it needed to understand everything happening on the phone's screen while also being able to focus on specific UI elements. Beyond that, it needed to match instructions given in natural language to what is displayed on the screen.

For example, Ferret-UI was shown a picture of AirPods in the Apple Store and asked how to purchase them; it correctly replied that the user should tap the "Buy" button.
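
To make this concrete, here is a minimal sketch of the kind of interface such a model implies: a screenshot and a natural-language question go in, and an answer grounded in specific on-screen elements comes out. Apple has not published an API for Ferret-UI, so the function name, the `UIElement` structure, and the canned output below are illustrative assumptions, not Apple's actual interface.

```python
# Minimal sketch of a screen-grounded query, assuming a hypothetical wrapper
# around a Ferret-UI-style model. Apple has not released an API for Ferret-UI;
# all names, fields, and coordinates here are illustrative stand-ins.

from dataclasses import dataclass


@dataclass
class UIElement:
    kind: str                        # e.g. "button", "icon", "text"
    label: str                       # visible label, e.g. "Buy"
    bbox: tuple[int, int, int, int]  # (x1, y1, x2, y2) pixels on the screenshot


def answer_ui_query(screenshot_path: str, question: str) -> tuple[str, list[UIElement]]:
    """Stand-in for the real model call. A genuine implementation would load
    the screenshot, run the multimodal model, and return an answer grounded
    in on-screen elements; here we return a canned result that mirrors the
    AirPods example from the paper."""
    buy_button = UIElement(kind="button", label="Buy", bbox=(612, 1480, 980, 1560))
    return 'Tap the "Buy" button.', [buy_button]


answer, elements = answer_ui_query("airpods_store_page.png", "How do I buy these AirPods?")
print(answer)        # Tap the "Buy" button.
print(elements[0])   # UIElement(kind='button', label='Buy', bbox=(612, 1480, 980, 1560))
```

The grounded bounding boxes are the interesting part: they are what would let an assistant actually tap an element on the user's behalf rather than merely describe it.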

With most of us carrying smartphones in our pockets, it makes sense that companies are looking at ways to add AI features tailored to these smaller devices.

Research scientists at Meta Reality Labs have already predicted that we will spend an hour or more each day in direct conversation with chatbots, or with LLM processes running in the background to power features such as recommendations.

Meta's chief AI scientist, Yann LeCun, has even said that in the future, AI assistants will mediate our entire digital diet.

So while Apple has not specified its exact plans for Ferret-UI, it is not hard to imagine how such a model could be used to supercharge Siri and make the iPhone experience more pleasant, perhaps even within the year.
