At Build 2024, Microsoft announced a new version of the company's small language AI model, Phi-3. The new version, Phi-3-vision, is a multimodal model that can analyze an image and tell the user what is in it.
Multimodal models, a category that also includes the recently updated OpenAI GPT-4o and Google's Gemini, are AI tools that can read both text and images.
Phi-3-vision is a 4.2 billion parameter model, small enough that it is intended for use on mobile devices. A model's parameter count is a rough shorthand for how complex it is and how much it can learn during training. Microsoft has iterated on the Phi model over previous versions: Phi-2, for example, learned from Phi-1 and gained new capabilities, and Phi-3 was in turn trained on Phi-2 with further features added.
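For a sense of what those parameter counts mean in practice, the rough sketch below (an illustration, not something from Microsoft's announcement; the fp16 and 4-bit precision choices are assumptions) estimates the memory needed just to hold each model's weights, which is why a few billion parameters can fit on a phone while far larger models cannot:

```python
# Back-of-the-envelope memory math: a model's weights alone take roughly
# (parameter count) x (bytes per parameter) of memory.
# Parameter counts are the ones reported in this article; the precision
# choices (fp16, 4-bit quantized) are illustrative assumptions.

def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory (GB) needed just to store the model weights."""
    return params_billions * 1e9 * bytes_per_param / 1e9

models = [
    ("Phi-3-mini", 3.8),
    ("Phi-3-vision", 4.2),
    ("Phi-3-small", 7.0),
    ("Phi-3-medium", 14.0),
]

for name, params in models:
    fp16 = weight_memory_gb(params, 2)    # 16-bit weights
    int4 = weight_memory_gb(params, 0.5)  # 4-bit quantized weights
    print(f"{name}: ~{fp16:.1f} GB at fp16, ~{int4:.1f} GB at 4-bit")
```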
Phi-3-vision can perform common visual reasoning tasks, such as analyzing charts and images. Unlike well-known image generators such as OpenAI's DALL-E, Phi-3-vision can only "read" images; it cannot generate them.
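As a rough sketch of how a developer might put that image-reading capability to work, the example below asks a vision-capable chat model to describe a chart. It assumes an OpenAI-style chat-completions endpoint that accepts image content; the ENDPOINT URL, API_KEY, image file name, and exact payload schema are placeholders, not details from Microsoft's announcement.

```python
# Minimal sketch: ask a vision-capable chat model what a chart shows.
# Assumptions (not from the article): an OpenAI-style chat-completions
# endpoint; ENDPOINT, API_KEY, and "sales_chart.png" are placeholders.
import base64
import requests

ENDPOINT = "https://<your-deployment>/v1/chat/completions"  # placeholder
API_KEY = "<your-api-key>"                                  # placeholder

# Encode a local chart image so it can be sent inline with the request.
with open("sales_chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What trend does this chart show?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
    "max_tokens": 256,
}

resp = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```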
Microsoft has released several of these small AI models. They are designed to run locally, with no internet connection required, and on a wider range of devices than larger models such as Google's Gemini or OpenAI's ChatGPT. They also reduce the computing power needed for certain tasks; Microsoft's small Orca-Math model, for example, is built for solving math problems.
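To illustrate what "running locally" can look like, here is a minimal sketch using the Hugging Face transformers library. The model ID, prompt, and generation settings are assumptions for illustration, and the initial model download still requires a one-time internet connection; after that, inference runs entirely on the local machine.

```python
# Rough sketch: run a small language model locally with Hugging Face transformers.
# Assumptions (not from the article): the model ID
# "microsoft/Phi-3-mini-4k-instruct" and the generation settings below.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed Hugging Face model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use fp16/bf16 when the hardware supports it
    trust_remote_code=True,  # needed on older transformers versions
)

generate = pipeline("text-generation", model=model, tokenizer=tokenizer)

# A small math-style prompt, in the spirit of the Orca-Math example above.
prompt = "Question: A train travels 60 km in 45 minutes. What is its average speed in km/h?\nAnswer:"
output = generate(prompt, max_new_tokens=128, do_sample=False)
print(output[0]["generated_text"])
```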
The first iteration of Phi-3 was announced in April, when Microsoft released the tiny Phi-3-mini. In benchmark tests, it performed well against larger models such as Meta's Llama 2. The mini model has only 3.8 billion parameters. There are also two other models, Phi-3-small and Phi-3-medium, which have 7 billion and 14 billion parameters, respectively.
Phi-3-vision is currently available in preview. The other three Phi-3 models, Phi-3-mini, Phi-3-small, and Phi-3-medium, are accessible through the Azure Machine Learning model catalog and collection. Using them requires a paid Azure account and an Azure AI Studio hub.
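For developers who deploy one of these models from the Azure catalog, the sketch below shows one possible way to call it using the azure-ai-inference Python package. The endpoint URL, key, prompt, and max_tokens value are placeholders for the values Azure AI Studio provides after deployment, not details confirmed in Microsoft's announcement.

```python
# Minimal sketch: call a Phi-3 model deployed from the Azure model catalog
# via the azure-ai-inference package. Endpoint and key are placeholders
# for the values shown in Azure AI Studio after you deploy a model.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-deployment>.inference.ai.azure.com",  # placeholder
    credential=AzureKeyCredential("<your-azure-key>"),            # placeholder
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a concise assistant."),
        UserMessage(content="Explain in one sentence what a small language model is."),
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```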