Google flashes everyone: the new Gemini 1.5 Flash takes on GPT-4o

Google has launched a new member of the Gemini family of artificial intelligence models. Sitting between the on-device Nano and the cloud-based Pro, Gemini Flash is designed for tasks that need fast responses, such as chat, and for processing images, video and voice. Gemini 1.5 Flash is a natively multimodal model similar to OpenAI's recently announced GPT-4o. It was announced at the Google I/O developer event and is built for speed, which also makes it useful for real-time conversations.

Because the new model is already available worldwide for developers to use in their own applications, we could soon see a number of third-party live chat apps built on Gemini 1.5 Flash.

There was also news that it would power the Gemini Advanced premium chatbot, along with an upgrade to Gemini 1.5 Pro, a model first released earlier this year.

Gemini 1.5 Flash sits just above Nano and just below Pro in the size hierarchy. What sets it apart from other AI models, including its siblings, is its combination of speed and agility.

In addition to being fast and impressive in its ability to understand text, images, video and audio, 1.5 Flash is cheap: at least 20 times cheaper than the more expensive Pro. "We know from user feedback that some applications require lower latency and lower cost," said Demis Hassabis, CEO of Google DeepMind. "This has inspired us to continue to innovate," he added, announcing Flash as "a model that is lighter than 1.5 Pro and designed to be fast and efficient to serve at scale."

The obvious comparison is with OpenAI's recently announced GPT-4o model, at least in terms of speed. Flash is very fast, natively multimodal and designed for real-time interaction. That said, Gemini 1.5 Flash seems to be a less capable model in terms of reasoning.

Like the other Gemini family models, 1.5 Flash comes with a massive one million token context window, which Google promises will be fully usable in practice. In comparison, GPT-4o has a 128,000 token context window, while Claude 3 has 200,000 tokens.
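To make the comparison concrete, here is a minimal Python sketch that checks which of these context windows could hold a given input. The window sizes are the figures quoted above. The four-characters-per-token ratio is only a rough rule of thumb for English text, and the helper names `estimate_tokens` and `models_that_fit` are hypothetical, not part of any vendor API. Real tokenizers will produce different counts.

```python
# Context window sizes quoted in the article, in tokens.
CONTEXT_WINDOWS = {
    "Gemini 1.5 Flash": 1_000_000,
    "GPT-4o": 128_000,
    "Claude 3": 200_000,
}

# Assumption: roughly 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate; a real tokenizer will give a different count."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def models_that_fit(token_count: int) -> list[str]:
    """Return the models whose context window can hold token_count tokens."""
    return [name for name, limit in CONTEXT_WINDOWS.items() if token_count <= limit]


if __name__ == "__main__":
    # Hypothetical 600,000-character transcript.
    transcript = "word " * 120_000
    needed = estimate_tokens(transcript)
    print(needed, models_that_fit(needed))
```

Under these assumptions, a 600,000-character transcript comes to about 150,000 tokens, which is too large for the 128,000 token window but fits the other two.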

What makes large context windows so important is the ability to hold large amounts of information in memory within a single conversation. This is essential when analyzing non-text content: an image is worth a thousand words, and a video is worth even more. Flash was also trained by its larger sibling, Gemini 1.5 Pro.

Hassabis said this was done "through a process called distillation," in which the most important knowledge and skills from a larger model are transferred to a smaller, more efficient model.
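Google has not described how it distilled Flash, so the following is only an illustration of the general idea, using the classic soft-target formulation from the research literature. A large "teacher" model produces a probability distribution over answers, and the smaller "student" is penalized for diverging from it. The function names and the temperature value are assumptions chosen for this sketch.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's and student's softened distributions."""
    p = softmax(teacher_logits, temperature)  # teacher (target)
    q = softmax(student_logits, temperature)  # student (prediction)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    print(distillation_loss(teacher, [4.0, 1.0, 0.5]))  # student matches teacher
    print(distillation_loss(teacher, [0.5, 1.0, 4.0]))  # student disagrees
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two disagree. Training the student to minimize it is what moves knowledge from the larger model to the smaller one.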

"1.5 Flash excels at summarization, chat applications, image and video captioning, data extraction from long documents and tables, and more," he said.

These models become even more important as they gain the ability to understand more than just text, and as larger context windows extend to faster but smaller models like Flash.
