Google's open source AI model Gemma is now available on the Groq chatbot platform. Gemma is a much smaller language model than Gemini or OpenAI's ChatGPT, but it can be installed almost anywhere, including on laptops, and nothing runs it faster than the language processing unit (LPU) chip that powers Groq's interface.
In a quick test, we asked Gemma on Groq to act as an alien tour guide and show us around its home planet, describing some of the more exciting sights and attractions.
The model responded at an astounding rate of 679 tokens per second, and the entire well-constructed and imaginative scenario appeared faster than I could read it.
There is a growing trend toward smaller open source AI models that are not as capable as their larger counterparts but still work well and are small enough to run on laptops and cell phones.
Gemma is Google's answer to this trend: trained in a similar fashion to Gemini, it is a large language model available in 2 billion and 7 billion parameter versions.
In addition to running on laptops, it can be run in the cloud through services like Groq, or integrated into commercial apps to bring LLM functionality to products.
Google says it will extend the Gemma family over time, and higher-performance versions may be forthcoming. Being open source means that other developers can build on top of the model, tweak it with their own data, and adapt it to work in different ways.
Groq is both a chatbot platform offering a choice of multiple open source AI models and a company producing a new kind of chip specifically designed to run AI models quickly.
"We've been laser-focused on delivering unmatched inference speed and low latency," explained Mark Heap, Groq's chief evangelist, in a conversation with Tom's Guide
"This is critical in a world where generative AI applications are becoming ubiquitous"Designed by Jonathan Ross, founder and CEO of Groq, who also led the development of Google's Tensor Processing Unit (TPU) used to train and run Gemini, the chip is designed for rapid scalability and efficient data designed for rapid scalability and efficient flow of data through the chip
To compare the execution speed of Gemma on Groq with Gemma on a laptop, I installed the model on an M2 MacBook Air and ran it through Ollama, an open source tool that makes it easy to run AI models offline.
I gave it the same prompt: "Imagine you are an alien tour guide, showing human visitors around your home planet for the first time. Describe some of the most fascinating and unusual sights, sounds, creatures, and experiences you would share with them during the tour. Feel free to be creative and include vivid details about the alien world."
Five minutes later, it had written four words. This is probably because my MacBook has only 8GB of RAM; other models such as StabilityAI's Zephyr and Microsoft's Phi-2 run fine on it.
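For readers who want to reproduce this kind of local test, here is a minimal sketch in Python. It assumes Ollama is installed and serving its default REST API on localhost:11434, and that the Gemma 2B weights have been pulled under the tag "gemma:2b" (the exact tag may differ on your setup); the throughput calculation relies on the generation stats Ollama returns alongside the response.

```python
# Minimal sketch: send the tour-guide prompt to a locally running Gemma via Ollama
# and estimate tokens per second from the stats in the response.
# Assumptions: Ollama is running on the default port and "gemma:2b" has been pulled.
import requests

PROMPT = (
    "Imagine you are an alien tour guide, showing human visitors around your home "
    "planet for the first time. Describe some of the most fascinating and unusual "
    "sights, sounds, creatures, and experiences you would share with them during "
    "the tour. Feel free to be creative and include vivid details about the alien world."
)

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gemma:2b", "prompt": PROMPT, "stream": False},
    timeout=600,  # a small laptop may need a long time to finish
)
data = resp.json()

print(data["response"])

# Ollama reports eval_count (generated tokens) and eval_duration (nanoseconds).
if "eval_count" in data and "eval_duration" in data:
    tokens_per_second = data["eval_count"] / (data["eval_duration"] / 1e9)
    print(f"~{tokens_per_second:.1f} tokens per second")
```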
Compared with other cloud installations of Gemma, the Groq installation is surprisingly fast. It outperforms ChatGPT, Claude 3, and Gemini in response time, which on the surface seems unnecessary. But imagine if an AI were given a voice.
The response time is too fast for a human to read in real time, but when connected to an equally fast text-to-speech engine such as ElevenLabs running on the Groq chip, it could not only respond in real time, it could even rethink and adapt to the interruptions that make conversation feel natural.
Developers can also access Gemma through Google Cloud's own Vertex AI, integrating the LLM into their apps and products through an API. The same functionality is available through Groq's API, and the model can also be downloaded for offline use.
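As a rough illustration of the API route, here is a short Python sketch using Groq's official client library. It assumes the "groq" package is installed and a GROQ_API_KEY environment variable is set; the model identifier "gemma-7b-it" is an assumption and the available tags may differ on your account.

```python
# Minimal sketch: call Gemma hosted on Groq through the chat completions API.
# Assumptions: the official "groq" Python package is installed, GROQ_API_KEY is set,
# and Gemma 7B is exposed under the model id "gemma-7b-it" (check your account).
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

completion = client.chat.completions.create(
    model="gemma-7b-it",  # assumed Gemma instruct tag on Groq
    messages=[
        {
            "role": "user",
            "content": "Imagine you are an alien tour guide showing human visitors "
                       "around your home planet for the first time.",
        }
    ],
)

print(completion.choices[0].message.content)
```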