OpenAI is taking on the voice assistant market with its new Voice Engine tool and may have Apple, Amazon and Google in its sights for the next big artificial intelligence push.
ChatGPT has a voice-friendly interface on mobile and recently introduced a way to speak responses on desktop, but OpenAI's new trademark application for the term Voice Engine relates specifically to building digital voice assistants.
Android now allows users to replace the default voice assistant. Apple appears to be in discussions with various AI companies about the future of artificial intelligence in the iPhone, and this may be a preemptive move by OpenAI to build a potential new market.
Apple is also rumored to be opening an AI-only App Store with the next major upgrade of iOS, which would create a new market for AI-powered assistants.
OpenAI CEO Sam Altman said there are "a variety of things" that will be released this year. This is expected to include Sora, an AI video tool, but could also include a new AI voice system.
We don't know much about Voice Engine or even if it will be a product; OpenAI has not commented publicly on it, so all we have are rumors and trademark applications.
Voice Engine may be a new model created specifically for voice applications, but it could also be part of OpenAI's enterprise play. It has the potential to build a high-quality voice system that will allow companies to build more efficient call center bots.
A new trademark application was filed last week with the US Patent and Trademark Office. While the filing does not necessarily imply commercialization, it is consistent with the broader market shift toward voice and the direction of OpenAI's target model.
The application covers building digital voice assistants, generating speech from text prompts, processing voice commands, and creating software used to provide voice services.
The complete application covers the development and provision of voice services; the use of AI for text or speech; text-to-speech, natural language, and speech processing; the generation of speech and voice from prompts (text, audio, visual, images); voice command processing; speech recognition; and building digital voice assistants.
This sounds like all the pieces needed for a fully functional, fully interactive AI voice assistant that can handle complex tasks, chat naturally, and even take phone calls for you.
OpenAI released GPT-4 a year ago. At the time, this was the groundbreaking generative AI model that powered ChatGPT and Microsoft Copilot.
The company also began training GPT-5 late last year, sparking speculation about its release date. Altman told podcaster Lex Fridman that the company will "release an amazing new model this year," but did not confirm whether this is GPT-5 or a preliminary step.
He also said that "various things" would be released in the coming months; according to OpenAI CTO Mira Murati, this includes the AI video platform Sora.
Some on social media have speculated that Sora and this new Voice Engine are different modal interfaces for GPT-5.
It is very likely that GPT-5 will be a true multimodal model, able to understand video, images, voice, text, and code, as well as generate all these types of content.
Given the trademark description, Voice Engine could also be the new voice assistant, merging the broad capabilities of Siri, Alexa, or Google Assistant with ChatGPT's reasoning and natural language capabilities.
Google has already begun upgrading Gemini to work that way, Apple is rumored to be building a new version of Siri with large language model capabilities, and Amazon is already testing Alexa Plus with similar underlying skills.
OpenAI might offer Voice Engine to run such a system in the future, or as an alternative interface to ChatGPT that works with smart speakers, phones, or headphones.
Or it could just be that OpenAI is being cautious with its trademarks: its bid to protect GPT was rejected, and it is now applying for trademarks on GPT-5, GPT-6, and even GPT-7. The latter includes the generation of music, the conversion of text and data into code, and the creation of code from scratch.