Google’s Gemini 1.5 Pro can now hear

Google’s update to Gemini 1.5 Pro gives the model ears. The model can now listen to uploaded audio files and churn out information from things like earnings calls or audio from videos without the need to refer to a written transcript.

During its Google Next event, Google also announced it’ll make Gemini 1.5 Pro available to the public for the first time through its platform to build AI applications, Vertex AI. Gemini 1.5 Pro was first announced in February.

This new version of Gemini Pro, which is supposed to be the middle-weight model of the Gemini family, already surpasses the biggest and most powerful model, Gemini Ultra, in performance. Gemini 1.5 Pro can understand complicated instructions and eliminates the need to fine-tune models, Google claims.

Gemini 1.5 Pro is not available to people without access to Vertex AI. Right now, most people encounter Gemini language models through the Gemini chatbot. Gemini Ultra powers the Gemini Advanced chatbot, and while it is powerful and also able to understand long commands, it’s not as fast as Gemini 1.5 Pro.

Gemini 1.5 Pro is not the only large AI model from Google getting an update. Imagen 2, the text-to-image generation model that helps power Gemini’s image-generation capabilities, will also add inpainting and outpainting, which let users add or remove elements from images. Google also made its SynthID digital watermarking feature available on all pictures created through Imagen models. SynthID adds an invisible to the viewer watermark on images that marks its provenance when viewed through a detection tool.

Google says it’s also publicly previewing a way to ground its AI responses with Google Search so they answer with up-to-date information. That’s not always a given with the responses produced by large language models, sometimes intentionally; Google has intentionally kept Gemini from answering questions related to the 2024 US election.

https://www.theverge.com/2024/4/9/24124741/google-gemini-pro-imagen-updates-vertex