Gemini 1.5 is Google’s next-gen AI model — and it’s already almost ready
Barely two months after launching Gemini, the large language model Google hopes will bring it to the top of the AI industry, the company is already announcing its successor. Google is launching Gemini 1.5 today and making it available to developers and enterprise users, with a full consumer rollout to follow soon. The company has made clear that it is all in on Gemini as a business tool, a personal assistant, and everything in between, and it's pushing hard on that plan.
There are a lot of improvements in Gemini 1.5: Gemini 1.5 Pro, the general-purpose model in Google's system, is apparently on par with the high-end Gemini Ultra the company only recently launched, and it bested Gemini 1.0 Pro on 87 percent of benchmark tests. It was made using an increasingly common technique known as "Mixture of Experts," or MoE, which means the system only activates the relevant parts of the overall model when you send in a query, rather than running the entire model for every request. That approach should make the model both faster for you to use and more efficient for Google to run.
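Google hasn't published Gemini 1.5's internals, but a toy sketch of the general MoE pattern looks something like this; the layer sizes, the numpy matrices standing in for experts, and the random router weights are all illustrative assumptions, not anything from Gemini itself.

```python
# Minimal sketch of the general Mixture-of-Experts idea, not Gemini's
# actual (unpublished) architecture. A router scores every expert for
# each input, but only the top-k experts actually run, so most of the
# model stays idle for any single query.
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 16    # width of each token representation (illustrative)
N_EXPERTS = 8   # total experts in the layer
TOP_K = 2       # experts actually executed per token

# Each "expert" is just a small feed-forward weight matrix here.
experts = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]
router = rng.normal(size=(D_MODEL, N_EXPERTS))  # learned in a real model

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector through its top-k experts only."""
    logits = x @ router                    # score all experts
    top = np.argsort(logits)[-TOP_K:]      # pick the k highest-scoring
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the winners
    # Only TOP_K of N_EXPERTS matrices are ever multiplied: that is the
    # compute saving the article describes.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=D_MODEL)
print(moe_layer(token).shape)  # (16,): same output shape, ~k/N of the FLOPs
```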
But there’s one new thing in Gemini 1.5 that has the whole company, starting with CEO Sundar Pichai, especially excited: Gemini 1.5 has an enormous context window, which means it can handle much larger queries and look at much more information at once. That window is a whopping 1 million tokens, compared to 128,000 for OpenAI’s GPT-4 and 32,000 for the current Gemini Pro. Tokens (the chunks of text, code, audio, or video a model actually processes) are a tricky metric to understand, so Pichai makes it simpler: “It’s about 10 or 11 hours of video, tens of thousands of lines of code.” The context window means you can ask the AI bot about all of that content at once.
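To make those window sizes concrete, here's a rough back-of-the-envelope comparison for plain text; the ~4-characters-per-token and ~5-characters-per-word ratios are common rules of thumb, not official figures from Google or OpenAI.

```python
# Back-of-the-envelope comparison of the context windows named above.
# The exact ratio depends on the tokenizer; these are assumed heuristics.
CHARS_PER_TOKEN = 4     # rough rule of thumb for English text
CHARS_PER_WORD = 5      # rough average, including the trailing space
WORDS_PER_PAGE = 500    # assumed dense page of prose

windows = {
    "Gemini 1.0 Pro": 32_000,
    "GPT-4": 128_000,
    "Gemini 1.5 Pro": 1_000_000,
}

for name, tokens in windows.items():
    words = tokens * CHARS_PER_TOKEN / CHARS_PER_WORD
    pages = words / WORDS_PER_PAGE
    print(f"{name:>15}: {tokens:>9,} tokens ~ {words:>9,.0f} words ~ {pages:>5,.0f} pages")
```

Under those assumptions, a million tokens works out to roughly 800,000 words, which is why an entire novel trilogy plausibly fits in one query.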
(Pichai also says Google’s researchers are testing a 10 million token context window — that’s, like, the whole series of Game of Thrones all at once.)
As he’s explaining this to me, Pichai notes offhandedly that you can fit the entire Lord of the Rings trilogy into that context window. This seems too specific, so I ask him: this has already happened, hasn’t it? Someone in Google is just checking to see if Gemini spots any continuity errors, trying to understand the complicated lineage of Middle-earth, and seeing if maybe AI can finally make sense of Tom Bombadil. “I’m sure it has happened,” Pichai says with a laugh, “or will happen — one of the two.”
Pichai also thinks the larger context window will be hugely useful for businesses. “This allows use cases where you can add a lot of personal context and information at the moment of the query,” he says. “Think of it as we have dramatically expanded the query window.” He imagines filmmakers might upload their entire movie and ask Gemini what reviewers might say; he sees companies using Gemini to look over masses of financial records. “I view it as one of the bigger breakthroughs we have done,” he says.
For now, Gemini 1.5 will only be available to business users and developers, through Google’s Vertex AI and AI Studio. Eventually, it will replace Gemini 1.0, and the standard version of Gemini Pro — the one available to everyone at gemini.google.com and in the company’s apps — will be 1.5 Pro with a 128,000-token context window. You’ll have to pay extra to get to the million. Google is also testing the model’s safety and ethical boundaries, particularly around the newly enlarged context window.
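For developers who get access, querying the model through Google's Python SDK for AI Studio looks roughly like the sketch below; the google-generativeai package and its configure / GenerativeModel / generate_content calls exist in that SDK, but the exact model name string and preview availability are assumptions that may differ in practice.

```python
# Rough sketch of calling Gemini 1.5 Pro via Google's AI Studio Python SDK
# (pip install google-generativeai). The model identifier is an assumption;
# preview naming may differ.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key issued by AI Studio

# Hypothetical identifier for the 1.5 Pro preview model.
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# The large context window is the point: pass a huge document in one query.
long_document = open("entire_codebase.txt").read()
response = model.generate_content(
    ["Summarize the architecture of this codebase:", long_document]
)
print(response.text)
```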
Google is in a breakneck race to build the best AI tool right now, as businesses around the world try to figure out their own AI strategy — and whether to sign their developer agreements with OpenAI, Google, or someone else. Just this week, OpenAI announced “memory” for ChatGPT, and it appears to be gearing up for a push into web search. So far, Gemini seems to be impressive, especially for those already in Google’s ecosystem, but there’s a lot of work left to do on all sides.
Eventually, Pichai tells me, all these 1.0s and 1.5s and Pros and Ultras and corporate battles won’t really matter to users. “People will just be consuming the experiences,” he says. “It’s like using a smartphone without always paying attention to the processor underneath.” But at this moment, he says, we’re still in the phase where everyone knows the chip inside their phone, because it matters. “The underlying technology is shifting so fast,” he says. “People do care.”