![Gemini 1.5 Pro](https://allaboutaitech.com/wp-content/uploads/2024/02/Gemini-1.5-Pro.jpg)
Google has introduced Gemini 1.5 Pro, the next version of its AI model, with a much larger context window than Gemini 1.0 Pro and other AI models. Gemini 1.5 Pro uses the popular Mixture-of-Experts (MoE) approach for more efficient training and higher-quality responses.
Gemini 1.5 Pro outperforms Gemini 1.0 Pro on text, vision, and audio benchmarks. It also comes close to Google’s best AI model, Gemini 1.0 Ultra, while using less compute.
Google Gemini 1.5 Pro with a 1 million token Context Window
Google’s CEO, Sundar Pichai, took to social media platform X and said the Gemini 1.5 Pro will soon roll out with a 128K-token context window.
Pichai also said that enterprise users and developers can try the Gemini 1.5 Pro with a whopping experimental 1 million token context window.
This large context window easily beats Gemini 1.0, GPT-4 Turbo, and Claude 2.1, which come with 32K, 128K, and 200K context windows, respectively.
With the 1 million token context window, enterprise users and developers can upload entire PDFs, entire code repositories, and long videos. A single context can hold up to 11 hours of audio, 1 hour of video, or 700,000 words. However, Google said users should expect higher latency, as this is still an experimental feature of the model.
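To get a feel for these numbers, here is a minimal Python sketch that estimates whether a document of a given word count would fit in each model's context window. It is only a back-of-the-envelope check, not how any of these models actually count tokens. The tokens-per-word ratio is an assumption derived from the figures above (1 million tokens for about 700,000 words), the window sizes are the ones quoted in this article, and the function names are made up for illustration.

```python
# Context window sizes in tokens, as quoted in this article.
CONTEXT_WINDOWS = {
    "Gemini 1.0": 32_000,
    "GPT-4 Turbo": 128_000,
    "Claude 2.1": 200_000,
    "Gemini 1.5 Pro (standard)": 128_000,
    "Gemini 1.5 Pro (experimental)": 1_000_000,
}

# Assumption: 1,000,000 tokens hold roughly 700,000 words (ratio implied above).
# Real tokenizers vary by model, language, and content type.
TOKENS_PER_WORD = 1_000_000 / 700_000


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for a text with `word_count` words."""
    return round(word_count * TOKENS_PER_WORD)


def models_that_fit(word_count: int) -> list[str]:
    """Names of models whose context window can hold the estimated tokens."""
    needed = estimate_tokens(word_count)
    return [name for name, window in CONTEXT_WINDOWS.items() if window >= needed]


if __name__ == "__main__":
    # Hypothetical document sizes: a long report, a novel, a large archive.
    for words in (20_000, 100_000, 500_000):
        print(f"{words:>7} words ~ {estimate_tokens(words):>7} tokens:",
              ", ".join(models_that_fit(words)))
```

Under this rough ratio, only the experimental 1 million token window has room for a document of around 500,000 words. For real workloads, the provider's own token counting tools are the reliable way to check.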
![Gemini 1.5 Pro Comparison with other AI Models](https://allaboutaitech.com/wp-content/uploads/2024/02/Gemini-1.5-Pro-Comparison-with-other-AI-Models.jpg)
Google Gemini 1.5 Pro with a 10 million token Context Window
Google also reported research in which Gemini 1.5 was tested on up to 10 million tokens for text, 2 million tokens for audio, and 2.8 million tokens for video, and the model was able to recall information from these contexts with great accuracy.
A context window of that size would be far larger than that of any of the models compared above.
Capabilities of the Gemini 1.5 Pro
Google also gave us a demo showing the capabilities of its new Gemini 1.5 model. In the demo video, a 402-page PDF transcript of the Apollo 11 moon landing was uploaded to Google AI Studio. The model was asked to find three comedic moments and list them with quotes from the transcript and emojis. Gemini 1.5 Pro was able to extract three quotes and comedic moments accurately from the PDF.
Further, it was given a drawing showing a man’s foot, and the model was asked, “What moment is this?” The model correctly identified it as Neil Armstrong’s first step on the moon, which shows the multimodal capabilities of the model.
When it was further asked to cite the timecode of this moment in the transcript, the model was able to cite the timecode accurately.
Google also uploaded two more demo videos. In one, the Gemini 1.5 Pro model was able to process more than 100K lines of Three.js code. In the other, it was able to process a 44-minute Buster Keaton film, find small details in the film, and understand plot points.
How to access Gemini 1.5 Pro
Google said that the Gemini 1.5 Pro with a 1 million token context window can currently only be accessed by developers and enterprise users.
Developers interested in testing 1.5 Pro can sign up and access it via Google AI Studio, while enterprise users can access it via their Vertex AI account team.
Conclusion
Google is introducing new AI technologies and features at a rapid pace, and it clearly wants to stay ahead in the AI race. With a 1 million token context window, users can attempt tasks that were previously impractical, such as uploading a full movie to get a review or uploading an entire codebase to find out which function does what. Google has also tested Gemini 1.5 with a 10 million token context window in research and reported that it worked with great accuracy. If Google is able to bring this to a released product, other AI models will struggle to match Gemini on context length.
Competitors like OpenAI will have to work hard to stay relevant and compete with Google. The competition between OpenAI and Google is intensifying, as OpenAI has also announced its text-to-video generator, Sora, which is going to compete directly with Google’s Lumiere. It is fun to see the tech giants compete in this AI race.