Google Cloud Vertex AI Upgrades Include Gemini 1.5 Flash General Availability, Imagen 3 Preview

Solution providers ‘are building large business opportunities and businesses because of the demand that we see for this technology from lots of companies in different parts of the world,’ says Google Cloud CEO Thomas Kurian.

Google Cloud has made a series of upgrades to its Gemini generative artificial intelligence offering, including moving Gemini 1.5 Flash and Gemini 1.5 Pro to general availability, launching a preview of Imagen 3 and opening a public preview of a context caching feature.

The Mountain View, Calif.-based tech giant revealed the enhancements in a Thursday blog post about its Vertex AI machine learning platform as the battle for GenAI supremacy heats up with competitors including Microsoft.

During a virtual press conference, when asked by CRN about Google Cloud’s services-led partners, CEO Thomas Kurian (pictured) said that systems integrators and other solution providers “are building large business opportunities and businesses because of the demand that we see for this technology from lots of companies in different parts of the world.”

Google Cloud Updates Vertex AI

“We’ve always said that what we’re offering with Vertex and our models is a platform for organizations to build applications with,” Kurian said. “Many companies want solutions. And solutions are using our models to improve customer service, using our models to improve their internal processes, using the model to change how they work with suppliers.”

Among the upgrades Google unveiled is the general availability of Gemini 1.5 Flash and Gemini 1.5 Pro.

The vendor positions Gemini 1.5 Flash as giving users lower latency, more competitive pricing and a 1 million-token context window.

The tech giant positions Gemini 1.5 Flash as useful for scaling AI for retail chat agents, document processing, research agents that can synthesize entire repositories and other use cases.

Gemini 1.5 Pro is now available with a window of up to 2 million tokens, according to Google Cloud. For comparison, processing six minutes of video can take more than 100,000 tokens and large code bases can take more than 1 million.
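To put those figures in perspective, the short Python sketch below does the back-of-the-envelope arithmetic. It assumes, purely for illustration, that video consumes tokens at a constant rate equal to the quoted "six minutes for 100,000 tokens"; since the article says six minutes can take more than that, the results are rough upper bounds, not Google Cloud specifications. The function and constant names are hypothetical.

```python
# Figures quoted in the article. The video figure is a "more than" value,
# so everything derived from it is a rough upper bound.
TOKENS_PER_SIX_MINUTES_OF_VIDEO = 100_000
FLASH_CONTEXT_WINDOW = 1_000_000  # Gemini 1.5 Flash
PRO_CONTEXT_WINDOW = 2_000_000    # Gemini 1.5 Pro


def max_video_minutes(context_window: int) -> float:
    """Estimate how many minutes of video fit in a context window.

    Assumes a constant token rate, which is an illustrative simplification.
    """
    return context_window * 6 / TOKENS_PER_SIX_MINUTES_OF_VIDEO


print(max_video_minutes(FLASH_CONTEXT_WINDOW))  # 60.0
print(max_video_minutes(PRO_CONTEXT_WINDOW))    # 120.0
```

Under that assumption, the 2 million-token window would hold at most about two hours of video, and the 1 million-token window about one hour.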

The 1.5 Pro model can find bugs across many lines of code, find information across libraries of research and analyze hours of audio and video, according to Google Cloud.

Along with this news, Google Cloud said that it has moved the Imagen 3 image generation foundation model to preview for Vertex AI users with early access.

Imagen 3 promises 40 percent faster generation than its predecessor, along with improved prompt understanding, instruction-following, photo-realistic generations of groups of people and control over text rendering within an image, according to Google Cloud.

The model also has multi-language support, multiple aspect ratios support and Google DeepMind’s SynthID digital watermarking with other built-in safety features.

Google Cloud has released its Gemma 2 lightweight, open model globally for researchers and developers, according to the vendor. Vertex AI users will be able to access Gemma 2 in July.

The model is available in 9-billion and 27-billion parameter sizes and is more powerful and efficient than the prior generation, according to Google Cloud.

Google Cloud has started rolling out a context caching feature in public preview for Gemini 1.5 Pro and Gemini 1.5 Flash users.

Context caching is meant to reduce input costs by reusing cached data for frequently used context, potentially simplifying production deployment for long-context applications.
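Why caching frequently used context could lower input costs can be shown with a toy cost model. The Python sketch below is not the Vertex AI API and does not use Google Cloud pricing: the function name, the 0.25 cached rate and the workload sizes are all hypothetical, and it ignores any cache storage fees. It only illustrates the idea that a large shared context is paid for in full once and at a reduced rate on later requests.

```python
def billed_input_tokens(context_tokens, question_tokens, requests, cached_rate=1.0):
    """Return input tokens billed, expressed in full-price equivalents.

    cached_rate is a hypothetical fraction of the normal price charged for
    context tokens served from the cache (1.0 means no caching at all).
    The first request always pays full price for the context.
    """
    first = context_tokens + question_tokens
    repeat = context_tokens * cached_rate + question_tokens
    return first + (requests - 1) * repeat


# Hypothetical workload: 100 short questions over one 500,000-token document.
uncached = billed_input_tokens(500_000, 200, requests=100)
cached = billed_input_tokens(500_000, 200, requests=100, cached_rate=0.25)

print(f"without caching: {uncached:,.0f} token equivalents")
print(f"with caching:    {cached:,.0f} token equivalents")
print(f"estimated input savings: {1 - cached / uncached:.0%}")
```

The savings in this model grow with the size of the shared context and the number of requests that reuse it, which is why the feature is pitched at long-context applications.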

Google Cloud has made provisioned throughput generally available with an allowlist, giving users the ability to scale their use of first-party Google models. Provisioned throughput also promises predictability and reliability for production workloads, according to Google Cloud.

Vertex AI will offer a service to ground AI agents with specialized third-party data, potentially reducing incorrect results. The service should become available next quarter, Kurian said on the call.

Grounding with a high-fidelity mode is now in experimental preview, giving users in data-intensive industries such as financial services, health care and insurance the ability to generate responses sourced only from provided context, not the model’s world knowledge.

This feature should help with summarizing multiple documents, extracting data against a set corpus of financial data, or processing across a predefined set of documents, according to Google Cloud. The high-fidelity mode is powered by a fine-tuned version of Gemini 1.5 Flash.

Google Cloud is at work deepening its partnership with AI company Mistral, promising to add Mistral Small, Mistral Large and Mistral Codestral to Vertex AI Model Garden during the summer.

And the vendor revealed that it is expanding its machine learning processing commitments to eight more countries, four of them in 2024. Google Cloud did not name the countries. Today, Google Cloud offers data residency guarantees for data stored at rest in 23 countries, including Qatar, Australia, Spain, Israel and India.