Top 10 AI Foundation Models Ranked: Google, Nvidia, OpenAI Lead Forrester Report

The report scores the world’s top 10 AI foundation models for language, including Google Gemini, Anthropic Claude, Amazon Bedrock, IBM Granite and OpenAI GPT-4.

Forrester has reviewed, scored and ranked the world’s top AI foundation models for language—from Amazon Bedrock and Google Gemini to OpenAI GPT-4 and Anthropic Claude.

AI startups like Cohere and Mistral AI go head-to-head against global tech giants like IBM, Microsoft and Nvidia in Forrester’s new report, “AI Foundation Models For Language, Q2 2024.”

“The GenAI zeitgeist propelled AI foundation models for language to the forefront of technology and business leaders’ minds,” said the IT market research firm in its 2024 AI report. “The market for AI foundation models can be one of the most inscrutable markets for buyers due to the absurd rate of innovation and choices between hot startups and tech giants.”

Forrester ranked the top 10 AI foundation model (FM) providers for language that matter most on a global basis and showed how they stack up against each other.

[Related: Microsoft-Backed Mistral AI Startup Raises $640M; Hits $6B Valuation]

Forrester’s AI Foundation Model Ranking System

Forrester evaluated 10 vendors’ FM offerings in three categories: the AI FM product itself, the company’s strategy, and overall market presence.

For each of these three categories, Forrester scored each vendor’s AI FM offering on a one-to-five scale, with one meaning “weak” and five meaning “strong.” The higher the score, the better the FM product and company strategy.

Each company’s AI model offering was scored on many factors, including core capabilities, code generation, governance and security, model management, resilience and scalability, context window and overall scope. Companies’ strategy scores were based on criteria such as vision, partner ecosystem and pricing flexibility, while market presence was scored on revenue and number of customers.
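Forrester does not publish its exact weighting formula, but a Wave-style category score is essentially a weighted average of criterion scores on the one-to-five scale. A minimal illustrative sketch (the criterion names and weights below are hypothetical, not Forrester’s actual rubric):

```python
# Hypothetical sketch of a Forrester Wave-style weighted category score.
# Criterion names and weights are illustrative, not Forrester's actual rubric.

def category_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of 1-5 criterion scores; weights must sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return round(sum(scores[c] * weights[c] for c in weights), 2)

# Example: a vendor's hypothetical "current offering" criteria.
scores = {"core_capabilities": 5, "code_generation": 4,
          "governance_security": 5, "context_window": 5}
weights = {"core_capabilities": 0.4, "code_generation": 0.2,
           "governance_security": 0.2, "context_window": 0.2}

print(category_score(scores, weights))  # 4.8
```

This is why a vendor can post a decimal offering score like 4.82 even though each underlying criterion is scored as a whole number from one to five.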

Here are the 10 models and companies Forrester ranked in its new report, “The Forrester Wave: AI Foundation Models For Language, Q2 2024”:

AWS’ Amazon Titan

Anthropic Claude

Cohere Command

Databricks DBRX

Google Gemini

IBM Granite

Microsoft Phi

Mistral AI

Nvidia Nemotron

OpenAI GPT-4

CRN breaks down Forrester’s new report ranking the world’s 10 best AI foundation models for language.

Leader: Google

AI Model Offering Score: 4.82

Strategy Score: 4.66

Market Presence Score: 2

Google Gemini received the highest score in Forrester’s report, 4.82, thanks to its market differentiation around multimodality, context length and interconnectivity with Google Cloud services.

Gemini has the largest context window of all vendors evaluated: one million tokens today, with two million tokens recently announced. It’s also one of the few commercially available multimodal LLMs, with top-notch multilingual capabilities across 37 languages, more than any other provider.

Regarding the ‘Strategy’ category, Google scored fives in innovation, roadmap, pricing flexibility and transparency, and partner ecosystem. Google’s lowest score came in the Market Presence category, where it scored a one in revenue, on par with most of its competitors.

“Google has everything it takes to lead the AI market—enormous AI infrastructure capacity, a very deep bench of AI researchers, and a growing number of enterprise customers in Google Cloud,” Forrester said.

Leader: Databricks DBRX

AI Model Offering Score: 3.38

Strategy Score: 4.34

Market Presence Score: 3

Databricks’ DBRX product received a score of 3.38 thanks to offering both its own pretrained DBRX model and support for customers who pretrain or fine-tune their own models.

Databricks’ platform has solid capabilities in application development, governance, and security, and in managing models for training and deployment.

Regarding the ‘Strategy’ category, Databricks scored fives for vision, roadmap, partner ecosystem and supporting services. The company’s lowest scores, ones, were for interaction modalities and multilingual capabilities.

“Databricks’ offering is a good choice for enterprise customers who want a capable model that includes enterprise tooling to not only build solutions and fine-tune models, but also to pretrain their own model with their own data,” Forrester said.

Leader: Nvidia Nemotron

AI Model Offering Score: 3.38

Strategy Score: 3.68

Market Presence Score: 3

Nvidia’s recently released Nemotron product received a score of 3.38. The offering allows enterprise customers to use an Nvidia model out of the box and inspires existing and new technology partners to push the frontier.

Nvidia’s offering has very strong multilingual capabilities and offers multimodal interactivity through its Megatron model, while its NeMo Framework lets customers build AI FM language models more quickly and efficiently on Nvidia’s platform.

Regarding the ‘Strategy’ category, Nvidia scored fives for innovation and its partner ecosystem. The company did not receive any low scores of either a one or two in the entire review.

“Nvidia is a good fit for enterprises that want to work with a partner who can deliver the optimal bridge between hardware and software needs for training and inferencing models,” Forrester said.


Strong Performer: IBM Granite

AI Model Offering Score: 3.68

Strategy Score: 3.32

Market Presence Score: 1

IBM’s Granite product received a score of 3.68 thanks to providing customers with some of the most robust and transparent insights into the underlying training data and to protecting enterprises from the risk of any unlicensed content in the training data.

IBM Granite has powerful capabilities to align its models to enterprise needs and the governance structures to enable the monitoring and management of models.

Regarding the ‘Strategy’ category, IBM scored a five for supporting services and offerings. The company received low scores of ones in both revenue and number of customers, as well as in context window and core capabilities.

“IBM is a good fit for customers that want 100 percent vendor indemnification from model training data and AI platform capabilities that empower AI teams to build AI solutions,” said Forrester.

Strong Performer: OpenAI GPT-4

AI Model Offering Score: 3.28

Strategy Score: 3.70

Market Presence Score: 5

OpenAI’s GPT-4 offering received a score of 3.28 thanks to its models being some of the most capable in the market and to OpenAI being one of the few providers that offer multimodal LLMs.

OpenAI’s GPT-4 strengths are in its core model capabilities such as code generation, multilingual capabilities, context window, and the scope of its training data.

Regarding the ‘Strategy’ category, OpenAI scored fives for vision, innovation and roadmap. OpenAI also received the highest score of any provider in the ‘Market Presence’ category with a top score of five. OpenAI’s low scores of one came in model management and deployment and in supporting offerings.

“OpenAI is a good choice for developers who want to leverage the raw power of the models themselves into another platform to build more sophisticated app architectures and begin to build multimodal GenAI apps,” Forrester said.

Strong Performer: AWS’ Amazon Bedrock

AI Model Offering Score: 2.90

Strategy Score: 3.30

Market Presence Score: 1

AWS’ Amazon Bedrock received a score of 2.90 thanks to its Titan models and to enabling other providers to offer their models within Bedrock.

AWS has strong capabilities in many of the surrounding support tools that its Bedrock service provides including model alignment, governance and security, and application development.

Regarding the ‘Strategy’ category, AWS scored fives for roadmap, pricing flexibility and transparency, and support services and offerings. AWS received low scores of one for vision, innovation and number of Bedrock customers.

“Amazon’s AI-FM language offering will be appealing to AWS customers for its marketplace approach, rather than for the core Titan models themselves,” said Forrester.

Strong Performer: Microsoft Phi

AI Model Offering Score: 2.82

Strategy Score: 3.34

Market Presence Score: 1

Microsoft Phi received a score of 2.82 thanks to Phi models leveraging large amounts of synthetically generated content in addition to real content, which allows training with a smaller dataset that can be more tightly curated.

Microsoft Phi is less capable than many of the others in this market, but its small size and tightly curated training dataset is a core value proposition. Microsoft’s Azure AI services surrounding the Phi family provide strong capabilities for aligning model behavior to enterprise needs.

Regarding the ‘Strategy’ category, Microsoft scored fives for partner ecosystem and support services and offerings. Microsoft received low scores of one for pricing flexibility and transparency, as well as for Phi revenue and number of customers.

“Microsoft’s investment in and partnership with OpenAI stands out as unique, specifically because of its exclusivity,” said Forrester. “Microsoft is almost able to act as an AI-FM language provider of OpenAI’s core models as well as its own.”

Contender: Cohere Command

AI Model Offering Score: 2.72

Strategy Score: 2.34

Market Presence Score: 2

Cohere Command received a score of 2.72 thanks to its business-friendly models and its support for the data pipelines needed for retrieval-augmented generation (RAG)-based knowledge-retrieval architectures.

Cohere’s Command models have strengths in core model capabilities for language and reasoning, as well as notable multilingual capabilities, with pretraining data from a variety of languages and specific optimizations for common business languages.

Regarding the ‘Strategy’ category, Cohere did not receive any high scores of four or five, and it scored a one for partner ecosystem. However, Cohere Command scored a three for number of customers, higher than many of the larger tech providers.

“Cohere is a good choice for customers who want an AI-FM language vendor that can give them strong support for RAG and other knowledge-retrieval use cases,” Forrester said.

Contender: Anthropic Claude

AI Model Offering Score: 2.46

Strategy Score: 2.68

Market Presence Score: 3

Anthropic Claude received a score of 2.46 thanks to its ‘Constitutional AI’ principle for aligning models to enterprise needs and its emphasis on larger, more complex models.

Anthropic has very strong language capabilities in its core model, with one of the longest context windows currently on the market.

Regarding the ‘Strategy’ category, Anthropic scored a five for vision as well as a three for revenue. Anthropic received low scores of ones for partner ecosystem and supporting services and offerings.

“While Anthropic has done significant work to align its model family to its Constitutional AI approach during pretraining, the company needs to offer more significant capabilities for enterprises to build applications and manage models within them,” Forrester said.

Challenger: Mistral AI

AI Model Offering Score: 1.78

Strategy Score: 1.32

Market Presence Score: 1

Mistral AI received the lowest AI model offering score in Forrester’s report, 1.78, though the company excels with open-weight models.

Mistral models have strong core language capabilities that use a mixture of experts approach, which enables them to achieve higher accuracy while using fewer computing resources at inference time.
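The mixture-of-experts idea is that a router activates only a small subset of expert sub-networks per token, so inference cost scales with the active experts rather than the total parameter count. A toy sketch of top-k routing follows; the shapes, expert count and routing details are illustrative only, not Mistral’s actual architecture:

```python
# Toy top-k mixture-of-experts routing -- illustrative only,
# not Mistral's actual architecture or parameter counts.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 8, 8, 2   # hypothetical: 8 experts, 2 active per token

router_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route token vector x to its top-k experts; only k of n experts run."""
    logits = x @ router_w
    top = np.argsort(logits)[-top_k:]                        # top-k expert ids
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen
    # Only top_k expert matmuls execute -> roughly k/n of the dense compute.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

y = moe_layer(rng.standard_normal(d))
print(y.shape)  # (8,)
```

Because only two of the eight experts run per token in this sketch, the layer does about a quarter of the matrix multiplications a dense layer of the same total size would, which is the efficiency-at-inference point Forrester highlights.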

Regarding the ‘Strategy’ and ‘Market Presence’ categories, Mistral AI mostly received low scores of ones across the board.

“Mistral has made a name for itself over the past year by offering open-weight models that make strong appearances in model performance leaderboards, allowing it to punch above its weight in the market,” said Forrester. “However, the company must quickly add sales, marketing, platform tooling development, and partner operations to compete with the growing number of players in this market.”