The 10 Hottest Data Science And Machine Learning Tools Of 2023 (So Far)

Data science and machine learning technologies are in big demand as businesses look for ways to analyze big data and automate data-focused processes. Here are 10 leading-edge data science and machine learning tools that have caught our attention (so far) this year.

Tool Time

Data volumes continue to explode with the global “datasphere” – the total amount of data created, captured, replicated and consumed – growing at more than 20 percent a year to reach approximately 291 zettabytes in 2027, according to market researcher IDC.

Efforts by businesses and organizations to derive value from all that data are fueling demand for data science tools and technologies for developing data analysis strategies, preparing data for analysis, creating data visualizations and building data models. (Data science is a field of study that uses a scientific approach to extract meaning and insights from data.)

And more of that data is being used to power machine learning projects. ML is becoming ubiquitous within enterprises as businesses build machine learning models and connect them to operational applications and software features such as personalization and natural language interfaces, notes Daniel Treiman, ML engineering lead at ML platform developer Predibase, in a list of ML predictions for 2023.

All this is spurring demand for increasingly sophisticated data science and machine learning tools and platforms. What follows is a look at 10 hot data science and machine learning tools designed to meet those demands.

Some are from industry giants and more established IT vendors while many are from startups focused exclusively on the data science and machine learning sectors. Some of these are new products introduced over the last year while others are new releases of tools and platforms that offer expanded capabilities to meet the latest demands of this rapidly changing space.

Amazon SageMaker

Amazon Web Services offers a long list of machine learning services including its flagship SageMaker, a fully managed platform that provides developers and data scientists with the ability to quickly build, train, test and deploy machine learning models in the cloud as well as on embedded systems and edge devices.

SageMaker integrates with a number of other AWS services including Redshift, Kinesis Data Analytics, Elastic Compute Cloud (EC2) and EMR (Elastic MapReduce).

In May AWS announced the general availability of geospatial capabilities in SageMaker, making it possible to build and deploy machine learning models using geospatial data. The geospatial functionality was previewed at re:Invent 2022.
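
For a sense of the developer workflow, here is a minimal sketch using the SageMaker Python SDK to train a scikit-learn model and deploy it to a managed endpoint; the training script, S3 path, IAM role and instance types are hypothetical placeholders, not prescriptions.

    # Minimal sketch using the SageMaker Python SDK; train.py, the S3 path,
    # the IAM role ARN and the instance types are hypothetical placeholders.
    import sagemaker
    from sagemaker.sklearn import SKLearn

    session = sagemaker.Session()
    role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

    estimator = SKLearn(
        entry_point="train.py",            # hypothetical training script
        framework_version="1.2-1",
        instance_type="ml.m5.xlarge",
        role=role,
        sagemaker_session=session,
    )
    estimator.fit({"train": "s3://example-bucket/training-data/"})  # hypothetical bucket

    # Deploy the trained model behind a managed real-time endpoint
    predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
    print(predictor.endpoint_name)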

Aporia

Aporia develops its namesake observability platform that data scientists and machine learning engineers use to monitor and improve the performance of machine learning models in production.

The platform provides visibility into model behavior and performance, making it easier to identify and diagnose issues that may arise in production environments and gain insights to improve models, according to the company.

In April Aporia achieved compliance with the Health Insurance Portability and Accountability Act (HIPAA), allowing healthcare organizations and their customers to use Aporia's machine learning observability services.

Azure Machine Learning

Microsoft’s Azure Machine Learning cloud service empowers data scientists and developers to build, train and deploy predictive machine learning models, manage machine learning project lifecycles, and launch MLOps initiatives.

The platform includes Azure Machine Learning Studio, a GUI-based integrated development environment for constructing and operationalizing machine learning workflows on the Azure cloud.

In May GPU designer Nvidia said it is integrating its Nvidia AI Enterprise, the software layer of its Nvidia AI platform, into Azure Machine Learning. That move, the company said, will “create a secure, enterprise-ready platform” that enables Azure customers to build, deploy and manage customized applications using the more than 100 Nvidia AI frameworks and tools in Nvidia AI Enterprise.
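
For context, submitting a training job with the Azure Machine Learning Python SDK (v2) can look roughly like the sketch below; the subscription, workspace, compute cluster, curated environment and training script are all hypothetical placeholders.

    # Minimal sketch with the azure-ai-ml (SDK v2) package; all resource names are hypothetical.
    from azure.ai.ml import MLClient, command
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(
        credential=DefaultAzureCredential(),
        subscription_id="<subscription-id>",
        resource_group_name="<resource-group>",
        workspace_name="<workspace-name>",
    )

    job = command(
        code="./src",                         # hypothetical folder containing train.py
        command="python train.py --epochs 10",
        environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",  # curated environment, assumed available
        compute="cpu-cluster",                # hypothetical compute target
        display_name="train-demo",
    )

    returned_job = ml_client.jobs.create_or_update(job)  # submits the job to Azure Machine Learning
    print(returned_job.studio_url)                       # link to the run in Azure ML Studio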

Baseten

Integrating machine learning models into real-world business processes is a critical step that is generally lengthy and expensive. Baseten's cloud-based machine learning infrastructure makes going from ML model to production-grade application fast and easy, according to the company.

Baseten's serverless technology gives data science and machine learning teams the ability to incorporate machine learning into business processes without back-end, front-end or MLOps expertise.

Comet Kangas

Machine learning models require huge volumes of high-quality data, which means data scientists often need to analyze large-scale datasets during both the data preparation and model training stages.

In November Comet debuted Kangas, a data exploration, analysis and model debugging tool that the company says helps users understand and work with their data in a highly intuitive way.

Kangas provides large-scale visual dataset exploration and analysis capabilities for the machine learning and computer vision community, according to Comet. The new tool makes it possible to intuitively explore, debug and analyze data in real time to quickly gain insights, leading to better and faster decisions.

Using Kangas, data visualizations are generated in real time, enabling ML practitioners to group, sort, filter, query and interpret structured and unstructured data to derive meaningful information and accelerate model development.

Comet, best known for its MLOps platform for machine learning teams, is offering Kangas on an open-source basis.
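
Because Kangas is open source, it is easy to sketch what the workflow looks like; the snippet below builds a small DataGrid and opens the interactive UI, with the column names, image paths and scores being hypothetical illustration data.

    # Minimal sketch of the open-source Kangas DataGrid API; the columns, image
    # paths and scores are hypothetical illustration data.
    import kangas as kg

    dg = kg.DataGrid(name="image-predictions", columns=["image", "label", "score"])

    for path, label, score in [("img_001.png", "cat", 0.91), ("img_002.png", "dog", 0.47)]:
        dg.append([kg.Image(path), label, score])

    dg.save()
    dg.show()  # launches the interactive UI for grouping, sorting, filtering and querying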

Databricks Model Serving

Databricks Model Serving, launched in March, provides simplified production machine learning natively within the Databricks Lakehouse Platform, according to the company, removing the complexity of building and maintaining complicated infrastructure for intelligent applications.

Integrated with Databricks Lakehouse Platform services, including the Unity Catalog, Feature Store and MLflow, the fully managed Databricks Model Serving service provides data and model lineage, governance and monitoring throughout the ML lifecycle, from experimentation to training to production, according to the company.

Businesses and organizations can use Databricks Model Serving to integrate real-time machine learning systems across an enterprise, from personalized recommendation applications to customer service chatbots, making it easier to integrate ML predictions into production workloads without the need to configure and manage the underlying infrastructure.
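
At a practical level, querying a real-time endpoint hosted by Databricks Model Serving is an HTTPS call to the endpoint's invocations URL, as in the rough sketch below; the workspace URL, endpoint name, token and input fields are hypothetical.

    # Minimal sketch of invoking a Databricks Model Serving endpoint over REST;
    # the host, endpoint name, token and feature values are hypothetical.
    import requests

    DATABRICKS_HOST = "https://example-workspace.cloud.databricks.com"  # hypothetical workspace URL
    ENDPOINT_NAME = "churn-model"                                       # hypothetical endpoint

    response = requests.post(
        f"{DATABRICKS_HOST}/serving-endpoints/{ENDPOINT_NAME}/invocations",
        headers={"Authorization": "Bearer <personal-access-token>"},
        json={"dataframe_records": [{"tenure_months": 14, "monthly_spend": 52.0}]},
    )
    print(response.json())  # predictions returned as JSON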

Domino Code Assist

Domino Code Assist (DCA) is Domino Data Lab’s “code-first approach” to low-code data science, allowing business analysts to automatically generate Python and R code for everyday data science and data analysis tasks and help enterprises “close the data science talent gap and democratize data science,” according to the company.

DCA, unveiled in February, helps chief data officers and chief data analytics officers involve more analytics professionals in data science and machine learning projects. Data preparation and data models are more transparent and portable, according to Domino Data Lab, and business analysts can collaborate more closely with advanced data scientists – all working on the same platform.

With DCA business analysts can automatically generate standard Python and R code for data ingestion, preparation, visualization and application creation tasks. DCA, meanwhile, boosts data scientists’ productivity by eliminating the need to write code and remember the precise syntax for repetitive tasks, according to the company.
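
DCA itself is a point-and-click assistant inside Domino, but its output is ordinary Python. The snippet below is an illustration of the kind of standard pandas and Plotly code such a tool generates for an ingest-prepare-visualize task; the file and column names are hypothetical and this is not actual DCA output.

    # Illustrative example of the standard Python a low-code assistant might generate
    # for data ingestion, preparation and visualization; the CSV file and column
    # names are hypothetical, and this is not actual DCA output.
    import pandas as pd
    import plotly.express as px

    # Ingestion
    df = pd.read_csv("monthly_sales.csv")

    # Preparation: aggregate revenue by month
    df["order_date"] = pd.to_datetime(df["order_date"])
    monthly = df.groupby(df["order_date"].dt.to_period("M"))["revenue"].sum().reset_index()
    monthly["order_date"] = monthly["order_date"].astype(str)

    # Visualization
    fig = px.bar(monthly, x="order_date", y="revenue", title="Monthly revenue")
    fig.show()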

dotData Feature Factory

In May dotData announced the public availability of Feature Factory, a platform the company says provides a new, data-centric approach to discovering, evaluating and engineering machine learning features.

Feature discovery is often a slow, laborious, iterative process that requires deep domain knowledge and can involve databases of hundreds of tables, thousands of columns and billions of rows, according to dotData.

Feature Factory automatically identifies and suggests feature spaces from enterprise data – including relational, transactional and temporal data. The technology helps users programmatically define feature spaces and auto-generate 100X broader feature hypotheses, according to dotData. By recording every data and feature transformation step, Feature Factory provides data scientists with a way to build reusable feature discovery assets.

Predibase

Predibase announced the general availability of its low-code, declarative machine learning platform for developers on May 31 after undergoing nearly a year of beta testing at a number of Fortune 500 companies.

The Predibase software lets both data scientists and non-experts quickly develop sophisticated machine learning-powered AI applications with “best-of-breed” ML infrastructure, according to the company.

The GA release includes privately hosted, customized large language models that allow developers to build their own GPT. It also includes Data Science Copilot, which provides developers with expert recommendations on how to improve the performance of their models as they iterate.

Predibase offers its technology as an alternative to traditional AutoML approaches to developing machine learning models for real-world problems. The platform uses declarative machine learning, which the company describes as allowing users to specify ML models as “configurations” or simple files that tell the system what a user wants and lets the system figure out the best way to fill that need.
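
To give a flavor of the declarative approach, the sketch below uses the open-source Ludwig framework, which Predibase's founders created and the platform builds on; the dataset and column names are hypothetical, and Predibase's hosted interface may differ.

    # Minimal sketch of a declarative model "configuration" using open-source Ludwig;
    # the dataset and column names are hypothetical, and this illustrates the declarative
    # approach rather than the Predibase API itself.
    from ludwig.api import LudwigModel

    config = {
        "input_features": [
            {"name": "review_text", "type": "text"},
            {"name": "product_category", "type": "category"},
        ],
        "output_features": [
            {"name": "sentiment", "type": "category"},
        ],
    }

    model = LudwigModel(config)                   # the user declares *what* to model...
    results = model.train(dataset="reviews.csv")  # ...the system decides *how* to train it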

Tecton

Startup Tecton develops a machine learning feature platform for managing the data that powers predictive machine learning models. In March Tecton unveiled the latest version of its flagship feature platform, introducing new capabilities the company said accelerate the process of building production-ready features and expand support for streaming data. (A feature is data fed into a machine learning model to make predictions.)

The new Tecton release includes notebook-driven development to build ML features and generate training datasets. That makes it possible for data scientists and machine learning engineers to leverage the Tecton feature engineering framework within their core modeling workflow, including developing and testing production-ready features, without having to leave their notebook, the company said.

The release also includes a stream ingest API, which Tecton says provides more flexibility in managing streaming features. Developer teams can either automate their streaming pipelines with Tecton or transform streaming data outside of Tecton using a stream processing engine of their choice. Streaming data processed outside of Tecton can be ingested directly into the feature platform.

The release also provides a new continuous mode for non-aggregate streaming features, allowing feature data to be processed and updated within seconds of arriving from a stream. That’s critical for prediction cases such as fraud detection and real-time pricing that rely on low-latency features, Tecton says.
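
To give a flavor of how features are defined in code on the platform, the heavily hedged sketch below follows the decorator style of Tecton's public documentation; the entity, data source, table, SQL and parameter values are hypothetical, and exact SDK signatures vary by release.

    # Rough, hedged sketch in the style of Tecton's decorator-based feature definitions;
    # the entity, batch source, table and SQL are hypothetical and signatures vary by SDK release.
    from datetime import datetime, timedelta
    from tecton import Entity, BatchSource, HiveConfig, batch_feature_view

    user = Entity(name="user", join_keys=["user_id"])  # hypothetical entity

    transactions = BatchSource(                        # hypothetical source table
        name="transactions",
        batch_config=HiveConfig(database="prod", table="transactions", timestamp_field="timestamp"),
    )

    @batch_feature_view(
        sources=[transactions],
        entities=[user],
        mode="spark_sql",
        online=True,
        offline=True,
        feature_start_time=datetime(2023, 1, 1),
        batch_schedule=timedelta(days=1),
    )
    def user_transaction_features(transactions):
        # Tecton materializes the result on a schedule into its online and offline stores
        return f"""
            SELECT user_id, timestamp, amount, merchant_category
            FROM {transactions}
            """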