Comet partners with Snowflake to enhance the reproducibility of machine learning datasets 


MLOps platform Comet today announced a strategic partnership with Snowflake aimed at empowering data scientists to build superior machine learning (ML) models at an accelerated pace.

Comet said the collaboration will integrate its solutions into Snowflake’s unified platform, allowing developers to track and version their Snowflake queries and datasets within their Snowflake environment.

Comet says that this integration will make it possible to trace a model’s lineage and performance, offering more visibility and understanding than traditional development processes. It will also show how model performance changes in response to changes in data.

Overall, the company believes, using Snowflake data in the Comet platform will result in a streamlined and more transparent model development process.


Faster model training, deployment and monitoring

Snowflake’s Data Cloud and Comet’s ML platform combined will allow customers to build, train, deploy and monitor models significantly faster, according to the companies.

“In addition, this partnership fosters a feedback loop between model development in Comet and data management in Snowflake,” Comet CEO Gideon Mendels told VentureBeat. 

Mendels said that integrating such a loop can continuously improve models and bridge the gap between experimenting with models and deploying them, fulfilling the key promise of ML — the ability to learn and adapt over time. He said that the clear versioning between datasets and models will enable organizations to better address data changes and their impact on models in production.

Comet’s new offering follows its recent release of a suite of tools and integrations designed to accelerate workflows for data scientists working with large language models (LLMs).

Enhancing ML models through constant feedback 

When data scientists or developers execute queries to extract datasets from Snowflake for their ML models, Comet will be able to log, version and directly link these queries to the resulting models. 

Mendels said this approach offers several advantages, including increased reproducibility, collaboration, auditability and iterative improvement.

“The integration between Comet and Snowflake aims to provide a more robust, transparent and efficient framework for ML development by enabling the tracking and versioning of Snowflake queries and datasets within Snowflake itself,” he explained. “By versioning the SQL queries and datasets, data scientists can always trace back to the exact version of the data that was used to train a specific model version. This is crucial for model reproducibility.”
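The idea Mendels describes, tying a model version to the exact query and data that produced it, can be illustrated with a minimal Python sketch. This is not Comet's or Snowflake's API; `LineageRecord`, `build_lineage` and the fingerprinting scheme are hypothetical names chosen for illustration, and the rows stand in for a query result that would normally come from Snowflake.

```python
import hashlib
import json
from dataclasses import dataclass


def fingerprint(text: str) -> str:
    """Return a short, stable SHA-256 fingerprint of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def normalize_sql(query: str) -> str:
    """Collapse whitespace so formatting-only edits do not create a new version.

    Simplification: whitespace inside string literals is collapsed too.
    """
    return " ".join(query.split())


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical record linking a model version to its query and data."""
    model_version: str
    query_version: str
    dataset_version: str


def build_lineage(model_version: str, query: str, rows: list[dict]) -> LineageRecord:
    # Serialize rows deterministically so identical data yields identical hashes.
    serialized = json.dumps(rows, sort_keys=True)
    return LineageRecord(
        model_version=model_version,
        query_version=fingerprint(normalize_sql(query)),
        dataset_version=fingerprint(serialized),
    )


if __name__ == "__main__":
    # Illustrative query and rows; no real Snowflake connection is made.
    record = build_lineage(
        "v1",
        "SELECT id, amount FROM orders",
        [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}],
    )
    print(record)
```

Because the fingerprints depend only on the query text and the returned rows, retraining on unchanged data yields the same record, while any change to either shows up as a new version.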

Tracing changes in model performance to data alterations

In ML, training data is just as important as the model itself. Alterations in the data, such as introducing new features, addressing missing values, or modifying data distributions, can profoundly affect a model’s performance.

Comet says that tracing a model’s lineage makes it possible to connect changes in model performance to specific alterations in the data. This not only aids in debugging and understanding performance but also guides data quality and feature engineering work.
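As a rough sketch of what such a connection could look like in practice, the Python snippet below compares the lineage of two model versions and reports which inputs changed alongside the shift in a metric. The field names (`query_version`, `dataset_version`, `accuracy`) and all values are made up for illustration and do not come from Comet or Snowflake.

```python
def explain_change(old: dict, new: dict) -> dict:
    """Report the metric shift between two runs and which tracked inputs differ."""
    tracked = ("query_version", "dataset_version")
    changed = sorted(key for key in tracked if old[key] != new[key])
    return {
        "accuracy_delta": round(new["accuracy"] - old["accuracy"], 4),
        "changed_inputs": changed,
    }


if __name__ == "__main__":
    # Hypothetical lineage entries for two versions of the same model.
    run_v1 = {"query_version": "a1b2", "dataset_version": "d001", "accuracy": 0.87}
    run_v2 = {"query_version": "a1b2", "dataset_version": "d002", "accuracy": 0.91}
    print(explain_change(run_v1, run_v2))
```

Here the query is unchanged and only the dataset version differs, which points the investigation at the data rather than at the extraction logic.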

Mendels said that tracking queries and data over time can create a feedback loop that drives continuous improvements in both the data management and the model development stages.

“Model lineage can facilitate collaboration among a team of data scientists, as it allows anyone to understand a model’s history and how it was developed without the need for extensive documentation,” said Mendels. “This is particularly useful when team members leave or when new members join the team, allowing for seamless knowledge transfer.”

What’s next for Comet? 

The company claims that customers currently using Comet — such as Uber, Etsy and Shopify — typically report a 70% to 80% improvement in their ML velocity.

“This is due to faster research cycles, the ability to understand model performance and detect issues faster, better collaboration and more,” said Mendels. “With the joint solution, this should increase even more as today there are still challenges in bridging the two systems. Customers save on ingress and consumption costs by keeping the data within Snowflake instead of transferring it over the wire and saving it in other locations.”

Mendels said that Comet aims to establish itself as the de facto AI development platform. 

“Our view is that businesses will only see real value from AI after they deploy these models based on their own data,” he said. “Whether they are training from scratch, fine-tuning an OSS model or using context injection to ChatGPT, Comet’s mandate is to make this process seamless and bridge the gap between research and production.”
