Hewlett Packard Enterprise (HPE) today announced that it has acquired privately held open-source vendor Pachyderm to expand its artificial intelligence (AI) development capabilities and enable reproducible AI at scale.
San Francisco-based Pachyderm was founded in 2014 and has raised $28 million in funding to date. The financial terms of the acquisition are not being disclosed.
Pachyderm develops an open-source data pipeline technology used to support machine learning (ML) operations workflows. It also lets users define data transformations that specify how source data should be manipulated and configured so that it is optimized for AI. The entire pipeline is designed to be easily reproducible, making it easier for data scientists to understand how the data flowing into a model is collected and used.
Pachyderm will integrate with HPE’s ML Development System
The Pachyderm technology will be integrated into the HPE Machine Learning Development System, an application suite that helps companies build AI applications. The technology behind the HPE Machine Learning Development System came from HPE's acquisition of Determined AI in 2021.
“Pachyderm has been a long-time partner of ours and we regularly saw them as a complementary technology in customer engagements,” Evan Sparks, chief product officer for AI at HPE (and co-founder of Determined AI), told VentureBeat. “We’re focused on training AI models and Pachyderm is focused on the data piece, the part that comes in before model training with getting data ready and doing it in a way that’s reproducible.”
The challenge of AI reproducibility
The issue of explainable AI has been a hot-button topic in recent years.
The basic idea behind explainable AI is to avoid a “black box” that calculates results without anyone being able to understand or explain how they were reached. Eliminating bias is an important goal of explainable AI, as is ensuring fairness.
An underlying part of enabling explainable AI is having reproducible AI. The concept of reproducible AI is about having a series of steps for collecting data, building models and drawing conclusions that can be repeated in a consistent way.
“Our customers are people trying to deploy AI at scale for real manufacturing applications, for everything from insurance to self-driving cars to discovering new drugs that will be used to save lives,” Sparks said. “Those kinds of use cases either have very strong financial ramifications or, in some cases, are life and death.”
With those implications in mind, Sparks said enterprises really want a lot of confidence behind the models they deploy. A cornerstone of trust is knowing that if an organization uses the same data, with the same model, it can generate the same output.
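The "same data, same model, same output" property can be checked mechanically. The following is a minimal sketch, not HPE's or Pachyderm's method: `train_and_predict` is a hypothetical toy stand-in for a real training run, and the check simply compares fingerprints of the outputs from two runs with identical inputs and an identical random seed.

```python
import hashlib
import json
import random


def train_and_predict(rows, seed):
    """Hypothetical toy 'training run': a running mean over seeded-shuffled data."""
    rng = random.Random(seed)  # local RNG, so global random state cannot leak in
    shuffled = list(rows)
    rng.shuffle(shuffled)
    weight = 0.0
    for i, x in enumerate(shuffled, start=1):
        weight += (x - weight) / i  # incremental mean update
    # "Predictions" are the inputs scaled by the learned weight.
    return [round(weight * x, 6) for x in rows]


def fingerprint(obj):
    """Stable SHA-256 digest of any JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


data = [1.0, 2.0, 3.0, 4.0]
run_a = fingerprint(train_and_predict(data, seed=7))
run_b = fingerprint(train_and_predict(data, seed=7))
reproducible = run_a == run_b
print("reproducible:", reproducible)
```

In a real system the seed, the data version and the code version would all need to be pinned for this comparison to be meaningful.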
With Pachyderm, Sparks said, the goal is to make sure that the pipeline carrying data from a source into a model is consistent and reproducible. He noted that Pachyderm’s technology alone is not enough for a fully explainable AI approach, which also requires model testing capabilities. Sparks said HPE is working with a number of different partner technologies to support explainable AI capabilities for the model itself.
How Pachyderm works to enable reproducible AI
The Pachyderm technology has a number of different capabilities that help support reproducible AI efforts.
Sparks said Pachyderm offers data lineage tracking, which is the ability to trace where data comes from. The technology also provides data versioning capabilities that help data scientists understand and manage different versions of data.
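To make the ideas of data versioning and lineage concrete, here is a minimal sketch under stated assumptions. It is not Pachyderm's API or implementation; `VersionStore`, `Commit` and the step names are hypothetical. Each version gets an ID derived from its content and its parents, and lineage is recovered by walking parent links back to the raw source.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    commit_id: str
    parents: tuple
    step: str


class VersionStore:
    """Hypothetical in-memory store of content-addressed data versions."""

    def __init__(self):
        self.commits = {}
        self.blobs = {}

    def commit(self, data, step, parents=()):
        # The ID depends on the step name, the parent IDs and the bytes,
        # so identical inputs always produce the identical version ID.
        h = hashlib.sha256()
        h.update(step.encode("utf-8") + b"\0")
        for parent in parents:
            h.update(parent.encode("utf-8") + b"\0")
        h.update(data)
        commit_id = h.hexdigest()[:12]
        self.commits[commit_id] = Commit(commit_id, tuple(parents), step)
        self.blobs[commit_id] = data
        return commit_id

    def lineage(self, commit_id):
        """Return step names from this version back to its sources."""
        seen, order, stack = set(), [], [commit_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            order.append(self.commits[current].step)
            stack.extend(self.commits[current].parents)
        return order


store = VersionStore()
raw = store.commit(b"a,b\n1,2\n", step="ingest")
clean = store.commit(b"a,b\n1,2", step="clean", parents=[raw])
print(store.lineage(clean))
```

Because IDs are derived from content, committing the same bytes under the same step again yields the same ID, which is what lets a data scientist tell whether two runs really used the same data.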
What particularly stood out for Sparks about the Pachyderm technology is its ability to transform data to make it useful for AI. He explained that for some use cases there may be a need for an AI model to combine data from multiple sources.
For example, an autonomous vehicle company will receive computer vision data from in-car cameras, as well as LIDAR (light detection and ranging) data. That data probably lives in two different places and arrives in different formats. For the machine learning models to do their job, the data must be combined before the model is trained. That kind of complex transformation is one that Pachyderm could help make possible in a reproducible way.
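One way such a combination step might look is sketched below. This is an illustration only, not how Pachyderm or any vehicle maker does it: it assumes both sources carry timestamps, and the `align` function, the record layout and the tolerance value are all hypothetical. Sorting the inputs and breaking ties by index keeps the result independent of arrival order, which is what makes the step repeatable.

```python
from bisect import bisect_left


def align(camera, lidar, tolerance):
    """Pair each camera frame with the nearest LIDAR sweep within `tolerance` seconds.

    camera: list of (timestamp, frame_id); lidar: list of (timestamp, sweep_id).
    """
    lidar_sorted = sorted(lidar)
    times = [t for t, _ in lidar_sorted]
    pairs = []
    for ts, frame in sorted(camera):
        i = bisect_left(times, ts)
        # Only the neighbors on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: (abs(times[j] - ts), j))
        if abs(times[best] - ts) <= tolerance:
            pairs.append((frame, lidar_sorted[best][1]))
    return pairs


camera = [(0.10, "cam-1"), (0.20, "cam-2"), (0.90, "cam-3")]
lidar = [(0.21, "sweep-b"), (0.11, "sweep-a")]
pairs = align(camera, lidar, tolerance=0.05)
print(pairs)
```

Here `cam-3` is dropped because no sweep falls within the tolerance; a real pipeline would have to decide explicitly how to handle such unmatched records.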
Looking ahead, Sparks said the overall goal for the HPE AI product portfolio is to enable an end-to-end platform for model development and deployment at scale.
“We’re looking at how to develop an end-to-end offering around AI at scale and what it should look like,” said Sparks. “Pachyderm is a very complementary part of this overall portfolio.”