Hewlett Packard Enterprise announced it has acquired Pachyderm, a startup whose open source software automates reproducible machine learning pipelines for large-scale AI applications. The purchase price and terms of the deal were not disclosed.
For ML projects, the ability to reproduce training data pipelines is important for transparent, accurate results as well as for explainability and compliance. ML models can be difficult to recreate because of the intricacy of managing training datasets, which often consist of high-volume, complex data. When it is time to update a model, retraining requires unaltered datasets to preserve accuracy and output, yet the data and its related code can change in many ways on the journey from training to deployment. Pachyderm automates the process of building reproducible ML pipelines through data lineage and versioning capabilities. Its solution is based on a distributed, immutable file system and an execution layer, which are pre-integrated and designed to work together.
“As AI projects become larger and increasingly involve complex data sets, data scientists will need reproducible AI solutions to efficiently maximize their machine learning initiatives, optimize their infrastructure cost, and ensure data is reliable and safe no matter where they are in their AI journey,” said Justin Hotard, executive vice president and general manager, HPC and AI, at HPE. “Pachyderm’s unique reproducible AI software augments HPE’s existing AI-at-scale offerings to automate and accelerate AI and unlock greater opportunities in image, video, and text analysis, generative AI, and other emerging large-language-model needs to realize transformative outcomes.”
HPE says it is expanding its scalable AI portfolio by bringing together its supercomputing technologies and its HPE Machine Learning Development Environment, which the company describes as ML software that enables users to develop, iterate, and scale models from proof-of-concept to production. The company plans to build on these solutions by integrating Pachyderm’s reproducible AI capabilities into a single platform that will automatically refine, prepare, track, and manage repeatable ML processes in the development and training environment.
HPE asserts this integration will enable faster and more accurate deployment of large-scale AI applications with these benefits:
- Data lineage: Visibility into where data originated and how it moves through the machine learning lifecycle and analytics process, so errors can be traced back to their root cause.
- Data versioning: The ability to track different versions of data and see when data was created or changed at any point in time, which makes subsequent changes more efficient.
- Efficient incremental data processing: As data changes over time, only the new or changed data needs to be processed to update AI applications. Pachyderm makes incremental data processing automatic and efficient.
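The three capabilities above can be illustrated with a small, library-free Python sketch. It is not Pachyderm's API or implementation. The function names (`snapshot`, `changed_paths`, `run_incremental`) and the sample file names are hypothetical, and hashing file contents with SHA-256 is an assumption chosen only to show the idea: a version is a mapping from paths to content hashes, lineage records which input hash produced each output, and only paths whose hash changed are reprocessed.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def content_hash(data: bytes) -> str:
    """Identify a piece of data by the SHA-256 hash of its content."""
    return hashlib.sha256(data).hexdigest()


def snapshot(files: Dict[str, bytes]) -> Dict[str, str]:
    """A dataset 'version': each path mapped to the hash of its content."""
    return {path: content_hash(data) for path, data in files.items()}


def changed_paths(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Paths that are new or whose content differs from the old version."""
    return sorted(p for p, h in new.items() if old.get(p) != h)


def run_incremental(
    files: Dict[str, bytes],
    previous_version: Dict[str, str],
    previous_outputs: Dict[str, bytes],
    transform: Callable[[bytes], bytes],
) -> Tuple[Dict[str, str], Dict[str, bytes], Dict[str, str], List[str]]:
    """Reprocess only changed inputs and reuse earlier outputs for the rest."""
    version = snapshot(files)
    todo = changed_paths(previous_version, version)
    # Keep earlier outputs for paths that still exist in this version.
    outputs = {p: out for p, out in previous_outputs.items() if p in version}
    for path in todo:
        outputs[path] = transform(files[path])
    # Lineage: which input content hash each output was derived from.
    lineage = {path: version[path] for path in outputs}
    return version, outputs, lineage, todo


# Hypothetical sample data; bytes.upper stands in for a real processing step.
v1_files = {"a.txt": b"hello", "b.txt": b"world"}
v1, out1, lineage1, done1 = run_incremental(v1_files, {}, {}, bytes.upper)

v2_files = {"a.txt": b"hello", "b.txt": b"there", "c.txt": b"new"}
v2, out2, lineage2, done2 = run_incremental(v2_files, v1, out1, bytes.upper)

print(done1)  # ['a.txt', 'b.txt']
print(done2)  # ['b.txt', 'c.txt']
```

On the second run only `b.txt` and `c.txt` are processed, because the hash of `a.txt` is unchanged. Keeping `v1` and `v2` side by side is what allows an earlier state of the data to be inspected or compared later.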
Use cases for large-scale AI projects include natural language processing, computer vision, and video and image processing across industries such as transportation, manufacturing, life sciences, and defense. Lockheed Martin has deployed Pachyderm’s software and HPE’s Machine Learning Development Environment as part of AI Factory, its foundational AI ecosystem, according to HPE’s release. HPE says using these capabilities has allowed Lockheed Martin to standardize its AI technologies while increasing trust and performance in support of national security missions.
Pachyderm was founded in 2014 by Joe Doliner and Joey Zwicker as a containerized alternative to Hadoop. HPE previously invested in Pachyderm through its venture capital arm, Hewlett Packard Pathfinder, as part of a $28.1 million investment round in February 2022 that also included Microsoft’s M12 and Y Combinator.
HPE’s acquisition of Pachyderm is not subject to regulatory approval and is expected to close this month.