DataStax, arguably best known as the commercial entity behind the scalable NoSQL database Apache Cassandra, turned some heads in 2021 when it added Astra Streaming, a real-time streaming data service based on Apache Pulsar, to its roster of offerings. Now the Silicon Valley firm is turning heads once again with the acquisition of Kaskada, a provider of tools for simplifying and automating real-time machine learning workflows for data scientists and machine learning engineers.
Kaskada, which we profiled three years ago just before the world went into the first COVID-19 lockdown, was founded to help automate tedious feature engineering tasks. The Seattle, Washington, company’s flagship product is a feature store that does two things. First, it lets data scientists and ML engineers define the features they want to track in their machine learning experiments, through integration with data science notebook environments. Second, it serves those features to machine learning models as usable vectors, without any time-consuming rewriting of features in new languages or the creation of data pipelines.
Davor Bonaci, Kaskada’s CEO and co-founder, described the offering as a “compiler between the studio and the feature store.” “We are compiling code from whatever the data scientist defines [and] automatically generating a real-time distributed system,” Bonaci told Datanami back in February 2020. “That’s where the rewriting goes away. We generate automatically a distributed system from what you define in our software.”
Kaskada, which uses Cassandra and Akka under the covers (or at least did back in 2020), is primarily used for machine learning projects involving real-time, event-based data, such as recommendation engines and real-time predictions for websites and mobile apps. By automatically keeping the feature vectors up to date based on data coming in from pub-sub systems like Apache Kafka, AWS Kinesis, or Pulsar, Kaskada helps eliminate a lot of “glue” coding that would normally occupy the life of the machine learning engineer.
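To make the idea of keeping feature vectors current as events arrive more concrete, here is a minimal Python sketch. It is not Kaskada's API or implementation: the `Event` and `FeatureStore` names, the three features (event count, mean amount, seconds since last event), and the in-memory dictionaries are all hypothetical stand-ins, and in a real deployment the events would come from a pub-sub consumer rather than direct method calls.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    """Hypothetical event shape; real pub-sub messages would be deserialized into this."""
    user_id: str
    amount: float
    timestamp: float  # seconds since epoch


class FeatureStore:
    """Toy in-memory store that updates per-user aggregates as each event arrives."""

    def __init__(self) -> None:
        self._count: Dict[str, int] = defaultdict(int)
        self._total: Dict[str, float] = defaultdict(float)
        self._last_seen: Dict[str, float] = {}

    def ingest(self, event: Event) -> None:
        # Incremental update: no batch recomputation over historical data.
        self._count[event.user_id] += 1
        self._total[event.user_id] += event.amount
        self._last_seen[event.user_id] = event.timestamp

    def vector(self, user_id: str, now: float) -> List[float]:
        # Returns [event count, mean amount, seconds since last event].
        count = self._count.get(user_id, 0)
        total = self._total.get(user_id, 0.0)
        mean = total / count if count else 0.0
        last = self._last_seen.get(user_id)
        recency = now - last if last is not None else float("inf")
        return [float(count), mean, recency]


store = FeatureStore()
store.ingest(Event("u1", 10.0, 100.0))
store.ingest(Event("u1", 30.0, 160.0))
print(store.vector("u1", now=200.0))  # [2.0, 20.0, 40.0]
```

The point of the sketch is only that each incoming event updates the stored aggregates in place, so a model asking for a user's vector always sees the latest values. Writing and maintaining this kind of code by hand for every feature is the "glue" work the paragraph above refers to.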
DataStax clearly sees Kaskada filling a need among its customer base for simplifying the feature engineering work going on as real-time data flows between the data source (open source Pulsar or its commercial Astra Streaming product) and the data sink (open source Cassandra or its commercial Astra DB offering).
“Businesses must operate in real time, using data to power operations and fuel instant, informed decisions and actions,” DataStax chairman and CEO Chet Kapoor says today in a press release. “DataStax has many customers already using real-time data, and with Kaskada as part of our services portfolio, we can give them the opportunity to use that data to create powerful experiences for their customers with real-time AI. It’s an exciting time for DataStax, and we have a clear new mandate: real-time AI for everyone.”
Bonaci said he’s looking forward to working with DataStax to create a new generation of AI-powered applications.
“AI is at its best when it has access to data at scale. And real-time data, in particular, will shape a generation of new applications and real-time decisions for every industry,” Bonaci wrote in a blog post. “Cassandra is uniquely suited for vast amounts of real-time data, making our decision to join DataStax strategically relevant for us, our mission, and the market.”
A decade ago, the initial wave of big data tools focused on collecting and analyzing huge amounts of data in batch workloads, in the hopes of finding useful patterns or other information that could be put to use later.
Today, that timeframe has collapsed, and companies need to generate and consume those insights as fast as possible. This is particularly evident when it comes to personalizing Web and mobile content, according to Matt Aslett, vice president and research director at Ventana Research.
“The need for real-time interactivity means that these applications cannot be served by traditional processes that rely on the batch extraction, transformation and loading of data from operational data platforms into analytic data platforms for analysis,” Aslett says in the DataStax press release. “Instead, they rely on analysis of data in the operational data platform to accelerate decision-making or improve customer experience. High costs, complexity, and scaling issues have been roadblocks to many organizations in achieving dynamic, real-time intelligence in their operational platforms.”
One of the DataStax customers that might benefit from the integration with Kaskada is Priceline. The online travel firm, which is an Astra DB customer, leans on ML technology to help it serve relevant and personalized search results to customers.
Terms of the acquisition were not disclosed.