Oracle recently announced the general availability of MySQL HeatWave Lakehouse, a fully managed database service.
The company previously debuted the service at its CloudWorld event last October. This lakehouse is the newest addition to MySQL HeatWave, a cloud service combining transaction processing, analytics, machine learning, and ML-based automation into a single MySQL database. The service is powered by the built-in HeatWave in-memory query accelerator.
MySQL HeatWave Lakehouse supports querying object store file formats including CSV, Parquet, and export files from other databases, and can combine data from object storage files with transactional data from the MySQL database in the same query. HeatWave queries object store files directly, without copying the data into the MySQL database, Oracle says. The company claims this design improves scalability and performance in four areas: query processing, data loading speed, cluster provisioning time, and automation for querying data in object storage.
Oracle’s SVP of MySQL HeatWave, Nipun Agarwal, explained the reasons for the new lakehouse feature in a blog post. He says the amount of data stored in object stores and data lakes has grown at an unprecedented rate in the past few years, and there is a need to analyze this data, but its size and lack of structure can make that challenging.
“Users often don’t want to load data in files in object store into databases to analyze it, due to the complexity, time, and cost of doing so. But they want to be able to combine data in a data lake with transactional data in databases to perform analytics,” he wrote.
Edward Screven, Oracle’s chief corporate architect, noted in a statement that more than 80% of data is stored in file systems, and customers looking to integrate and analyze varied external data with internal transactional data can find it to be a complex process.
“MySQL HeatWave Lakehouse makes it easy for customers to get valuable real-time insights by combining their data in object storage with database data while gaining significantly higher query performance and much faster data loading at a lower cost,” Screven said.
Oracle claims MySQL HeatWave Lakehouse is faster than many similar database services. The company ran an internal 500TB benchmark, based on the TPC-H benchmark, that found the lakehouse’s query performance was 9x faster than Amazon Redshift, 17x faster than Snowflake or Databricks, and 36x faster than Google BigQuery.
Oracle attributes this performance to the scale-out architecture of MySQL HeatWave, which enables massive parallelism for provisioning the cluster, loading data, and processing queries on up to 512 nodes. The company also says enhancements to MySQL Autopilot allow it to automate common data management tasks, including inferring schemas for files and predicting both the optimal cluster size and the time needed to load data from object store.
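To give a rough sense of what schema inference for a file involves, here is a minimal Python sketch that samples rows from a CSV and picks the narrowest type that fits each column. It is an illustration only and not how MySQL Autopilot works internally; the function names, the type labels, and the sample data are all hypothetical.

```python
import csv
import io
from datetime import date


def infer_type(values):
    """Return the narrowest type label that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "VARCHAR"
    # Try parsers from narrowest to widest; the labels are illustrative.
    for label, parser in (
        ("BIGINT", int),
        ("DOUBLE", float),
        ("DATE", date.fromisoformat),
    ):
        try:
            for v in non_empty:
                parser(v)
            return label
        except ValueError:
            continue
    return "VARCHAR"


def infer_schema(csv_text, sample_rows=100):
    """Infer a column-name-to-type mapping from the first rows of a CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Read at most sample_rows data rows, then transpose them into columns.
    sample = [row for _, row in zip(range(sample_rows), reader)]
    columns = list(zip(*sample)) if sample else [()] * len(header)
    return {name: infer_type(col) for name, col in zip(header, columns)}


# Hypothetical sample file contents.
sample = (
    "order_id,amount,order_date,note\n"
    "1,19.99,2023-06-01,gift\n"
    "2,5,2023-06-02,\n"
)
print(infer_schema(sample))
# {'order_id': 'BIGINT', 'amount': 'DOUBLE', 'order_date': 'DATE', 'note': 'VARCHAR'}
```

Sampling only the first rows keeps inference cheap on large files, at the cost of possibly missing a later value that does not fit the inferred type.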
“HeatWave Lakehouse scales out very well for loading data from object storage and for running queries on object store,” said Henry Tullis, leader, cloud infrastructure and engineering, Deloitte Consulting. “The load time and the query times are nearly constant as the size of the data grows and the HeatWave cluster size grows correspondingly. This scale out characteristic of HeatWave Lakehouse for data management is key to efficiently processing very large amounts of data.”
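The near-constant load time Tullis describes is what an idealized scale-out model predicts when data volume and node count grow together. The Python sketch below assumes perfectly parallel loading with a fixed per-node throughput; the throughput figure and the pairing of data sizes with cluster sizes are hypothetical and are not Oracle's measurements.

```python
def load_time_seconds(data_tb, nodes, tb_per_node_per_second):
    """Idealized load time: each node loads an equal share in parallel."""
    share_tb = data_tb / nodes
    return share_tb / tb_per_node_per_second


# Hypothetical per-node throughput, chosen only for illustration.
RATE = 0.001

# Data size and cluster size double together, so each node's share is unchanged.
for data_tb, nodes in [(125, 128), (250, 256), (500, 512)]:
    seconds = load_time_seconds(data_tb, nodes, RATE)
    print(f"{data_tb} TB on {nodes} nodes: {seconds:.1f} s")
```

In this model the time depends only on the data each node handles, so it stays flat as long as the cluster grows in proportion to the data. Real systems add coordination and network overhead, which is why the quote says "nearly" constant.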