CelerData, vendor of the StarRocks OLAP database, announced it has contributed StarRocks to the Linux Foundation.
Originally created in 2020 as a commercialized fork of the Apache Doris database, StarRocks is an open source database maintained by CelerData, formerly known as StarRocks, Inc. As the project has moved away from Doris, 80% of the StarRocks codebase has been rewritten over the last two years. A massively parallel processing (MPP) OLAP database, StarRocks is now focused on real-time query support for analytics workloads.
One way StarRocks stands apart from competing OLAP databases is that it does not require transforming data from a star schema into denormalized tables. Denormalized tables can be slower to update and add complexity to data pipelines. StarRocks can query star schemas directly, performing the required joins at query time, and the company says it maintains high query performance even while the underlying data is being updated. In a previous interview with Datanami, CelerData VP of Strategy Li Kang asserted that competitors relying on denormalized tables can support about 10 to 100 concurrent users, while StarRocks customers can run more than 10,000 queries per second, representing tens of thousands of concurrent users.
“This is the first analytical database in the industry that addressed the critical technical challenges in both real-time and batch analytics, such as the need to denormalize data, the inability to process real-time updates, and the challenge of supporting large numbers of concurrent users,” said James Li, CEO of CelerData, in a release. “StarRocks’ unique design can handle frequent updates to past transactions while still maintaining high query performance in real-time. This enables use cases previously considered not suitable for real-time analytics.”
CelerData also appears to be placing particular emphasis on data lake support in StarRocks. The company says the database lets developers combine real-time analytics, OLAP, and data lake analytics in one engine with a single data pipeline. CelerData announced StarRocks version 2.5 earlier this month, touting upgraded features such as integration with more data lake ecosystems, the ability to query Delta Lake tables without migrating data, and integration with AWS Glue, a service that can serve as a metastore for lake analytics on Apache Hive, Apache Hudi, Apache Iceberg, and Delta Lake tables.
CelerData says the StarRocks project has been an independent, source-available project since its inception in 2020 and has helped more than 500 companies, including Airbnb and Lenovo, launch digital transformation initiatives. The move to the Linux Foundation will give StarRocks access to the foundation’s large ecosystem of open source contributors for further development and innovation.
“We look forward to collaborating with the Linux Foundation given its significant experience in operating an open source project. Being part of this Foundation will help us collaborate with other open source projects to improve user experience while building the world’s best query engine,” said Li.
“The Linux Foundation is delighted to welcome the StarRocks Project into its family of open source projects,” said Mike Woster, Linux Foundation chief revenue officer. “By providing a neutral home for collaboration, the Linux Foundation is able to bring together talented individuals and organizations from around the world to collaborate on building innovative solutions and technologies for shared benefit.”