As expected, InfluxData today released InfluxDB Clustered, a major update of its enterprise time-series database for customers who prefer to run on-prem. Based on the InfluxDB 3.0 rewrite unleashed earlier this year, InfluxDB Clustered replaces InfluxDB Enterprise in the company’s lineup. And according to performance figures released by InfluxData, the new database is a heavy hitter.
InfluxData develops a time-series database designed to enable organizations to analyze very large amounts of metric, event, and trace data. The San Francisco company, a Y Combinator graduate, overhauled its database again in April with the release of InfluxDB 3.0, which introduced a number of changes designed to speed up processing of time-series data.
Enhancements in InfluxDB 3.0 include support for the DataFusion distributed SQL query engine, which is based on Apache Arrow, as well as support for Parquet, a compressed columnar data storage format. The database was previously written in Go, and the rewrite used Rust, the language used in the Arrow ecosystem. The overhaul coincides with a 100x speedup on queries of high-cardinality data, 45x faster data ingest, and a 90% reduction in storage cost when used on object stores, according to the company.
Cloud customers have been able to tap into these InfluxDB 3.0 enhancements for the past five months via InfluxDB Cloud Serverless and InfluxDB Cloud Dedicated. With today’s launch of InfluxDB Clustered, the company is bringing these InfluxDB 3.0 benefits to InfluxDB Enterprise customers, which run the database on their own infrastructure or in private cloud environments (or in a managed public cloud setting in some circumstances).
In addition to fully supporting SQL, the lingua franca of data analysis, this release has been designed to run atop Kubernetes, the industry standard for container management. Customers can lean on K8S to handle the nitty gritty details of deploying InfluxDB Clustered on their own clusters server and storage clusters, or servers and storage residing in private cloud environments.
InfluxDB Clustered departs from the InfluxDB Cloud Serverless and InfluxDB Cloud Dedicated products released earlier this year in that it is a self-managed product, says Rick Spencer, InfluxData’s vice president of products.
“This gives you ultimate control over your time-series database, making it well-suited to meet enterprise and compliance requirements,” Spencer writes in a blog post. “InfluxDB Clustered runs where you need it–on-premises, in your private cloud, or self-managed public cloud environments. This flexibility comes from the fact that we deliver InfluxDB Clustered as a collection of Kubernetes-based containers with decoupled, independently scalable ingest and query tiers.”
The separation of compute and storage in the InfluxDB Clustered gets customers partly where they need to go in terms of being able to scale the database to meet changing analytic needs. But the database also provides multiple storage tiers, including a hot tier and a cold tier residing in object storage, which are also independently scalable. That gets them the rest of the way, Spencer writes.
“Ingested data hits the hot storage tier first and it’s immediately available for querying,” he writes in the blog. “There’s no need to wait for batching or other processing on leading-edge data. This enables queries to be 45x faster than previous versions of InfluxDB. The hot storage tier consists of the data that you’re actually using. This can include data retrieved from cold storage as well.”
The combination of the multiple storage approach, along with the with big improvement in data ingest, gives InfluxDB Clustered the capability to query “unlimited cardinality data,” Spencer writes. In other words, customers can crunch much larger and faster moving data sets without worrying about bogging down the database.
The move from InfluxDB Enterprise to InfluxDB Clustered is “a gigantic leap forward,” Spencer writes.
“For a long time, users had to make difficult choices about their databases between performance, data retention, and costs. InfluxDB Clustered (and the rest of the InfluxDB 3.0 products) virtually eliminates those challenges. It delivers real-time performance, on leading-edge (and historical) data, while lowering TCO. Not only does this mean that you can do more with your data, but, because you manage your own infrastructure with InfluxDB Clustered, you can make more cost-effective decisions that reduce initial startup costs and long-term maintenance and overhead needs.”
You can find more information at www.influxdata.com.