Databases have evolved considerably over the past decade, but there’s still quite a bit more that databases can do, according to Cockroach Labs Co-Founder and CTO Peter Mattis, who sees serverless and multi-cloud capabilities near the top of the list, along with closer integration with object storage.
As the creator of CockroachDB, a geographically distributed relational database, Cockroach Labs is on the leading edge of scale-out database design. There are just a handful of databases that can handle globally distributed ACID transactions. Google Cloud Spanner was the first, and now CockroachDB is one of several databases with customers in production.
Accurately accounting for writes in a globally distributed database setting is a really hard computer science problem, and one that Cockroach has invested significantly in solving. The company is attracting large companies, including global banks and Netflix, that need this solution.
But that doesn’t mean the Cockroach developers are resting on their laurels in their New York City headquarters. R&D “will never be done,” Mattis declared emphatically in an interview with Datanami on the future of databases, which is our editorial focus for the month of January.
The big new feature Cockroach Labs delivered in the past six months was the roll-out of a serverless version of CockroachDB running in the cloud. The development of CockroachDB Serverless took quite a bit of engineering work for Mattis and his team, as the database was initially architected as a distributed database that scaled out incrementally by adding nodes. With Kubernetes handling orchestration under the covers, customers no longer have to worry about adding more nodes to a CockroachDB cluster.
“One of the major, major challenges that we still experience in the database world is capacity planning, trying to provision the right amount of resources for your workload, so you can handle the burst. But you don’t want to overprovision because that’s expensive,” Mattis says. “Everybody is very cost-conscious right now. They don’t want to overspend.”
Instead of trying to correctly forecast the transaction workload in advance, the serverless approach allows CockroachDB customers to add nodes to their database cluster in response to demand, almost instantly. It takes seconds to add more capacity to a CockroachDB Serverless instance, versus tens of minutes to add a new virtual machine to a CockroachDB Dedicated cluster, Mattis says.
With the advent of serverless, the industry is beginning to change how it thinks about multitenant databases, the CTO says.
“This idea that, rather than having your database sized perfectly to the underlying hardware and then only being able to scale it incrementally based on adding cold units and additional machines, it’s better actually to have a much larger physical database cluster underneath, and then slice up little virtual databases from that,” Mattis explains. “The advantage of doing this is you’re kind of packing a bunch of workloads into the same cluster, and presuming you have sufficient isolation controls–which we’ve built into…the database layer–they’re effectively isolated. They’re not physically isolated, which is good because then you can share the physical resources, and often times you see workloads have spiky behavior. If you average a bunch of workloads together, it evens out, so you actually get better overall resource utilization by doing this, and it gives a better experience.”
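The averaging effect Mattis describes can be illustrated with a toy simulation. The sketch below is not CockroachDB code; the tenant count, spike probability, and load units are all hypothetical assumptions chosen for illustration. It compares the capacity needed when every tenant is provisioned for its own peak with the capacity needed when all tenants share one pool sized for the combined peak.

```python
import random


def simulate(num_tenants=50, num_ticks=1000, spike_prob=0.05, seed=0):
    """Compare dedicated vs. pooled provisioning for spiky workloads.

    All parameters are hypothetical: each tenant idles at 1 unit of load
    and occasionally spikes to 10 units, independently of the others.
    """
    rng = random.Random(seed)
    loads = [
        [10.0 if rng.random() < spike_prob else 1.0 for _ in range(num_ticks)]
        for _ in range(num_tenants)
    ]

    # Dedicated: every tenant gets enough capacity for its own peak.
    dedicated_capacity = sum(max(tenant) for tenant in loads)

    # Pooled: one shared cluster sized for the peak of the combined load.
    pooled_capacity = max(
        sum(tenant[tick] for tenant in loads) for tick in range(num_ticks)
    )

    total_work = sum(sum(tenant) for tenant in loads)
    return {
        "dedicated_capacity": dedicated_capacity,
        "pooled_capacity": pooled_capacity,
        "dedicated_utilization": total_work / (dedicated_capacity * num_ticks),
        "pooled_utilization": total_work / (pooled_capacity * num_ticks),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.2f}")
```

Because the peak of a sum can never exceed the sum of the individual peaks, the pooled cluster never needs more capacity than the dedicated setup, and with independent spikes it typically needs far less. The simulation ignores the isolation controls the quote mentions, which are what make sharing safe in practice.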
Integrating Kubernetes into the CockroachDB deployments is an important part of this overall offering, and it’s not a trivial exercise to devise a Kubernetes operator that works with a stateful system, such as CockroachDB (as opposed to a stateless system, which was the original K8S design point). But the Kubernetes integration was just a small part of the overall work in developing a serverless, multi-tenant database, Mattis says.
“It’s not ‘Oh we just sprinkle Kubernetes on top of this.’ There’s quite a bit more work than that,” he says. “Kubernetes is a component there, it’s a core component, but it’s like one-tenth of the effort there. The other 90% was all the hard work inside the core CockroachDB itself.”
Mattis had some comments about the recent Datanami story about whether databases are just becoming query engines for object stores. There’s some truth to the trend, he says, but it’s also an oversimplification of what’s happening, particularly for the OLTP systems that Cockroach Labs focuses on.
“There’s something there that’s truth and there’s something there that is kind of misportrayed,” he says. “S3 BLOB storage–I don’t want to say it’s eating the database world. That’s too strong. But there’s significant advantages to actually separating out the compute for database and storage for database.”
The part that the story missed, Mattis says, is that S3 is not becoming the primary storage layer for all the data. There’s a lot more going on than just putting it in S3. “It’s the foundational layer of the storage, but above that, you still have to organize the data in S3,” he says.
Much of that organizing (for OLAP systems anyway) is taking place in emerging storage formats like Databricks Delta Lake, Apache Iceberg, and Apache Hudi, he says. “And that’s definitely a core component of the storage layer,” he says. “I want to emphasize that the part on top of S3 is significant.”
Cockroach Labs actually has a project to utilize S3 storage as a backend. The company is doing this for the same reason that the OLAP players are utilizing S3: efficiency.
“If you can actually get to the point where you can scale the storage independently from the CPU, this leads to greater efficiencies,” Mattis says. “We’re not necessarily doing it because S3 solved all these problems. We’re doing it just from that efficiency angle and being able to scale it to the resource utilization based on the workload. ‘Oh, this is a very storage-heavy workload. OK more storage, less CPU,’ in a form factor you can’t get in a single VM.”
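The efficiency argument can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the VM shape (8 vCPUs bundled with 4 TB of disk) and the workload figures are hypothetical assumptions, not numbers from Cockroach Labs or any cloud provider. It shows how a storage-heavy workload on fixed-shape VMs ends up paying for CPU it never uses.

```python
import math


def coupled_vms(cpu_needed, storage_tb_needed, vm_cpu, vm_storage_tb):
    """VMs required when each VM bundles a fixed amount of CPU and storage.

    The scarcer resource dictates the VM count; the other is overprovisioned.
    """
    by_cpu = math.ceil(cpu_needed / vm_cpu)
    by_storage = math.ceil(storage_tb_needed / vm_storage_tb)
    return max(by_cpu, by_storage)


def stranded_cpu(cpu_needed, storage_tb_needed, vm_cpu, vm_storage_tb):
    """vCPUs provisioned but not needed under the coupled model."""
    vms = coupled_vms(cpu_needed, storage_tb_needed, vm_cpu, vm_storage_tb)
    return vms * vm_cpu - cpu_needed


if __name__ == "__main__":
    # Hypothetical storage-heavy workload: 16 vCPUs of work, 200 TB of data.
    cpu, storage_tb = 16, 200
    # Hypothetical VM shape: 8 vCPUs with 4 TB of attached disk.
    vm_cpu, vm_storage_tb = 8, 4

    print("coupled VMs:", coupled_vms(cpu, storage_tb, vm_cpu, vm_storage_tb))
    print("stranded vCPUs:", stranded_cpu(cpu, storage_tb, vm_cpu, vm_storage_tb))
    # With storage decoupled, compute is sized by CPU alone.
    print("compute-only VMs:", math.ceil(cpu / vm_cpu))
```

Under these assumed numbers, storage forces 50 VMs and leaves 384 vCPUs idle, whereas separating storage from compute would need only two compute VMs plus 200 TB in an object store.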
S3 storage shouldn’t be thought of as separate from the database, but as part of the database, he says. That’s not to somehow make things easier for database makers, Mattis says. In fact, there are hard computer science problems to solve by integrating S3 into the database. But since there are efficiencies to be gained, it’s something that Cockroach Labs is working on.
“Snowflake is like that, right?” he says. “S3 is the backend part, but they’re doing significant data storage code on top of that S3 backend. And the same will be true of Cockroach Labs if and when this comes to fruition. It’s more of a research project right now, but one that we’re investigating significantly.”
Another area of active research for the intrepid Cockroachers is support for multi-cloud environments. This is a request that CockroachDB users are making more and more often, Mattis says.
“Cockroach Cloud works on GCP and AWS right now. We’re going to add support for Azure,” he says. “And then after that, we’re going to add support for multi-cloud databases, a single logical database that will span three different cloud providers.”
The big banks are being pushed by regulators toward the multi-cloud realm, Mattis says. If one cloud provider goes down, and it takes the banking services for one of the biggest banks in the world down with it, that can have a potentially devastating short-term impact on the economy, so European regulators, in particular, are keen to force banks to do something about it.
“They’re actually getting mandated to get rid of that systemic risk,” Mattis says. “They want to have clusters and to have the whole financial services platform be able to run and spread across multiple clouds.”
At a conceptual level, supporting a single database image across three different cloud providers is relatively straightforward, Mattis says. Kubernetes will be involved, he says. But the biggest challenge will be integration at the networking level. Punitive data egress charges, he says, will also pose a challenge to reading and writing data to a single database spanning multiple clouds.
In a related development, the company is also working to devise a hot standby cluster for customers.
“Even though CockroachDB is a very highly reliable, resilient system that self heals with node or region failures, we have customers saying, even with that, we have workloads that are so mission critical, we want to have a hot standby cluster,” Mattis says. “So actually replicating to this hot standby cluster is functionality we’ve been working towards for a little while that we’re going to put into preview this year.”
Mattis is quite bullish on Cockroach Labs’ prospects. The company is competing and winning deals against bigger competitors, he says, and it enjoys a two-year lead over smaller startups in terms of supporting geographically distributed ACID transactions.
“We’re being used in mission-critical workloads that, if they go down, it’s major–millions of dollars per hour of downtime, and significant impacts on these customers,” he says. “So it’s real-world, battle tested where I think we have a significant lead right now.”