Technical complexity is inherent when building distributed stream processing systems, particularly when integrating new real-time applications with traditional IT systems. But Confluent, the company behind Apache Kafka, is working to drive that complexity out of view of the developers and engineers building the next generation of real-time data systems.
One of the biggest technical challenges that architects have faced is getting streaming data systems like Kafka to play nicely with transactional databases, a necessity in many real-world deployments. It’s challenging because the two systems were designed to do different things. For instance, Kafka’s priority as a pub-sub system is to keep a stateless record of events, whereas relational databases are designed to maintain data in a stateful manner.
Getting these two approaches to mesh isn’t easy. The Lambda architecture was conceived to help alleviate the impedance mismatch between streaming and stateful data systems. That was followed by Kafka co-creator Jay Kreps’ Kappa architecture, in which a single technology stack is built around streaming.
Kafka’s embrace of the SQL language and the launch of ksqlDB helped to further reconcile the two approaches under the Kappa architecture banner. Being able to use SQL to query not only the real-time streaming data flowing through Kafka but also the historical record eliminates a lot of kludgey glue code that was previously necessary, thereby simplifying the whole setup.
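To make the idea concrete, here is a minimal sketch in plain Python (not ksqlDB, and not any Confluent API) of how a single append-only event log can answer both kinds of question: what is the current state, and what has happened recently. The event shape and the names `materialize` and `events_since` are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterator, List, Tuple

# An append-only log of events, each a (key, value) pair in arrival order.
Event = Tuple[str, int]

def materialize(log: List[Event]) -> Dict[str, int]:
    """Fold the whole history into a table holding the latest value per key."""
    table: Dict[str, int] = {}
    for key, value in log:
        table[key] = value  # later events overwrite earlier ones
    return table

def events_since(log: List[Event], offset: int) -> Iterator[Event]:
    """Replay events starting at a given offset, as a streaming consumer would."""
    yield from log[offset:]

log: List[Event] = [("alice", 10), ("bob", 5), ("alice", 7)]
current_state = materialize(log)      # the historical record folded into state
recent = list(events_since(log, 2))   # only the newest event
```

In this toy version the table is just a derived view of the log, which is roughly the intuition behind querying streams and history through one interface.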
There are other examples where things that used to be hard in Kafka are getting easier. Michael Drogalis, a principal technologist in the office of the CTO at Confluent, points out that a limitation in how much data a Kafka cluster could store was putting a damper on new use cases.
“One of the things that used to be hard is, if you wanted to build one of these types of systems [that combines streaming and historical data], Kafka had very small capacity for your data because it was bounded by the smallest disk on the broker,” Drogalis says. “And so that drove people to use Kafka for these lower-retention use cases. It becomes kind of hard to build these event source systems when you maybe only have seven days of retention.”
Confluent engineers have addressed this problem over the last few years by introducing tiered storage. By offloading long-term data into an object store, such as Amazon S3, Confluent Cloud users can now store an infinite amount of data in their Kafka cluster, thereby eliminating any concerns about data retention, Drogalis says.
“So I think, bit by bit, you’ll see these problems resolved,” he says. “And between a combination of infrastructure improvement like that and interface improvement with SQL, these things will be more natural to build over time, and my hope is become the default way that people architect their software.”
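The tiered storage idea Drogalis describes can be illustrated with a small sketch, under loud assumptions: this is a toy model in plain Python, not Confluent's implementation, and the `TieredLog` class, its `local_capacity` parameter, and the use of dictionaries as stand-ins for broker disk and an object store are all hypothetical.

```python
from typing import Dict, List

class TieredLog:
    """Toy log: recent records stay local, older ones move to a cold tier."""

    def __init__(self, local_capacity: int) -> None:
        self.local_capacity = local_capacity
        self.local: Dict[int, str] = {}   # stands in for broker disk
        self.remote: Dict[int, str] = {}  # stands in for an object store
        self.next_offset = 0

    def append(self, record: str) -> int:
        offset = self.next_offset
        self.local[offset] = record
        self.next_offset += 1
        # Offload the oldest local records once the local tier is full.
        while len(self.local) > self.local_capacity:
            oldest = min(self.local)
            self.remote[oldest] = self.local.pop(oldest)
        return offset

    def read(self, start: int) -> List[str]:
        """Read from `start` to the end, whichever tier holds each record."""
        out = []
        for offset in range(start, self.next_offset):
            tier = self.local if offset in self.local else self.remote
            out.append(tier[offset])
        return out

log = TieredLog(local_capacity=3)
for i in range(10):
    log.append(f"event-{i}")
```

The point of the sketch is that a reader asking for offset 0 gets the full history even though only the three newest records remain on the local tier, so retention is no longer bounded by local disk.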
Drogalis is one of the distributed systems engineers at Confluent who gets his hands dirty with the low-level internals of the system, which is not something for the faint of heart. Drogalis arrived at Confluent in 2018 when his startup, called Distributed Masonry, was acquired by Confluent. Distributed Masonry developed the first version of the object-store backing for Kafka, and Drogalis and his new peers at Confluent completely rebuilt the integration after his company was acquired. In an interview with Datanami, he failed to hide the satisfaction of seeing his vision finally come to market.
“I definitely see a lot more people have high-capacity Kafka topics, using it for all sorts of use cases,” he says. “That makes me really happy because it sort of does away with this perception that Kafka is just this pipe that’s for low-retention data. It’s something you can use regardless of scale if streaming is the right access pattern for you. As our cloud becomes less expensive, more available in different cloud provider regions, people should just reach for the right tool for the right job and the more you can reach for streaming when streaming is the right approach, the better.”
Last week, Confluent rolled out several new features in its hosted Confluent Cloud platform designed to alleviate the complexity involved in operating a streaming data system at scale.
Support for OAuth, which has emerged as an industry-standard authentication mechanism, makes it easier for Confluent Cloud customers to manage authentication and access control. Admins get more fine-grained control over what resources a user can access with role-based access control (RBAC) support in the Schema Registry, Connect, and ksqlDB components of Confluent Cloud. Finally, the company rolled out Client Quotas, a new feature designed to ensure the performance of individual applications in multi-tenant cloud environments.
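As a rough illustration of what a per-client quota does in a multi-tenant setting, the sketch below uses a token bucket, a common rate-limiting technique. It is an assumption made for illustration, not a description of how Confluent implements Client Quotas; the `ClientQuota` class, its fields, and the client name `app-a` are hypothetical, and this version simply rejects over-quota requests, whereas a real broker may throttle by delaying a client instead.

```python
from dataclasses import dataclass

@dataclass
class ClientQuota:
    rate: float          # bytes replenished per second (hypothetical unit)
    burst: float         # maximum bytes that can accumulate
    tokens: float = 0.0  # bytes currently available
    last: float = 0.0    # timestamp of the previous check, in seconds

    def allow(self, nbytes: float, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

# One bucket per client keeps a noisy application from starving the others.
quotas = {"app-a": ClientQuota(rate=100.0, burst=100.0, tokens=100.0)}
first = quotas["app-a"].allow(80, now=0.0)   # within the initial budget
second = quotas["app-a"].allow(80, now=0.1)  # too soon, budget not refilled
third = quotas["app-a"].allow(80, now=1.0)   # budget has refilled
```

Passing the clock in as `now` keeps the example deterministic; a production limiter would read a monotonic clock instead.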
Taken together, the new features allow customers to run more applications on Confluent Cloud clusters without increasing the administrative overhead. These are the sort of enterprise features that are often lacking in open source products and which drive up the overall cost of using the technology in high-volume production environments.
The approach seems to be working for Confluent. Last week the company, which trades on the NASDAQ under the symbol CFLT, announced fourth quarter and fiscal year 2022 financial results. The company reported $168.7 million in revenue for the quarter ended December 31, a 41% increase over the same quarter a year ago. It reported a GAAP net loss of $0.37 per share, down from a GAAP net loss of $0.43 per share last year. For the full year, Confluent reported $585.9 million in revenue, a 51% increase over fiscal year 2021. The GAAP net loss per share came to $1.62, down from $1.82 last year.
“Confluent helps power rich customer experiences, more intelligent and efficient backend operations, and unlock new data-driven business opportunities,” Kreps says in a press release. “Our position as the category leader is illustrated by the 124% year-over-year growth in FY’22 Confluent Cloud revenue, 35% year-over-year increase in customers with $100k+ ARR, and a healthy dollar-based net retention rate of just under 130%.”
Wall Street apparently liked what it saw, as the company’s stock jumped more than 20% to $27.50 per share in the days following the news last week. CFLT gave up some gains but is still about 9% higher than where it started last week. Its market capitalization sits just shy of $7 billion.