Things move quickly in the world of real-time streaming. In some cases, companies want to track millions of events per second. Staying on top of these data volumes can be extremely challenging for traditional stateless computing paradigms. But for the folks at Swim, who have devised a stateful approach with their vertically integrated real-time data platform, staying afloat in fast-moving data waters is proving quite doable.
The Internet itself emerged largely under a stateless computing paradigm, which improves resilience when things go awry. As the Net grew, statelessness helped companies scale applications to great heights. The trend continues with today’s popular frameworks, such as Apache Kafka, a stateless event broker that forms the backbone for many real-time data processing efforts today.
However, we’ve begun to reach the edge of what stateless programming can deliver for real-world big data applications on the edge. According to Swim.ai’s CTO and co-founder Chris Sachs, when state inevitably is required in a real-time application, too much time is lost querying the database and waiting for the value to be returned.
“If you have one firehose that’s 5 million events per second and another firehose that’s 5 million events per second and you have to put them together, in a stateless service, every time you get an event from one, you need to query the states of the others,” Sachs says. “It just doesn’t work.”
Some simple calculations demonstrate the time crunch involved here. CPUs work on nanosecond time scales, while networks are millisecond at best, Sachs says. That translates to timeframes that are six orders of magnitude apart, or a factor of 1,000,000 between the CPU’s clock and the network’s clock.
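The arithmetic can be sketched in a few lines of Python. The one-nanosecond and one-millisecond figures are round-number assumptions taken from the time scales Sachs describes, and the event rate is the 5 million per second from his firehose example; none of these are measurements.

```python
import math

# Round-number assumptions, not measurements.
CPU_OP_NS = 1                    # roughly one nanosecond per CPU operation
NETWORK_ROUND_TRIP_NS = 1_000_000  # roughly one millisecond per network round trip
EVENTS_PER_SECOND = 5_000_000    # one "firehose" from the example in the text

# How many times slower a network round trip is than a CPU operation.
ratio = NETWORK_ROUND_TRIP_NS // CPU_OP_NS
orders_of_magnitude = round(math.log10(ratio))

# If every event needed its own blocking network lookup, this is how many
# seconds of lookup time would pile up for each second of incoming data.
lookup_seconds_per_second = EVENTS_PER_SECOND * NETWORK_ROUND_TRIP_NS / 1e9

print(ratio, orders_of_magnitude, lookup_seconds_per_second)
```

Under these assumptions, one second of incoming events would require thousands of seconds of serial lookups, which is why a per-event remote query cannot keep up without heavy batching or parallelism.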
Another problem with real-time stateless computing is the continuous polling that’s required to determine when a value has changed.
“You sit there in a loop: Has it changed? Has it changed? Has it changed?” Sachs tells Datanami. “That doesn’t scale. That’s why everything is near-real time, because you would have to poll infinitely to be actually real time.”
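The contrast between polling and being notified can be shown with a minimal sketch. The `ObservableValue` class and its method names are hypothetical, invented here for illustration; they are not part of Swim or any other library.

```python
class ObservableValue:
    """Hypothetical holder for one value that notifies subscribers on change."""

    def __init__(self, initial):
        self._value = initial
        self._subscribers = []

    def get(self):
        return self._value

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def set(self, new_value):
        # Only notify when the value actually changes.
        if new_value != self._value:
            old_value, self._value = self._value, new_value
            for callback in self._subscribers:
                callback(old_value, new_value)


temperature = ObservableValue(20)

# Push: the subscriber does no work until something changes.
changes = []
temperature.subscribe(lambda old, new: changes.append((old, new)))

# Poll: the reader keeps asking "has it changed?" on every tick.
polls = 0
last_seen = temperature.get()
for tick in range(1000):
    if tick == 999:
        temperature.set(21)  # the value changes once, on the final tick
    polls += 1
    if temperature.get() != last_seen:
        last_seen = temperature.get()

print(polls, changes)
```

In this toy run the polling reader performs 1,000 reads to observe a single change, while the subscriber is invoked exactly once. Polling more often narrows the delay but multiplies the wasted reads.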
Instead of engineering a solution for the state problem within the same paradigm, as the folks at Confluent did in 2019 by melding the RocksDB database to Kafka, the folks at Swim decided it was time to rethink the whole stateless approach.
“We’ve been riding on this set of assumptions for 30 years,” Sachs says. “The Web was developed as a shared library to share research papers between Stanford and CERN in Switzerland. It’s amazing what we’ve been able to do with it. But at the end of the day, not everything is a document. And a lot of the complexity and pain is [due] to building everything in terms of these primitives that were built around documents.
“But we’re talking about increasingly things that live, evolve, and change, and it’s just the wrong primitive,” Sachs continues. “We have this hugely complex ecosystem that’s been set up to compensate for those weaknesses, whereas if you start with first principles, there’s no reason why it has to be so complicated. Just directly solve the problem.”
Swim has devised a very different approach that allows developers to build real-time applications utilizing stateful microservices. The company uses an actor-based approach, similar to how Akka works. According to Sachs, Swim’s platform is a vertically integrated version of a distributed object model.
Instead of using a stack of common open source components like databases, messaging buses, and job execution layers (which could have simplified development), Swim created its own vertically integrated stack, which minimizes latency.
“With Swim, we glue those layers together, then we slice it vertically into millions of little parts, where each part is this thing called a Web agent, which has its own persistence, its own state, its own streaming APIs,” Sachs says. “So you maximize data locality so those things that used to take hundreds of milliseconds now take nanoseconds because you just do a call trace through the stack.”
Each Web agent has its own URI that can be called to execute a given process, and developers can string these agents together to accomplish tasks in a cascading graph. “It’s like a materialized object relational model, where the relationships are streams, and then you sort of wash data over it,” Sachs says.
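A toy version of that idea might look like the following Python sketch. It is not Swim’s API: the `Agent` class, the `registry` dictionary, the URIs, and the reducer functions are all hypothetical names chosen for illustration. Each agent keeps its own in-memory state and streams changes to the agents linked downstream of it.

```python
class Agent:
    """Hypothetical stateful agent: owns its state and streams changes downstream."""

    def __init__(self, uri, reduce_fn, initial):
        self.uri = uri
        self.state = initial
        self._reduce = reduce_fn
        self._downstream = []

    def link(self, other):
        self._downstream.append(other)

    def receive(self, source_uri, value):
        new_state = self._reduce(self.state, source_uri, value)
        if new_state != self.state:
            self.state = new_state
            # Changes cascade through the graph as in-process calls.
            for agent in self._downstream:
                agent.receive(self.uri, new_state)


registry = {}  # lets agents be looked up by URI


def make_agent(uri, reduce_fn, initial):
    registry[uri] = Agent(uri, reduce_fn, initial)
    return registry[uri]


def keep_latest(state, source_uri, value):
    return value


def collect_by_source(state, source_uri, value):
    return {**state, source_uri: value}


car_a = make_agent("/vehicle/a", keep_latest, None)
car_b = make_agent("/vehicle/b", keep_latest, None)
junction = make_agent("/intersection/1", collect_by_source, {})
car_a.link(junction)
car_b.link(junction)

# Speed readings arrive for each vehicle; the intersection updates itself.
registry["/vehicle/a"].receive("/sensor/a", 30)
registry["/vehicle/b"].receive("/sensor/b", 50)
registry["/vehicle/a"].receive("/sensor/a", 40)

print(junction.state)
```

The intersection never queries a database for the vehicles’ speeds. The latest values are already in its own state by the time it needs them, which is the data-locality point Sachs makes.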
The final piece of the puzzle is a cache coherence protocol, which helps ensure that a cohesive view of the data is maintained at all times. “We basically run a cache coherence protocol, which is sort of how we’re able to absorb dynamic loads,” Sachs explains. “That’s not the way Kafka and Pulsar [work]. They’re prone to buffer bloat. You can run into problems, where you’re publishing data too quickly and that leads to cascading latency problems. This is something that actor systems like Akka and Erlang run into as well.”
However, jumping out of the stateless stream and into the fire, as it were, introduces a whole new set of issues. After all, maintaining the state of a large number of entities is difficult. That, of course, is the whole reason why most in the industry have taken a stateless approach (with some exceptions).
“In order to make something stateful that scales, you have to be really careful,” Sachs says. “You have to be bounded in your memory use and bounded in compute and network utilization. You have to be able to propagate back pressure throughout the stack.”
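One common way to get bounded memory and back pressure is a fixed-capacity inbox that refuses new events when it is full, so the producer has to slow down or retry. The sketch below uses Python’s standard `queue` module; the `BoundedInbox` class is a hypothetical illustration of the general technique, not a description of how Swim implements it.

```python
import queue


class BoundedInbox:
    """Hypothetical fixed-capacity inbox that signals back pressure when full."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)

    def offer(self, event):
        """Return True if accepted, False to tell the producer to back off."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False

    def drain(self, max_events):
        """Remove and return up to max_events events in arrival order."""
        drained = []
        for _ in range(max_events):
            try:
                drained.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return drained


inbox = BoundedInbox(capacity=3)

# The producer tries to publish five events into three slots.
accepted = [inbox.offer(event) for event in range(5)]

# The consumer catches up, which frees room for the producer again.
drained = inbox.drain(2)
accepted_after_drain = inbox.offer(99)

print(accepted, drained, accepted_after_drain)
```

Because the inbox never grows past its capacity, a fast publisher cannot build up an unbounded backlog. The rejection is pushed back to the source instead of surfacing later as growing latency.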
Going against the stateless grain hasn’t yet paid off in a huge way for Swim.ai, which has raised about $25 million in venture funding. It’s not the only company embracing stateful programming, but its approach is unique enough that it stands out from the crowd.
Sachs, who is largely self-taught, isn’t afraid to challenge the status quo around statelessness. “Statelessness is a problem,” he continues. “It makes a ton of sense for the early Web and for document stuff. It makes no sense for live automated operations.”
Swim has been working on its platform since 2015, and the product is ready for production use. Beyond the innards of how the Java-based product works, it also features a visualization tool that allows users to see the data that’s being tracked.
Swim is ideal for operational real-time use cases, where users need to know what is happening right now. The Campbell, California company has just a handful of customers, but they tend to be very large organizations in logistics, telecommunications, and the federal government.
One of Swim’s top customers is Verizon, which uses Swim to monitor for problems on its cellular network. Verizon has developed Web agents for entities that it wants to track, and uses Swim and its streaming APIs to detect when problems occur.
“Right now, it is massively labor intensive to figure out where their problems are,” Sachs says of the cellular giant. “They’ll get alarm storms…but the fact that you have the data doesn’t mean you know what’s going on or that you understand it. So there’s a huge amount of manual effort that goes into it.”
The biggest hurdle to adopting Swim is the mindset required to program in a stateful manner rather than the stateless approach that has dominated the industry for so long.
“The biggest challenge for developers is, it’s a big cognitive shift,” Sachs says. “Instead of thinking ‘For all data or all time, how do I reduce it and run analytics functions?’ you’re thinking ‘OK, for one thing, what’s the behavior I want for a traffic intersection or a truck or a package?’ and then building it, bottom up. So it’s definitely a different way of thinking.”
As new applications in edge computing, IoT, and the metaverse come online, the rate of real-time data needed for analysis will explode. Will the stateless approach be able to keep pace with faster 5G networks, or will a simpler, stateful programming paradigm begin to emerge? Only time will tell.