Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!
Closing the gap between expectations and reality for optimizing data movement. Commentary by Li Kang, VP Strategy, CelerData
Artificial intelligence and machine learning have made it possible to process an enormous amount of data in real time, at least under ideal conditions. But for many users, when those capabilities are put to the test in commercial environments at scale, and supported by traditional data architectures, the reality doesn’t match expectations. That is because legacy systems weren’t designed to move data at the speeds necessary for real-time analytics. The good news is that real-time analytics is possible, provided investments include rearchitecting underlying systems to remove the friction associated with legacy processes. Closing the gap between expectations and reality means combining a massively parallel processing (MPP) query engine with a streamlined architecture designed to optimize data movement between the components through which data flows. It’s also important to handle backend tasks like storage, formatting, and organization separately from frontend tasks like metadata management, query planning, and scheduling, so that friction stays low even as data volumes scale.
How AI segmentation can re-engage customers. Commentary by Brian Walker, chief strategy officer at Bloomreach
The increased focus on data privacy protection and regulation may ultimately help companies better leverage AI to re-engage existing customers by focusing them on the practices they probably should have been following from the beginning. Companies can no longer rely on cookies and retargeting to track customers and collect data from people who never opted in to being tracked. When personalizing and scaling marketing efforts, there are no shortcuts: companies must collect first-party or zero-party data that customers volunteer as they engage. This is what it takes to truly understand who your customers are. AI-driven segmentation helps marketers better cater to individual customers at scale, allowing them to curate marketing messages and content based on what customers actually want: preference, occasion, price, style, and so on. Rather than wasting marketing budgets targeting the wrong customers or delivering generic messages to all customers, AI helps businesses define fine-grained customer segments and employ specialized, personalized marketing strategies. Segmentation strategies become even more impactful when updated based on real-time in-session behavior, allowing companies to communicate in a relevant way about products that actually matter to the customer. And this can deliver meaningful business results, cutting wasted marketing spend and driving conversion and basket size, something that has always mattered but matters today more than ever. When segmentation is done right, it can ensure an unbeatable edge for your organization.
Democratization in AI through search. Commentary by Amr Awadallah, co-founder and CEO of Vectara
Over the last 25 years, I’ve witnessed firsthand how companies have evolved in their approach to AI and ML. Some have succeeded spectacularly, but many more have failed. This is due to a shortage of highly skilled talent and the exorbitant initial investment required to leverage these revolutionary technologies successfully. The consequence is that only a handful of powerful companies are taking advantage of AI’s full potential while the rest of us are left watching. The more machine learning advances, the more these titans seem to cement their dominance. Search is a space that will see a significant impact from democratizing AI. Until recently, the most sophisticated search has been the exclusive province of major global tech companies. But as neural network technology becomes more accessible for smaller businesses, developers without AI experience will be able to build capabilities that rival and even exceed today’s top commercial search engines. In the coming years, I predict the internet will be swamped by content generated by large language models designed to game the top search algorithms. This will lead to lower relevance from those providers, in turn allowing content owners to provide a better search experience through their native applications. The net result could be a more level playing field that redistributes power from today’s commercial search leaders to the rest of the world.
From one-shot to few-shot learning. Commentary by Chang Liu, Director of Engineering at Moveworks
One-shot and few-shot learning problems have been rapidly gaining popularity in the machine learning world recently. Against the backdrop of exponential growth in GPU processing power, and the data appetite of large ML models, the industry is increasingly reshaping itself into data-centric AI. As such, learning as much as possible, as efficiently as possible is emerging as a key competitive advantage — which is one-shot and few-shot learning’s biggest allure. Let’s say you want to automate a new task using NLP. Before, teams would need to create, train, evaluate, deploy, and monitor an entirely new model for each new task. This becomes a tedious effort as the number of models increases. But, with the introduction of large language models, teams are leveraging these intelligent systems to solve problems at scale. Instead of building an entirely new model, the language model only needs to be trained on the description of the task and the appropriate answers. The problem here is that large language models are notorious for their unpredictable responses. In order to deliver accurate results that don’t introduce potential liabilities, teams need to have tight parameters and rules around what a model can and cannot produce.
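To make the idea concrete, here is a minimal Python sketch of the pattern described above: a task description plus a handful of labeled examples are assembled into a prompt, and whatever the model returns is checked against a fixed set of permitted answers. The task, the label names, and both helper functions are hypothetical, and the call to an actual language model is deliberately left out; this is an illustration under those assumptions, not any vendor's implementation.

```python
# Hypothetical task: route an IT support message to one of a fixed set of labels.
ALLOWED_LABELS = {"password_reset", "software_request", "other"}

# A few labeled examples take the place of building a new model for the task.
EXAMPLES = [
    ("I forgot my login password", "password_reset"),
    ("Please install Photoshop on my laptop", "software_request"),
]


def build_few_shot_prompt(task_description, examples, query):
    """Combine the task description, worked examples, and the new query into one prompt."""
    lines = [task_description, ""]
    for text, label in examples:
        lines.append(f"Message: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Message: {query}")
    lines.append("Label:")
    return "\n".join(lines)


def validate_label(raw_output, allowed=ALLOWED_LABELS, fallback="other"):
    """Guardrail: accept the model's output only if it is an allowed label."""
    candidate = raw_output.strip().lower()
    return candidate if candidate in allowed else fallback
```

The prompt would be sent to whichever language model a team uses, and `validate_label` is applied to the response so that an unexpected answer falls back to a safe default instead of reaching the user.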
5 Reasons Why The Chatbot Is Dead and What’s Next in Conversational AI. Commentary by Jim Kaskade, CEO of Conversica
Here are the five reasons why the chatbot is dead: (i) They are yesterday’s AI: Today’s AI platforms have dynamic message generation capabilities, backed by a long history of real-world interactions that the AI can learn from to improve its accuracy. These intelligent digital assistants can engage in human-like dialogues and determine the best next action. (ii) They are not scalable: Customers and prospects need to talk to someone, and the standard chatbot doesn’t have the technical capabilities to understand human interactions. It basically routes people to a human once it gets a vague idea of interest, and this is not scalable. Modern conversational AI platforms, by contrast, are deployed to act as integrated team members, taking care of the many time-consuming but necessary tasks that sales, marketing and customer success teams must perform. (iii) They are not flexible: Chatbots have many use cases but are generally specialized in certain points in the lifecycle, like web chat, which only engages leads while they’re on the website, with no ability to go beyond it. (iv) They are limited: Companies that rely on chatbots can only engage with customers while they are on the website. When they leave, the conversation is over. AI-powered smart digital assistants can interact with prospects and customers at any moment and across channels (SMS, chat and email) without losing context. (v) They don’t help companies make money: Standard chatbots have been deployed on every website to help capture new leads. But sophisticated Conversational AI solutions must replicate the experience of communicating with a human to deliver real revenue results. For example, Conversica customers experience 24x ROI on conversation automation on average.
Infrastructure spending needs geospatial data to pay off. Commentary by Dr. Mike Flaxman, Product Manager, HEAVY.AI
The recent infrastructure and inflation bills invest nearly $1 trillion in modernizing and expanding America’s power grids, water distribution systems, public transit and broadband networks. They also put tremendous pressure on the state/local governments and utilities that will receive this money: With so much at stake, how will they effectively plan and execute such radical upgrades? Geospatial data will be key to making decisions for each of these segments. For example, geospatial data will tell utilities where they should locate new grid/water infrastructure and which type of infrastructure (solar farms, wind farms, dams) will best serve each region. When it comes to broadband expansion, geospatial data is needed to pinpoint exactly where to dig tunnels and lay cables. Telecoms need a precise understanding of local geography (trees, hills, rivers) to avoid obstructions and optimize broadband paths. Geospatial data will play a fundamental role in ensuring these infrastructure projects succeed and the public’s money is well spent.
Retailers stay stocked with AI. Commentary by Rajiv Nayan, General Manager, Digitate
With the holiday season here, retailers need to ensure that their business operations, and the IT systems that support them, are running like a well-oiled machine. Retailers can lose significant revenue when products aren’t consistently stocked and shelves sit empty. They can also lose considerable money when inventory is overstocked and has to be discarded or sold at a discount. Predicting what to order, when to order it, and how much to order is a difficult exercise and constantly in flux. This problem is made even more difficult by inconsistent supply chains. AI gives retailers data-based insights and closes the loop with automated actions so they never have to guess when to restock goods and shelves never sit empty, driving significant improvement to their bottom line. Leveraging AI, retailers can detect precise buying patterns that tell them exactly when their inventory needs to be replenished, taking into account current wait times to receive any given item.
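The replenishment decision described above can be illustrated with the classic reorder-point formula: expected demand during the supplier wait time plus a safety buffer. The Python sketch below assumes a demand forecast is already available as an average daily figure; the function names and the sample numbers are hypothetical and are not taken from any particular product.

```python
import math


def reorder_point(avg_daily_demand, lead_time_days, safety_stock=0.0):
    """Stock level at which a new order should be placed.

    Covers expected demand during the supplier lead time (the current
    wait time to receive the item) plus a safety buffer.
    """
    return math.ceil(avg_daily_demand * lead_time_days + safety_stock)


def needs_restock(on_hand, avg_daily_demand, lead_time_days, safety_stock=0.0):
    """True when current inventory has fallen to or below the reorder point."""
    return on_hand <= reorder_point(avg_daily_demand, lead_time_days, safety_stock)


# Example with made-up numbers: 40 units sold per day, a 5-day wait, 30 units of buffer.
print(reorder_point(40, 5, 30))       # 230
print(needs_restock(200, 40, 5, 30))  # True
```

In practice the interesting part is where the inputs come from: a forecasting model would supply the demand estimate, and the lead time would be refreshed as supplier wait times change.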
Putting Responsible AI into practice. Commentary by Liran Hason, CEO and Co-Founder, Aporia
While the discussion surrounding Responsible AI is important, it’s how to actually put it into practice that should be a top priority. First, it’s crucial to clear up the subject, as there seems to be a bit of confusion between Responsible AI (RAI) and ethical AI, which are two different beasts. Ethical AI is more of a moral code on how organizations can use AI for the benefit of society, climate, etc., and it falls under the RAI umbrella. Responsible AI is a framework of practices that ensures machine learning models are working as intended and, when they’re not, defines the most secure and efficient way to initiate remediation before ML faults impact the end user. Only with an RAI framework in place can organizations aspire to and act upon their ethical AI goals, because if a model is experiencing drift and subsequently underperforming, no amount of ethics or good intentions will help. For us, as an emerging AI-centric society, to fully adopt AI and use it to advance humanity, the proper guardrails must be in place. Business leaders and ML practitioners need to be aligned, incident response needs to be prioritized, and visibility into your model performance and health must be a requirement.
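As one illustration of what visibility into model health can look like, the sketch below computes the Population Stability Index (PSI), a commonly used statistic for detecting drift between a baseline sample (for example, training data) and recent production data for a single feature. PSI is swapped in here as an example technique; the commentary does not name a specific method, and the bin edges, function names, and sample values are assumptions made for illustration.

```python
import math


def bin_fractions(values, edges):
    """Fraction of values falling in each bin; `edges` are sorted interior cut points."""
    if not values:
        raise ValueError("values must be non-empty")
    counts = [0] * (len(edges) + 1)
    for v in values:
        # The bin index is the number of cut points at or below the value.
        idx = sum(1 for e in edges if v >= e)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def population_stability_index(baseline, recent, edges, eps=1e-6):
    """PSI = sum over bins of (recent - baseline) * ln(recent / baseline)."""
    expected = bin_fractions(baseline, edges)
    actual = bin_fractions(recent, edges)
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) and division by zero for empty bins
        total += (a - e) * math.log(a / e)
    return total
```

A PSI of zero means the two samples are distributed identically across the bins, and larger values indicate a larger shift. The threshold at which a shift triggers an alert or an incident response is a decision for the team that owns the model.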
Cutting cloud cost: be aware of data egress cost. Commentary by Bin Fan, VP Open Source and Founding Engineer at Alluxio
A recent survey by Wanclouds shows that cloud costs are rising, and senior executives are reducing cloud spending. Among cloud costs, egress fees are often overlooked because they occur not only when moving data out of the cloud, but also when moving it between cloud zones and regions. It is important for organizations to architect for cloud cost management in order to avoid sticker shock from unanticipated egress costs. We have customers adopting data architectures that rely on cross-region data replication. They accumulate data silos across various regions/cloud providers through mergers and acquisitions. Accessing data across regions generates huge egress charges, which become a major pain point. We recommend adopting more egress-friendly cloud architectures that use caching and storage abstraction to minimize the amount of data transferred over the network and so reduce egress fees.
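A back-of-the-envelope model shows why caching helps: only reads that miss the cache cross a region boundary and incur an egress charge. The Python sketch below is illustrative only; the function name, the data volume, and the per-GB price are hypothetical placeholders, not any cloud provider's actual rates.

```python
def monthly_egress_cost(gb_read_per_month, price_per_gb, cache_hit_ratio=0.0):
    """Estimate monthly cross-region egress cost.

    Only cache misses are transferred over the network, so the billable
    volume is the total read volume times (1 - cache_hit_ratio).
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    gb_transferred = gb_read_per_month * (1.0 - cache_hit_ratio)
    return gb_transferred * price_per_gb


# Hypothetical workload: 50 TB read per month at an assumed $0.02 per GB.
without_cache = monthly_egress_cost(50_000, 0.02)
with_cache = monthly_egress_cost(50_000, 0.02, cache_hit_ratio=0.8)
print(f"no cache: ${without_cache:,.2f}  80% hit ratio: ${with_cache:,.2f}")
```

Under this simple model the cost scales linearly with the miss rate, so an 80% hit ratio removes 80% of the egress bill. Real pricing is usually tiered and varies by provider and route, which this sketch ignores.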
Cloud PCs: a game-changer for IT troubleshooting. Commentary by Amitabh Sinha, Co-Founder and CEO of Workspot
Gone are the days of all employees operating under one network within the perimeter of a central office space. Today the norm is to work anywhere, from a variety of locations and on multiple endpoints. Hybrid and remote work models have been shown to increase employee productivity, and they also help companies attract and recruit top talent, according to 76% of IT leaders. Yet as companies evolve to support more extensive remote work, troubleshooting end user computing issues becomes more complex. IT teams struggle to stay in front of a variety of challenges, including a vastly larger cyber attack surface, which can impact productivity and business continuity. Modern enterprises recognize that inadequate end user computing technology results in costly inefficiency, security risks, and downtime. They are quickly concluding that traditional virtual desktop infrastructure (VDI) and physical PCs cannot keep up with today’s requirements for security, scalability, performance, and reliability. These factors are prompting many to switch to Cloud PC solutions. When adopting a Cloud PC solution, one of the most important features to look for is 24/7, real-time monitoring of Cloud PC health with trend correlation and analysis. Continuous observability helps the IT operations team detect patterns and root causes across the IT environment to proactively mitigate computing issues. Not only does this data analysis foster both end user and IT team productivity by easing the IT troubleshooting burden, but it can also reveal behaviors that indicate a potential security breach. Equipping IT teams with a sophisticated big data trending and correlation engine that makes global Cloud PC observability easy means less time troubleshooting and more time contributing to the success of the business.
Twitter Hack Affects Millions Due to New Attack Vector: APIs. Commentary by Richard Bird, Chief Security Officer from Traceable
The timeline and the confusion reflected in Twitter’s statements to the market about its latest breach echo the widespread lack of understanding about the risks associated with APIs, as well as the inability to secure those APIs in a timely manner. Twitter created a pathway to a broken object-level authorization exploit and then believed that no one had capitalized on that error. Unfortunately, that has been proven wrong. This is the problem with APIs: when you have no security program around them, bad actors don’t look any different from normal users. Twitter simply didn’t understand the difference between a use case and an abuse case within its code, and this is something that happens regularly to companies of all sizes. This incident should serve as a reminder to the world of how weak API security is within almost every corporation and organization on the planet.
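For readers unfamiliar with the term, broken object-level authorization means an API returns an object based on the identifier in the request without checking whether the caller is entitled to that object. The Python sketch below contrasts a vulnerable lookup with a guarded one. It is a generic, hypothetical illustration with made-up names and data, and it does not represent Twitter's code or the specifics of the incident.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    account_id: int
    owner_id: int
    email: str


# Hypothetical in-memory data store standing in for a real database.
ACCOUNTS = {
    1: Account(account_id=1, owner_id=100, email="alice@example.com"),
    2: Account(account_id=2, owner_id=200, email="bob@example.com"),
}


class Forbidden(Exception):
    """Raised when the requester is not allowed to read the requested object."""


def get_account_unsafe(requester_id, account_id):
    """Vulnerable: returns any account by ID and never checks who is asking."""
    return ACCOUNTS[account_id]


def get_account(requester_id, account_id):
    """Guarded: the object is returned only to its owner."""
    account = ACCOUNTS[account_id]
    if account.owner_id != requester_id:
        raise Forbidden(f"user {requester_id} may not read account {account_id}")
    return account
```

Both functions serve the legitimate use case identically. Only the second rejects the abuse case, in which an authenticated user walks through identifiers that belong to other people.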
Meta Fined $277M by Ireland Data Protection Commissioner. Commentary by Chris McLellan, Director of Operations at non-profit, Data Collaboration Alliance
Virtually all the power over personal data collection, use, and access resides with application owners and digital service providers. This has been allowed to persist for decades due to a general lack of concern on the part of citizens and consumers. But things are starting to change as people wake up to the fact that the battle for control over their information is actually of central concern to the future of their lives, communities, and children. Regulators, too, have been making moves to establish a fairer balance with mandates for data protection rights for access, correction, and deletion. But regulations are just a starting point. Let’s face it: we’ve all become addicted to the conveniences offered by personal and business applications, and that’s unlikely to change any time soon. And the predicted transition to more virtual experiences rather than traditional apps doesn’t change this one bit. The way apps manage data is the real problem in establishing the level of control necessary for enforcing outcomes like those outlined in GDPR and California’s CCPA. Sensitive and other information is fragmented into databases, which then get copied at scale through a process known as data integration. This is completely at odds with the global movement towards increased data privacy and data protection. Bottom line: If we want to get serious about data protection and data privacy, we need to think seriously about changing the way that we build apps. We need to accelerate the use of new frameworks like Zero-Copy Integration and encourage developers to adopt new technologies like dataware and blockchain, all of which minimize data and reduce copies so that the data can be meaningfully controlled by its rightful owner. Until then, the endless parade of fines and regulatory show trials, like any other attempt to mitigate the underlying chaos that defines the current state of personal information, is doomed to fail.