Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!
Taking Charge of Tables: Introducing OpenHouse for Big Data Management. Commentary by Sumedh Sakdeo, Senior Staff Software Engineer at LinkedIn
“Data infrastructure teams that are responsible for building and managing open-source lakehouse deployments often face the challenge of dealing with scattered data, multiple operational costs, and inconsistent governance practices.
To address these issues, we built OpenHouse, LinkedIn’s control plane for managing tables. OpenHouse exposes a RESTful catalog for users to provision tables, share them, enforce governance, and declaratively specify desired policies, all while seamlessly integrating with all components in the data plane. To alleviate the burden on end users, we ensured that our solution removed the need for users to worry about the development and operational costs of keeping their tables in a correctly configured, optimal state. It was also important to us to exercise complete control over the underlying distributed storage, enabling us to uphold consistency, security, governance, and cost efficiency. OpenHouse also automates a wide variety of use cases, such as honoring user-defined data retention, cross-cluster snapshot replication, maintenance activities for the Iceberg table format, data restatement for optimizing layout, enforcing governance, and secure table sharing. With OpenHouse, developers only need to define their policy, and the system takes care of the rest, simplifying and streamlining the entire data management process.
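To make the declarative idea concrete, here is a minimal sketch of what a table spec with user-defined policies might look like when sent to a RESTful catalog. The endpoint path, field names, and policy schema below are illustrative assumptions, not OpenHouse’s actual API.

```python
import json

# Hypothetical table spec with declarative policies, in the spirit of a
# control plane like OpenHouse. All field names and values are assumptions
# for illustration only.
table_spec = {
    "databaseId": "analytics",
    "tableId": "member_events",
    "policies": {
        "retention": {"count": 30, "granularity": "DAY"},   # keep 30 days of data
        "sharing": {"enabled": True},                        # allow secure table sharing
        "replication": {"destinations": ["backup-cluster"]}, # cross-cluster snapshots
    },
}

# A client would PUT this JSON body to something like
# /v1/databases/analytics/tables/member_events (illustrative path only);
# the control plane then enforces the policies on the user's behalf.
body = json.dumps(table_spec, indent=2)
print(body)
```

The point of the sketch is the division of labor: the developer states *what* they want (retention, sharing, replication), and the control plane owns *how* those policies are carried out against storage.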
Through these integrated efforts, we achieved an easy-to-use, efficient, and well-governed data lakehouse environment for our developers.”
What generative AI’s emergence tells us about the future of data in insurance. Commentary by Peter Levinson – VP of Product at Arity
“AI has been a key component of the insurance industry for decades. One of its most widely recognized use cases is usage-based automotive insurance programs that help carriers price customers more accurately and fairly based on how they drive, not on who they are or where they live. The renewed attention and focus on AI right now is unearthing an opportunity for the industry to refresh its historically complicated and often biased relationship with AI and find a path forward to use data for good. How? By putting customers first – ensuring customers know exactly how and why their data is being used and how it benefits them. Then, by creating highly personalized, unbiased insurance programs that cater to individual needs.
As state legislation proceeds with banning the use of credit-based scoring algorithms to set auto insurance premiums, insurers will continue to see requirements for algorithm testing and scoring models to help uncover biases, and will rely on driving behavior data more frequently as a solid source of truth. There is a world where AI continues to strengthen usage-based insurance (UBI) models and helps automotive insurance carriers understand risk before writing a quote – continuing down the path of helping insurers fairly and equitably gauge risk. A world where AI can aid in designing comprehensive coverage plans tailored to individual needs, striking a balance between affordability and protection.
AI has the potential to create meaningful impact when insurers, businesses, cities and consumers take the opportunity to use data responsibly and collectively. For example, AI could potentially help: (i) Identify patterns when a seemingly safe intersection becomes dangerous; (ii) Provide a safer or more efficient route for drivers based on bad weather; (iii) Deliver personalized ads to drivers who pass by a business in their community.”
Leveraging AI to build a resilient supply chain. Commentary by Vijay Raman, Head of Product & Technology, at ibi
“The devastating supply chain crisis of the previous three years has largely dissipated. To avoid similar disruption from occurring in the future, many businesses have begun integrating Artificial Intelligence (AI) into their supply chain operations. This is great news for industries across the globe. AI has the ability to help supply chains withstand adverse events and bolster their resiliency — even on an international scale. For example, the latest developments in AI technology can prompt a more immediate response to supply chain issues, such as extreme weather and logistical instability. AI-powered supply chains can identify and signal a current problem or potential risk in real time, allowing necessary personnel to quickly intervene and stop potential disruption. AI is also important for eliminating data silos in a supply chain. A supply chain, by nature, depends on multiple departments, teams, or businesses to take a product from ideation to delivery. At times, the data garnered in each section of the supply chain can become isolated and create a silo. Through virtualization, AI can help connect these disparate data sources, allowing teams to securely access data and remove bottlenecks in their processes.
If today’s enterprises want to reap these benefits in their supply chains, they need to create what I like to call a ‘Supply Chain Nervous System.’ With the right AI technology, business leaders can implement capabilities in their supply chain that will make them more connected, secure, agile and resilient. The first component AI can help with is data. The right AI solution can take data from various sections of the supply chain, or silos, and put it into a data warehouse. From this location, supply chain managers can manage the data to ensure quality and security. AI can also help stream relevant information and processes in a supply chain to gain real time benefits. Leveraging quality data, supply chain managers can allow stakeholders to stream relevant business intelligence and data science so that business leaders can make pertinent decisions with confidence. Supply chain managers can also use AI to stream automation. The right AI technology will allow stakeholders to test supply chain inputs — and their reactions — and fix potential issues or optimize processes. Each of these capabilities can help businesses gain a firm grasp on their operations and immediately react to any event that threatens to disrupt their supply chain.”
Avoiding the pitfalls of AI ‘hallucinations’ with quality data. Commentary by Jason Schern, Field CTO at Cognite
“Generative AI, such as ChatGPT/GPT-4, is propelling industry digital transformation into hyperdrive. For example, previously, a process engineer may spend several hours manually performing “human contextualization” at an hourly rate of $140 or more. Now, contextualized industrial knowledge graphs can provide access to trusted data relationships that enable Generative AI to accurately navigate and interpret data. This breakthrough frees operators from the requirement of specialized data engineering or coding skills, as the no-code platform empowers users of all abilities to harness the capabilities of Generative AI without the need for manual coding.
The true power of Generative AI technologies is the ability to codify human expertise by creating secure and accurate data. However, while Generative AI can help make your data “speak human,” it won’t necessarily speak the language of your company’s industrial data. Generative AI Large Language Models (LLMs) were trained on a large body of text-based sources where the context required for training is in the grammar and structure of the text. With industrial data, that context is often absent due to the nature and diversity of the content, which is why AI “hallucinations” are most likely to occur when a user tries to ask a domain-specific question. To avoid this, users must input clean, quality data. As more people input this data, the outputs will be more precise, allowing organizations to predict and mitigate issues ahead of time. Staff from across the enterprise will have seamless visibility into their operations, empowering them to instead spend valuable time fixing failures before they happen, increasing production times and generating new solutions.”
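One common way to curb domain-specific hallucinations, consistent with the knowledge-graph approach described above, is to answer questions only from trusted, contextualized relationships and to say “unknown” rather than let a model guess. The toy graph and lookup below are purely illustrative assumptions, not Cognite’s product or API.

```python
# Toy sketch: ground domain-specific answers in a small contextualized
# knowledge graph of trusted relationships. Entity and relation names are
# hypothetical examples, not real industrial data.
knowledge_graph = {
    ("pump-101", "connected_to"): "valve-7",
    ("pump-101", "maintained_by"): "crew-A",
    ("valve-7", "located_in"): "unit-3",
}

def grounded_answer(entity: str, relation: str) -> str:
    # Return a trusted fact when one exists; otherwise an explicit "unknown"
    # instead of a fabricated (hallucinated) answer.
    return knowledge_graph.get((entity, relation), "unknown")

print(grounded_answer("pump-101", "connected_to"))  # a trusted relationship
print(grounded_answer("pump-101", "vendor"))        # no data: answer "unknown"
```

In a real deployment the graph would be far larger and would supply retrieved context to the LLM’s prompt; the design choice illustrated here is simply that missing context should surface as “unknown,” never as an invented fact.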
New data pact for EU/US companies. Commentary by Eliott Behar, a privacy and ethics attorney for Inrupt
“An obvious point, but an important one: moving data across borders is essential for the Internet. We want and need data to move freely, and I certainly don’t favour a world in which data needs to be “localized” and siloed within individual jurisdictions. This means we need to figure out real, practical solutions to the “cross-border data transfer” problem, and put an end to the uncertainty that’s been caused by all the back and forth of this legal process.
For businesses that need to operate within this environment (most global companies, really), it doesn’t look like this uncertainty is going to be resolved anytime soon. These businesses should be thinking long-term about where their data lives, how and for what purposes they transfer it, and about implementing better practical systems to control how their data moves. Long term, giving users real control over their data use is a much better and more sustainable approach.”
Internal GPT Models Can Reframe Company Data Privacy. Commentary by Michael Shehab, Labs Technology and Innovation Leader at PwC
“As AI continues to grow in everyday use, it’s important for companies to equip employees with the tools needed to safely handle company data while using AI to assist with their work. Although public and free AI chatbots can be great resources to support employees, they can raise a number of security concerns if employees are plugging sensitive or confidential company data into the tools.
One way companies can protect data is to deploy an internal GPT model. Internal GPT models can be built in a secure environment to protect all inputted data to ensure no private company information is fed into public AI models. This allows organizations to reap the benefits of AI in a responsible way, while having more control over not only where their data is stored, but how their employees are using the tool in their jobs and if it adheres to their governance framework.”
AI Sets the Stage to Augment Work. Commentary by Artem Kroupenev, VP of Strategy at Augury
“Sensationalized headlines have caused individuals across the global workforce to worry that AI will steal their jobs and render them obsolete. This is far from true. Instead, AI will reimagine existing roles in incredible ways. Luckily, recent survey data reveals that such AI misconceptions are not rampant among manufacturing leaders. Instead, industrial executives are wholeheartedly embracing AI. Most manufacturing leaders (80%) are confident that technology will help upskill their workforces. Furthermore, as AI insights continue to drive a newfound interconnectedness of human and machine, 63% of manufacturing leaders plan to increase their AI budgets this year.”
Where LLMs Make the Most Sense in the Enterprise. Commentary by Conor Jensen, Field CDO, Dataiku
“When all you have is a hammer, everything looks like a nail. While an impressive technology, LLMs need to be viewed as one tool in the spectrum of analytics techniques that can be applied. When looking at where to apply analytics, organizations need to start from their existing business strategy and carefully examine where to apply all the various techniques that data science has to offer. There are hundreds of “classical” machine learning use cases, such as forecasting, classification or segmentation, which can be solved with comparatively simple approaches. Starting there allows customers to move faster and create solutions that are more easily understood by end users, customers, regulators, and others.
In the early days of LLMs at a company, it’s important to focus on internal use cases while teams explore and understand how to use them. Additionally, it’s crucial to ensure they’re looking at processes where a human is in the loop and the problem space is understood well enough for users to evaluate the output of the models.”
AI: Why its implementation and continued refinement across SMBs is crucial for long-term success. Commentary by Matt Bentley, Chief Data Scientist, Scorpion
“Generative AI has great potential to be a game-changer for small businesses, empowering productivity, allowing for more streamlined operations, boosting creativity, and enhancing customer engagement. Additionally, AI’s ability to support hyper personalized marketing campaigns and relay product recommendations drives a new standard for customer satisfaction and loyalty. Innovative tech platforms are already capable of leveraging consumer data so AI can be integrated to most effectively meet customer demands.
Over the past few decades, the explosion in marketing channels and need to keep pumping out fresh content for search and social has made it difficult for small businesses to keep up – this is further exacerbated by staffing shortages, macroeconomic pressures and increased competition. Generative AI helps to fill this gap, allowing humans to upskill and leverage their knowledge and expertise more frequently and in bigger ways rather than manually managing marketing channels.
As we move forward in a world where generative AI allows businesses to thrive, it’s important to consider how leveraging AI will facilitate further innovation – enabling small businesses to maintain a competitive edge.”
AI innovation should be approached with cautious optimism rather than fear. Commentary by Anita Schjøll Abildgaard, CEO and Co-Founder of Iris.ai
“The recent preponderance of ‘AI doom’ narratives has fuelled unnecessary fears over a dystopian future. It is reassuring to see a diverse group of industry professionals come together to emphasise the positive impact of AI instead of feeding into alarmist narratives.
The reality is that AI has already shown significant potential in improving our lives. This open letter details important use cases across diverse fields such as agriculture and healthcare, and generative AI is already seeing use in tackling public health issues like avian influenza. Moreover, AI’s capacity to analyse and extract insight from vast amounts of data has huge potential to democratise access to scientific understanding.
We should approach innovation with cautious optimism rather than fear. AI stands to greatly enhance our collective knowledge, productivity, and well-being. It is important that the public conversation recognises the tremendous good these technologies are already capable of and the promise their future holds – when and if we apply them to the real problems of the world, with empathy and caution.”
Gartner forecast on IT spend. Commentary by Perry Krug, Head of Developer Experience, Couchbase
“It’s no surprise that Gartner sees a particular focus on optimisation as the driving force behind IT spend increases. Prioritizing IT projects and budgets to drive efficiency makes sense given the current economic climate – 60% of enterprises in our research confirmed their key modernization goal for the next year is to improve business resilience.
But extra budget does not mean guaranteed success. In fact, enterprises report issues such as a lack of buy-in within the business, inability to stay within budgets and a reliance on legacy technology contribute to significant delays, and even failure, with transformation projects – absorbing 14% of overall budgets. This could mean that up to $658 billion is being spent on doomed efforts.
These issues, risks and potential wasted investment make clear the need for the right approach to digital transformation. Organizations must prioritize access to modern technology that can easily handle the data needed to drive new applications and services, and ensure the ROI of their new projects while managing them effectively. This will greatly reduce the risk that projects fail, or don’t happen at all—and ensure the business does not suffer the consequences over the next year. Without this, CIOs will struggle to achieve the efficiency and optimization earmarked for IT projects.”
We live in a data everywhere world. What does it mean for storage? Commentary by Kiran Bhageshpur, CTO at Qumulo
“We are entering an era where data is everywhere – edge, core, and cloud. Not just where it is stored, but where it is accessed as well. Legacy storage and cloud storage platforms were never designed to handle a “data everywhere” world, so they’ve added complex layers to handle the problem. For example, legacy storage vendors deliver ‘cloud attached’ solutions that provide cloud storage in name only; they fail at scalability, reliability, and efficiency. These approaches don’t work in a data everywhere world.
Organizations need to be able to store, manage, and curate their data anywhere so that organizations can scale anywhere. The trick is doing this in a data everywhere world. It requires a software-only approach with no hardware dependencies. It means being able to deploy the same code base whether at the edge, core, or in the cloud.”
Stolen OpenAI Credentials, Thousands Sold on Dark Web. Commentary by Philipp Pointner, Chief of Digital Identity at Jumio
“With the rise of generative AI, it is no surprise that credentials for generative AI tools and chatbots are a sought-after form of data. This incident brings attention to the rising security concerns that GPT technology brings. With over 200,000 OpenAI credentials up for grabs on the dark web, cybercriminals can easily get their hands on other personal information like phone numbers, physical addresses and credit card numbers. Generative AI chatbots also bring an additional concern. With these credentials, fraudsters can gain access to all types of information users have inputted into the chatbot, such as content from their previous conversations, and use it to create hyper-customized phishing scams to increase their credibility and effectiveness.
Now more than ever, with the rising popularity of generative AI chatbots, organizations must implement more robust and sophisticated forms of security, such as digital identity verification tools that confirm every user is who they claim to be. By establishing every user’s true identity, businesses everywhere can ensure the user accessing or using an account is authorized and not a fraudster. On the consumer side, users should be more wary of the type of sensitive information they are sharing with online chatbots.”
A “data-first” approach to AI. Commentary by Bjorn Andersson, Senior Director, Global Digital Innovation Marketing and Strategy at Hitachi Vantara
“Developers are increasingly integrating generative AI and AI-driven analytics into external-facing and internal applications across industries, for both SMBs and enterprises. Applications with AI capabilities can offer profound new efficiencies and insights that will change the way we live and work. But the unprecedented masses of data that feed AI models and the thousands of GPUs that ‘run the engines’ require new and innovative kinds of data strategy, data management, and data center sustainability solutions. For AI to be fully realized and ethical, the complexities of data management must be fully appreciated and addressed.”
Using Web Agents to Derive Insights from Streaming Data. Commentary by Chris Sachs, CTO, Nstream
“Streaming data is information continuously generated from many sources, and businesses are increasingly choosing it over static data because of the real-time insights it provides. However, traditional data streaming architectures are complex and involve multiple data systems. Storing large volumes of raw stream logs demands considerable bandwidth, not to mention the difficult and expensive task of processing and running analytics and automation on massive amounts of data. As a result, many companies get stuck relying on stale data or predictive models to execute business use cases.
For businesses to derive meaning from streaming data and automate real-time responses, they need web agents. Web agents are web-addressable stateful objects like actors in an actor system. They are stateful because they preserve data and use context locally between operations, which keeps latency low since they do not need to wait on database round trips. Web agents tasked with business-critical questions will continuously compute their own contextual KPIs (Key Performance Indicators), proactively inform users about what they need to know, and take autonomous remediating action. They are the next frontier in utilizing streaming data.”
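The mechanism described above — a web-addressable, stateful object that keeps context locally, computes its own KPI, and acts without a database round trip — can be sketched in a few lines. This is a toy illustration of the actor-style pattern, not Nstream’s actual API; the URI, window size, and threshold are assumptions.

```python
from collections import deque

class WebAgent:
    """Toy sketch of a stateful 'web agent': an addressable object that
    preserves recent observations in memory, so each KPI computation is
    local and needs no database round trip (hypothetical, illustrative)."""

    def __init__(self, uri: str, window: int = 5, threshold: float = 100.0):
        self.uri = uri                        # web-addressable identity
        self.window = deque(maxlen=window)    # local state kept between operations
        self.threshold = threshold
        self.alerts = []                      # proactive notifications / remediation hook

    def on_event(self, value: float) -> float:
        """Ingest one streamed reading and update the contextual KPI."""
        self.window.append(value)
        kpi = sum(self.window) / len(self.window)  # KPI here: rolling mean
        if kpi > self.threshold:
            # In a real system this might notify a user or trigger
            # autonomous remediating action.
            self.alerts.append((self.uri, kpi))
        return kpi

# Simulate a small stream of sensor readings arriving over time.
agent = WebAgent("/sensor/42", window=3, threshold=50.0)
for reading in [40.0, 55.0, 70.0]:
    agent.on_event(reading)
print(agent.alerts)
```

Because the agent holds its own window of recent data, each `on_event` call is a purely in-memory update — which is the latency argument the commentary makes for stateful agents over architectures that store raw stream logs and query them later.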