Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favorite technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!
Four key data security considerations needed for safe AI/LLM implementation. Commentary by Joe Regensburger, VP of Research Engineering at Immuta
“Similar to how search engines enabled broader utilization of the internet, generative AI is lowering the barrier of entry to self-service analytics for users, enabling them to extract value from data without core competency or domain expertise in data querying and manipulation. For example, AI can significantly streamline data discovery (discovering and classifying sensitive data) and data fusion (linking information across many different systems and classifications) processes. Conversely, generative AI also poses risk, potentially impacting people’s rights, safety, and livelihood, if models are not properly trained and governed.
Data security must be considered a foundational component of any AI or LLM implementation to reap its benefits without compromising security. Specifically, the four “whats” and “hows” – what data is being used to train the LLM, how the LLM is being trained, what controls exist on deployed LLMs, and how we can assess the veracity of outputs – provide a good framework for ensuring data security is core to your implementation. These four processes are associated with one another and are critical components of the data security lifecycle: discovery of sensitive data, detection of how data is being utilized, securing access to needed users and purposes, and monitoring how controls protect sensitive information.
When training the model, it’s important to discover what sensitive data could be used to develop the model and ensure it is sanitized. From there, you can detect how data is combined in the model. Otherwise, data may appear innocuous until combined with other information, posing reidentification risks, especially around privacy. Then, to reduce the impact of induced sensitivity during model training, train the model with noise injected into the process, preferably with mathematical guarantees like Differential Privacy. Still, even after controlling the data and the training of models, the model itself needs to be secured. By understanding what the model can and should be used for, you can determine what data goes into the model and how that data should be sanitized. Finally, assessing the veracity of the output of AI results is a critical component, and one that could have detrimental impacts on society through the spread of misinformation if not addressed at the outset. Access controls can help address this challenge as they provide capabilities to set the model’s intended scope and restrict activities that push on the edges of the defined scope.”
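The noise-injection idea mentioned above can be sketched in a few lines. The following is a minimal, illustrative example of the DP-SGD pattern (clip each example’s gradient, then add Gaussian noise scaled to the clipping bound) and is not Immuta’s implementation; the function name, parameters, and values are hypothetical:

```python
import math
import random

def dp_noisy_gradient(per_example_grads, clip_norm=1.0, noise_multiplier=1.1):
    """Clip each example's gradient to clip_norm, sum them, then add
    Gaussian noise scaled to the clipping bound (DP-SGD style)."""
    summed = [0.0] * len(per_example_grads[0])
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale down any gradient whose norm exceeds the clipping bound.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            summed[i] += g * scale
    # Noise standard deviation is tied to the clipping bound, so no single
    # example can dominate the update -- the source of the privacy guarantee.
    sigma = noise_multiplier * clip_norm
    noisy = [s + random.gauss(0.0, sigma) for s in summed]
    n = len(per_example_grads)
    return [v / n for v in noisy]

# Hypothetical per-example gradients for a 2-parameter model.
grads = [[0.5, -1.2], [2.0, 0.3], [-0.7, 0.9]]
update = dp_noisy_gradient(grads)
```

Choosing `clip_norm` and `noise_multiplier` sets the privacy/utility trade-off; production systems use an accountant to track the cumulative privacy budget across training steps.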
Generative AI, Automation and the Post Modern Data Stack. Commentary by Sean Knapp, Founder & CEO, Ascend.io
Generative AI is transforming many areas of work, but perhaps none more so than software development and data engineering. Key players like GitHub, Databricks and Snowflake are investing heavily in this area, and generative AI-related job postings continue to soar throughout the U.S.
Much of the conversation has focused on feeding data into AI systems — but what about the role AI can play in improving how we build new data products? Generative AI opens the door to wider use of automation in the building and management of data pipelines, and allows many more developers and analysts to take part in data engineering.
The impact of AI on the Modern Data Stack can be broadly categorized into four areas: (i) Data team democratization: Data engineering will be brought within reach of a broader group of “citizen data engineers;” (ii) Improved productivity: AI will do the work that, let’s admit, none of us want to do ourselves. Few if any wake up each morning eager to write tests and documentation; (iii) Consolidation & tighter integration: The power of GenAI is rooted in data, which drives exponential value the more consolidated a stack, or the more tightly coupled a company’s systems, can become; (iv) Embracing automation: As AI drives greater productivity, data teams will increasingly leverage the value of its closest counterpart, automation.
To accelerate the arrival of this “post modern data stack,” we should empower data teams to embrace automation and AI. As these technologies become more integrated into data engineering workflows, the role of data engineers will focus on more complex, high-level tasks while AI handles the routine work. But while AI will eliminate significant amounts of busy work, it will still require a human in the loop to oversee work and verify it is done right.
The impact of AI on the workforce and data engineering teams specifically will be profound, and to capitalize on it we must be willing to adapt and evolve with the times. Some of the changes may be challenging, but the rewards are too great to ignore.
The Essential Role of Human Developers in Generative AI. Commentary by Ev Kontsevoy, CEO & Co-Founder, Teleport
“Today’s modern cloud environments are complex, consisting of thousands of servers and other cloud resources, such as databases, Kubernetes clusters, CI/CD and observability tools, and so on. Managing these resources creates the need for building context, increasing the overhead of context switching for DevOps engineers. This is similar to application developers operating on complex codebases, where generative AI tools are rapidly gaining popularity.
However, infrastructure is just as critical as code. For this reason, generative AI tools should not be used to manage infrastructure without human supervision. Without a human in the loop, letting an AI hallucination issue commands to thousands of servers is simply irresponsible.”
Responsible AI is the natural evolution. Commentary by Infobip CBO Ivan Ostojic
“Regulation often lags technology as tech development cycles move extraordinarily fast. The collaboration of tech giants like Microsoft, Meta, OpenAI, and Google in forming a coalition around responsible AI is the natural evolution as we deal with the speed of AI.
By prioritizing responsible behavior and setting ethical guardrails for AI adoption, tech leaders are balancing the inherent risks while paving a path forward to progress and ultimately embracing its benefits for society. This proactive approach is key to cultivating a culture of responsible AI practices, fostering innovation, and safeguarding the well-being of individuals and communities.
As this coalition expands, it is crucial for others to actively participate to strengthen the collective efforts. At Infobip, we fully support this initiative and are committed to offering our expertise, support and resources to champion the cause of responsible AI.”
Unlocking business success in today’s digital age with quality data. Commentary by Mike Albritton, SVP of Cloud, CData
“We’re living in a time where more and more business leaders are realizing the potential of data as a strategic asset. Macroeconomic factors such as evolving market dynamics, shifting regulatory landscapes, and technological advancements catalyze this shift toward prioritizing data. When business leaders invest in driving data quality best practices across their organization, it significantly pays off by driving accurate insights, informed decisions, and business success. As data explodes in volume, however, it becomes more difficult to regulate the quality of data being produced and analyzed. These modern challenges beget thoughtful reconsideration of data management and strategy within organizations.
Pivoting to a truly data-driven culture allows organizations to react quickly to market changes, make informed decisions, and maintain a competitive edge. Instant access to data for reporting and analysis, for example, is becoming a necessity for organizations as operations become more digitized. But the value of these metrics is compromised if the data quality is poor. To support these growing requests for data, organizations need to rethink their data strategies with a focus on enhanced reliability, accessibility, and overall quality of the data. It is a critical step towards unlocking the full potential of data and making it a driving force for organizational success.”
Zoom News Raises Questions About Who Really Owns Your Data. Commentary by Shiva Nathan, Founder & CEO of Onymos
“This is not just a problem with Zoom. Zoom is the model for what is wrong with SaaS today. They rent you software with their right hand and take your data with their left. SaaS is broken. Ford is the harbinger of the pendulum swinging from SaaS back to in-house. One company controlling and customizing its own software creates better experiences for that company and its end users. The problem is it loses out on what made SaaS so appealing in the first place — speed and cost. We need to throw out the broken SaaS model and replace it.”
The AI playing field must be leveled. Commentary by Sherard Griffin, Senior Director of Engineering, Artificial Intelligence Services at Red Hat
“Companies are rushing to adopt AI at alarming rates. It’s the new competitive advantage; the new measurement by which companies can stand out from the crowd. We’re seeing this in every industry and every line of business, and there don’t seem to be any signs of it slowing down. But not every enterprise is equipped with the petabytes of data, tools and infrastructure to train LLMs and other generative AI models with billions of parameters. The next wave of corporate winners shouldn’t be decided by who has access to the largest amount of privately stored data and the most expansive data centers.
We’ve seen this challenge before, and it gave rise to open source software. Suddenly, companies that had traditionally sat on the innovation sidelines could be a part of something greater and compete in new ways. It allowed the Davids of the world to compete with the Goliaths. Now is the time to adopt the same methodology for generative AI.
We need companies large and small to default to open source collaboration. This should be for the models, the data they’re trained on, and the tools used to build AI-infused applications. It’s the right choice given all of the open questions AI faces around governance, ethics, privacy and compliance. Open source is now seen as some of the most secure software available, which is why it runs the vast majority of data center servers in the world. If we want democratic access to secure, governable, and ethical AI then a new open source methodology for AI is the only true answer. Anything other than this could result in lasting negative economic effects.”
New Gartner survey shows Generative AI is now Emerging Risk. Commentary by Mike Myer, CEO of conversational AI platform Quiq
“AI is super powerful and a huge step forward for enterprise business use, but ChatGPT is not the way to get there. If you wouldn’t post corporate information on the internet, let alone share private information on the internet, then you also shouldn’t be entering that information into ChatGPT. A company operating with a laissez-faire approach to ChatGPT is the most brain-damaged idea ever.
Gartner’s latest Quarterly Emerging Risk Report shows that employees’ use of public AI websites such as ChatGPT is the source of the data privacy, intellectual property, and cybersecurity fears that have executives worried. When something is free, it is a good idea to consider why that is. In the case of most AI portals, they exist to collect training data – something that no CIO wants their confidential data to become part of. The good news is that this fear can be addressed without having to ditch AI altogether.
By working with a reputable vendor who has the appropriate policies and procedures to source AI and compliances to validate the data security of AI (such as SOC 2 or ISO 27001), corporations can use AI with confidential information. In Quiq’s case, we have commercial agreements with LLM vendors that restrict the storage and use of our clients’ information for training purposes and our SOC 2 demonstrates that data is being handled appropriately.
Drilling down further, if the AI is going to be used to answer employees’ or customers’ questions, further precautions must be taken to ensure that the AI only provides answers based upon company information and doesn’t use the publicly available, outdated information that the LLM was trained upon or, worse yet, make up answers or hallucinate.”
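One common way to enforce the “answer only from company information” constraint is a retrieval gate: the model is only allowed to answer when relevant company documents can be found, and otherwise it declines. The following is a minimal sketch of that pattern using naive keyword overlap, not Quiq’s actual system; the function name, threshold, and sample documents are all illustrative (real systems use embedding-based retrieval):

```python
def grounded_context(question, documents, min_overlap=2):
    """Return company documents relevant enough to ground an answer,
    or None, signalling that the model should decline rather than guess."""
    q_terms = set(question.lower().split())
    scored = []
    for doc in documents:
        overlap = len(q_terms & set(doc.lower().split()))
        if overlap >= min_overlap:
            scored.append((overlap, doc))
    if not scored:
        return None  # no grounding available -> refuse instead of hallucinating
    scored.sort(reverse=True)  # most relevant documents first
    return [doc for _, doc in scored]

# Hypothetical company knowledge-base snippets.
docs = ["Our refund policy allows returns within 30 days.",
        "Support hours are 9am to 5pm Eastern, Monday to Friday."]
context = grounded_context("What is the refund policy for returns?", docs)
```

The retrieved context is then passed to the LLM with an instruction to answer only from it; when `grounded_context` returns `None`, the application returns a canned “I don’t know” rather than invoking the model at all.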
Why adopting an integrated observable resilience approach will save developer stress while improving the customer experience. Commentary by Daniel Furman, Distinguished Engineer at Capital One
“Over the past few years, an influx of raw data, and opportunities to enrich it, has made it easier than ever to transform that data into vital information, often even faster with automated and serverless software solutions and services. But while this acceleration has enabled more tools and experiences than ever before, it’s also creating overhead and timeline expectations for developers – resulting in missed connections with customers. More applications, especially those built to scale automatically, often require significant insights to operate and maintain, increasing the chance of application errors that erode the customer experience.
Observable resilience is a mindset that combines two tried-and-true practices: your team’s ability to easily see and diagnose issues, and your ability to respond quickly and mitigate further risk. While these individual practices are often found in developer work, too often they’re approached separately. As our data assets and clouds become more complex, customer experience expectations continue to rise in tandem. When we combine these practices and integrate an observable resilience mindset from the start, teams move from trying to react to these modern signals to being driven by them.
At Capital One, we’ve adopted this mindset across many development teams. By defining both what we want to observe and how we want to observe it, teams iterate on designs that build in the ability to diagnose an issue within any targeted dataset. For customers, this eliminates the need to take more systems offline; instead, support teams can more easily identify and quickly mitigate the problem, with intelligent automated recovery. For example, an observable resilience approach allows us to see when a system request took several attempts, instantly identify the specific customer, dataset or process that was impacted, and then address the challenge. In the past, we’ve had to manage these steps in separate pools, which drained developer time and introduced the risk that more customers would face impact before an issue could be resolved. Now, by leveraging the signals built into serverless applications, an integrated observable resilience approach can be self-healing. This will drive feature velocity, lower maintenance overhead, and simultaneously improve the experience for both developers and customers.”
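The pattern described above – retries that emit structured, attributable signals per attempt – can be sketched simply. This is an illustrative example of the general technique, not Capital One’s implementation; the function name, signal fields, and context string are hypothetical:

```python
def with_observable_retry(operation, context, max_attempts=3, signals=None):
    """Run `operation`, recording a structured signal for every attempt so a
    failing request is immediately attributable to its customer/dataset."""
    signals = signals if signals is not None else []
    for attempt in range(1, max_attempts + 1):
        try:
            result = operation()
            signals.append({"context": context, "attempt": attempt,
                            "status": "ok"})
            return result, signals
        except Exception as exc:
            # The signal carries the context, so an alert can name the exact
            # customer, dataset, or process affected -- no log spelunking.
            signals.append({"context": context, "attempt": attempt,
                            "status": "error", "detail": str(exc)})
    raise RuntimeError(f"{context} failed after {max_attempts} attempts")

# Hypothetical flaky operation: fails once, then succeeds.
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 2:
        raise ValueError("transient timeout")
    return "done"

result, emitted = with_observable_retry(flaky, "customer-42/dataset-a")
```

In a real system the `signals` list would be an emitter into a metrics or tracing pipeline, and the retry loop would add backoff; the point is that observability and resilience live in the same code path rather than being bolted on separately.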
Why AI Isn’t Taking Your Job, It’s Helping It. Commentary by Claus Jepsen, CTO of Unit4
“The rise of AI in the workforce has led to uncertainty and anxiety among Americans nationwide, with the widespread assumption that the new technology is here to threaten jobs. However, implementing AI in the workplace is an inevitable future that should evoke hope, not fear.
There’s been a fundamental change in company structure based on recognizing the importance of time and how it can be better spent to advance business goals. Organizations are looking to add more than just digital advancements and are also reevaluating the pathways for people and processes simultaneously. Post-pandemic, businesses and IT leaders alike are acknowledging that a people-led approach in enterprise applications is the most valuable one for business success moving forward.
This strategic shift in thinking presents a new idea: that digital transformation is made for people, not for technology. So organizations are concentrating on experience-focused transformation, an approach that gears technological strategies toward creating a simpler environment for employees, enabling them to focus their efforts on more enriching work. AI tools are crucial to this transformation as they can automate tedious processes that used to take up large blocks of time and add to employee workloads.
Since COVID, the digital era has advanced tenfold, but all paths lead back to the people themselves. The positives from AI implementation range from new functionality in administrative assignments to heightened employee productivity. Contemplating the technological future can be stressful for many employees, but the one thing both decision-makers and workers can agree on is that AI is here to aid our companies, not threaten our positions.”
Orchestrating unstructured data holds the key to winning in the Next Data Cycle. Commentary by David Flynn, Co-founder and CEO of Hammerspace
“Globally distributed data makes it difficult for enterprises to really understand what they have, but more importantly to find and extract the value held within their digital assets. Instead of an asset that fully contributes value to the enterprise, under-utilized unstructured data becomes a rapidly expanding cost center. The last data cycle focused on structured data rooted in business intelligence, but this data cycle, driven by compute, orchestration and applications, leverages unstructured data to drive product innovation and business opportunity with AI, machine learning and analytics. The enterprises and public sector organizations that orchestrate these large amounts of unstructured data to algorithms will transform their businesses and ultimately win the race to reach the Next Data Cycle.”