Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!
ChatGPT & LLM Tools Amid Rapid Transformation. Commentary by Ali Siddiqui, Chief Product Officer at BMC Software
Today’s chatbot deployments require initial and ongoing training so that their natural language processing (NLP) models can understand users’ intents and extract ‘entities’. Large language model (LLM) AI tools, such as ChatGPT, Macaw, and others, offer a ‘zero-shot’ approach in which little to no training data is needed to achieve the same classification result. However, LLMs are trained on vastly more data than a mid-sized enterprise’s knowledge base contains, so in some cases we still need to override that general data to deliver customer-specific responses. It’s all about the quality and freshness of the data the model has to work from. Other enterprise LLMs are available now, and we are seeing new use cases such as extracting useful information from logs and tickets to answer user queries. For example, a service desk agent can use an enterprise LLM to extract different aspects (perspectives) of a ticket activity log or a chat log (What was the issue? What was the root cause? What was the resolution?) and assemble them into a root cause analysis (RCA) document. Users can also get concise, to-the-point ‘answers’ to their search queries instead of a list of URLs. Wouldn’t it be amazing to be able to “chat with your knowledge base” or “have a conversation with your enterprise data”? This is all possible because we can configure the LLM to prioritize answers from a well-maintained enterprise knowledge base. As we focus on a future in which enterprises can adapt to change and thrive amid rapid transformation, LLMs are one of the tools at our disposal, and we are actively experimenting with them to deliver accurate answers to employees reliably, securely, and transparently.
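One common way to make an LLM favor enterprise content is to retrieve the relevant knowledge-base articles first and place them in the prompt (often called retrieval-augmented generation). The sketch below is a minimal illustration under stated assumptions, not BMC’s implementation: the articles, the word-overlap scoring (a stand-in for embedding search), and the function names are all hypothetical, and the actual call to a model is left out.

```python
import re
from collections import Counter

# Hypothetical knowledge-base articles (illustrative only); in practice these
# would come from a well-maintained enterprise knowledge base.
KNOWLEDGE_BASE = {
    "KB-101": "To reset your VPN password, open the self-service portal and choose Reset VPN.",
    "KB-102": "Printer jams on floor 3 are handled by the facilities team.",
    "KB-103": "Expense reports must be submitted within 30 days of purchase.",
}


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, kb: dict[str, str], top_k: int = 2) -> list[tuple[str, str]]:
    """Rank articles by word overlap with the query (a simple stand-in for
    embedding search) and return up to top_k articles with a non-zero score."""
    query_counts = Counter(tokenize(query))
    scored = []
    for doc_id, text in kb.items():
        overlap = sum((query_counts & Counter(tokenize(text))).values())
        if overlap > 0:
            scored.append((overlap, doc_id, text))
    # Highest overlap first; ties broken by article ID for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(doc_id, text) for _, doc_id, text in scored[:top_k]]


def build_prompt(query: str, kb: dict[str, str]) -> str:
    """Build a prompt that tells the model to answer from the retrieved
    enterprise articles rather than from its general training data."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query, kb))
    return (
        "Answer using only the enterprise articles below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    # The resulting string would then be sent to whichever LLM is in use.
    print(build_prompt("How do I reset my VPN password?", KNOWLEDGE_BASE))
```

For the VPN question, only the VPN article is placed in the prompt, so the model is steered toward the enterprise answer rather than generic internet knowledge.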
Are we ready for GA4 Migration? Commentary by Dan LeBlanc, CEO of Daasity
As of right now, the industry is not ready for the GA4 migration. Only a small percentage of our merchants have been asking how they can hook up their GA4 data, and about half the ones who have done so have huge problems that need to be fixed. Native GA4 integration built by other merchants will be a huge help for implementation, but there will still be a lot to clean up once tracking is set up.
AI needs to be regulated on an industry-by-industry basis. Commentary by Parry Malm, CEO at Phrasee
AI absolutely needs to be regulated on an industry-by-industry basis, because the degree of consequence varies. If AI generates a bit of content that just isn’t very good, and nobody clicks on it, it’s not that big of a deal. But if it’s giving medical advice and it hallucinates a fact, or gets something wrong, that’s high stakes. Given today’s news on the UK government’s careful stance on regulation, I think the EU is going to be the first to bring out AI legislation. They’re already working on it. It may be cumbersome and bureaucratic, but it’ll be broadly effective and will set a standard. In the US, passing any sort of regulation these days is rather difficult. I think that regulation in the US is going to come de facto, through case law. ChatGPT is trained on the entire internet, so its developers are taking in copyrighted material and producing derivative works. It produces verbatim text from third-party sources, and they’re monetizing it. So, on the back end, I think there are going to be some rude awakenings. There will likely be case law and breakups, and the US will regulate it through the courts, as it usually does.
Addressing data bias with data integrity. Commentary by Tendü Yoğurtçu, PhD, Chief Technology Officer, Precisely
Diversity, equity and inclusion (DEI) initiatives are fast becoming a top priority for organizations as they seek to drive better representation of talent across the business, but many are still missing a crucial element: representation in their data. While we like to think of data as factual, or even impartial, the truth is that human biases can create data biases too. This presents a major challenge that must be overcome, particularly as the use of AI and automation, fueled by potentially biased datasets, is on the rise. After all, AI models are a product of the data they are trained on. This is creating a variety of real-world issues, from facial recognition software that is less accurate at identifying women and people of color, to inequities in healthcare provision, and more. To address this, businesses need to ensure that AI and automation programs are fueled with high-integrity data. Data integrity is built on the core pillars of enterprise-wide integration, accuracy and quality, location intelligence, and data enrichment. By leveraging these pillars, organizations can ensure there is access to the right data, enrich it with trusted third-party datasets, and ensure it’s correctly prepared for use in intelligent models, ultimately allowing leaders to make better, more representative decisions.
What DataOps can learn from the DevOps movement. Commentary by Matt Sweetnam, Chief Architect at AHEAD
With wide-scale shifts to digitization, businesses need digital speed to respond to the ever-evolving needs of stakeholders. Two disciplines have emerged as integral to business success in the digital age: DataOps and DevOps. A relatively new term, DataOps refers to a set of practices and principles aimed at improving the speed and efficiency of data analytics, processing and management. DevOps, on the other hand, describes a set of practices aimed at improving the speed and efficiency of development cycles and deployment. While there are some differences between these two fields, each has much to learn from the other to unlock business value. At their core, both DataOps and DevOps focus on improving the efficiency, reliability and scalability of data and development processes. They also work to break down traditional silos within the business and foster collaboration between teams to meet common goals. Another key parallel is the increased emphasis on automation. Manual stare-and-compare methods still exist in businesses today and can severely limit the quality of data as it’s moved and manipulated across the enterprise. Relying on experts to bake automation into the CI/CD pipeline can lead to next-level success and improve the ability to use further AI/ML functions effectively. Overall, data management processes were once seen as a huge bottleneck, but with DataOps adopting the tools and strategies of development teams, data leaders can now streamline processes while maintaining a high degree of trust in the data they leverage and the business intelligence solutions they produce.
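To make the contrast with manual stare-and-compare concrete, here is a hypothetical Python sketch of an automated reconciliation check that a CI/CD job could run after a pipeline moves data: it compares a source dataset with its copy by primary key and a per-row hash, and raises an error (failing the job) on any mismatch. The function names and sample records are invented for illustration; real DataOps tooling would read from the actual source and target systems.

```python
import hashlib


def row_fingerprint(row: dict) -> str:
    """Hash a row's values, sorting column names so column order doesn't matter."""
    canonical = "|".join(f"{col}={row[col]!r}" for col in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source: list[dict], target: list[dict], key: str) -> dict[str, list]:
    """Compare two datasets by primary key and list the keys that differ."""
    src = {row[key]: row_fingerprint(row) for row in source}
    tgt = {row[key]: row_fingerprint(row) for row in target}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


def check_pipeline_output(source: list[dict], target: list[dict], key: str = "id") -> None:
    """Raise an error (failing the CI job) if any discrepancy is found."""
    report = reconcile(source, target, key)
    problems = {name: keys for name, keys in report.items() if keys}
    if problems:
        raise ValueError(f"Data reconciliation failed: {problems}")


if __name__ == "__main__":
    source = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}, {"id": 3, "amount": 75}]
    target = [{"id": 1, "amount": 100}, {"id": 2, "amount": 205}, {"id": 4, "amount": 10}]
    print(reconcile(source, target, key="id"))
    # {'missing_in_target': [3], 'unexpected_in_target': [4], 'changed': [2]}
```

Hashing whole rows keeps the check cheap to express; a production version would more likely push the comparison down into the database or a dedicated data-quality tool rather than loading rows into memory.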
ChatGPT – Generation, Not Creation. Commentary by Neil Sahota, CEO – ACSI Labs
ChatGPT is a powerful, generative AI tool, the key word being generative. ChatGPT can only do what it has been taught, using the data it has been given access to. Can it summarize and synthesize vast amounts of information? Yes, and we have seen this in the book summaries and research synopses it has produced. Can ChatGPT apply processes, techniques, or frameworks that we have taught it? Yes, and we have seen this as ChatGPT has negotiated salary increases and produced cover letters for job postings. Can ChatGPT be independently creative? No, it cannot, because we do not know how to teach creativity. That is why ChatGPT is restricted from recommending stock picks when a person asks. This is why ChatGPT does not create. It does not have the capability for independent thought, crafting new perspectives, imagining the first-of-a-kind, or even being original. Generation, not creation, is a critical distinction to make because some people have unrealistic expectations about what ChatGPT should be able to do. Tempering our expectations does two critical things for us. First, we can focus on the true value-add: feasible tools that assist (not replace) us in our work. Second, we will not limit what is innovative and possible just because a pie-in-the-sky expectation was not met (and thus assume that ChatGPT cannot do much for us). ChatGPT is not going to cure cancer for us; however, it can collect patient information and summarize key items so that healthcare professionals have a few more minutes to spend talking to and treating the patient. So, let us leverage what ChatGPT was built for: generation, not creation.
ChatGPT + other generative AI could lead to serious privacy breaches. Commentary by Lori Witzel, Director of Thought Leadership at TIBCO
The rapid commercialization and deployment of ChatGPT and similar generative AI continues to raise ethical questions, particularly with regard to data privacy. ChatGPT is trained on vast amounts of data, potentially including personal data. Without comprehensive data privacy laws in place for generative AI, this could lead to serious privacy breaches. It is critical for generative AI providers to equip generative AI tools with guardrails, such as tangible ethics policies and clearly outlined creator and intellectual property rights. And for users of generative AI, it’s equally important to have guardrails and legal review to reduce risk. Reducing ethical risks while developing or using generative AI starts with asking the right questions. When developing guardrails, organizations need to ask themselves and their teams: who’s not in the room and should be as we go further, who’s likely to be negatively impacted by this use of technology, and how do we protect ourselves, and our business, from breaking trust and violating ethical practices? The answers to these questions will lay the foundation for building comprehensive guardrails critical to preventing privacy issues and other ethical complications.
ChatGPT underscores importance of data quality & model training. Commentary by Tooba Durraze, VP of Product (AI and Data) at Qualified
People who rush to implement gimmicky features (which are still really cool) are probably not going to be the ones who find actual utility in the technology. The people who are more structured about what they launch, or who at least start making long-term investments, are the ones who will take advantage of it over the long term. AI can emulate a human with no problem. It can answer a question like you, and it can even physically look like you, but how good it is depends on how long you train the model and the data you use to train it. Knowing that the technology is far ahead of even the use cases we are talking about today is going to make a difference, especially for business leaders.
GPT-4 Open Letter: The Cat’s Already Out of The Bag. Commentary by Michaël Lakhal, Director of Product Management at OneSpan
The technology behind ChatGPT is not new, but its open accessibility has made it much easier for the average consumer to leverage the platform. However, this has also made it easier for threat actors and hackers to use the technology for nefarious purposes. Hackers can use it to create more sophisticated phishing emails, mimicking brands and tone or easily translating copy into several languages, which makes those emails more difficult to identify and helps hackers reach global audiences. Further, with ChatGPT, the average person with limited technical skills can become a hacker by asking the platform to write malicious code. While OpenAI, the creator of ChatGPT, has put safeguards in place to prevent the chatbot from answering these requests, it has been shown that there are easy ways to work around them. Essentially, ChatGPT in the wrong hands could serve as a ‘how-to’ guide for potential and existing hackers, providing resources and pointers on how to improve and hone their skills. Though ChatGPT makes it easier to carry out attacks, these methods aren’t new, so the solutions remain the same. For phishing attacks, business leaders need to set up clear policies, educate all employees, implement thorough and continuous authentication policies, set up anti-phishing alerts, and avoid sharing sensitive information over easily hacked mediums like email and SMS. Similar methods should be employed against malware, as this malicious code is often delivered and distributed through methods such as phishing. While ChatGPT isn’t creating a brand-new threat, it has the potential to drastically increase the number of malicious events and actors, so organizations need to remain extra vigilant in their ongoing security practices.
Turning dark, unused data into actionable insights is key for a successful business. Commentary by John Knieriemen, North America Business Lead at Exasol
Corporate data in its raw, vast form is fragmented and challenging to work with, and extracting fully formed insights from it is becoming increasingly difficult to manage. One of the main challenges facing organizations today is their inability to harness data cohesively across the entire business. When data is used correctly, it gives businesses a clear competitive advantage. Yet more than 50% of U.S. companies’ data is still dark: overlooked, forgotten, or simply not used. Modern businesses seek the benefits of structured data analytics, but those benefits are often out of reach. Meanwhile, companies continue to invest in their digital business initiatives: global IT spending is expected to reach $4.5 trillion in 2023, with the software and IT services segments projected to grow by 9.3%. The right processes and infrastructure are critical to meeting today’s data demands. Overall, the ability to grow as a business will depend on how fast your data insights can be extracted, whether in real time or near real time. Streamlining and simplifying your data management strategy is, in my opinion, the best way to become data-driven. A strategic approach helps prioritize what’s essential, minimizes business complexity, and sets the tone for your data management overall. That said, using data to its full extent and having the proper database and analytics platform at the heart of your business are critical to long-term success.
MLOps adoption trends. Commentary by Artem Kroupenev, VP of Strategy at Augury
Two clear trends in enterprise MLOps adoption are underway as companies set out to be more efficient and productive in light of market conditions. More initiatives will be led by a company’s CIO to prioritize organic integration, whereas previously the implementation of industry tools was led by maintenance and reliability teams. IT wants to enable data strategy for the organization, so more companies will approach vendors to help make everything work together, starting with how teams control data collected from facilities. For example, companies will be looking for guidance on how to create a new control room for their operations using insights that weren’t previously available. The need for increased productivity and efficiency amid a dynamic market is driving organizational change, so we can expect to see maintenance and reliability production teams reaching out to IT to connect industry tools into one unified system. The outdated practice of accessing the same information across two different systems will be left behind as teams work smarter with consolidated data.
On AI and e-commerce cybersecurity. Commentary by G2A’s CEO, Bartosz Skwarczek
AI is used for both good and evil these days. Now that we are seeing an increase in artificial intelligence used in cyberattacks (for example, poisoning AI models with inaccurate data), it only makes sense that the same technology should also be implemented for security purposes, especially in the e-commerce space. The use of AI in cybersecurity has many advantages over traditional security systems. Amid economic uncertainty and with organizations understaffed, teams can significantly boost their protections by using AI to supplement their security team’s work. Primarily, artificial intelligence can automate redundant, time-consuming tasks that burn out security professionals but that AI can complete nearly instantaneously. AI is also improving e-commerce security by filling gaps in an already tight labor market. The industry has been hit especially hard by the security talent shortage: currently, the market does not have enough cybersecurity professionals trained specifically in e-commerce. The implementation of new AI-based tools, such as EDR, XDR, and NDR, which detect threats and, together with SOAR (security orchestration, automation, and response), respond to them, has significantly improved online marketplace cybersecurity.
AI Regulation: We’re in an iPhone Moment. Commentary by Joseph Toma, CEO of Jugo
To me, this is the iPhone moment for AI. Similar to blockchain or Web 3.0, AI allows individuals or corporations to operate at speed and at scale in ways that have never been done before. Naturally, regulators have had to mobilize quickly, preparing official points of view on how AI should be regulated and managed. I will illustrate this with two examples. A marketing agency can now use voice AI and visual AI to create a lifelike replica of any public figure and have that person say and do anything the agency chooses, without consent. Students can now use text AI to write 10,000-word essays within 30 seconds. The regulation required for voice and visual AI in the context of advertising is different from the regulation (and tooling) required in the education space. It takes deep industry subject matter expertise to understand how AI can be exploited in each specific use case, which is why the UK government has made the correct decision to empower its agencies to develop specific guidelines.
Will Generative AI replace jobs? Commentary by Shubham A. Mishra, Co-Founder & Global CEO, Pixis
Yes, it’s coming for our jobs, but not in the way everybody fears. This powerful technology is at a nascent stage, so naturally there is a lot of apprehension about it. However, it is also evolving by the day, and over time we will be able to ascertain more clearly the degree to which it will influence our roles in the workplace. It is already helping optimize the creation, speed, and spend of our projects, but its real power lies in its ability to nudge us a bit out of our comfort zones and raise the bar for ourselves and our organizations. Companies that recognize this power and quickly begin exploring ways to integrate this AI into everyday workflows are poised to win. It is not AI versus humans, but rather AI alongside humans. It is our responsibility to educate teams and leverage the technology not only to improve our own efficiency and output, but also to look at how our industries as a whole can be improved.
The use of data to enhance decision-making processes. Commentary by Madhav Srinath, CEO and Founder of NexusLeap
Data-driven decision-making. We often hear about the importance of making decisions based on data, but what separates great companies from merely good ones is that truly great companies have propagated this core ethos into every team and function in their organization. The challenge for many organizations is that they have disparate systems that send data of varying quality back and forth, resulting in fragmented data lakes and other repositories filled with outdated information that only the technical teams can decipher. When people hear the buzzword “data-driven,” they recall the latest decision they made using an Excel spreadsheet or Access database saved on their shared drive, feel proud of their pragmatic mentality, and then move on without catalyzing any change. This is a shame, because there is so much value yet to be unlocked in the refreshed, accurate, and connected datasets they could have if they were able to build useful intelligence of their own autonomously. From a technical standpoint, the solution is simple yet difficult to execute at scale. Technology is advancing rapidly, and it’s important to avoid bottlenecking the entire organization by giving ownership of the entire company’s data to a single team. Instead, the central data team should evolve into insight facilitators, providing expertise and support to other departments as necessary. Companies should move from a data centralization model to a decentralized paradigm in which core business functions independently innovate with the data available to them and collaborate with other functions to drive business value. The biggest obstacle to adoption will be for these previously omniscient data teams to let go of ownership and instead take on the responsibility of facilitation.
These teams need to set up processes and frameworks for the various business domains to create data contracts, encourage teamwork to increase the robustness of these contracts, and let each domain own its data. It will drive data-driven decision-making from the ground up, and the difference is it’ll stick this time because it’ll become a critical part of solving every business problem.
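The data contracts mentioned above can be as simple as a published schema that the owning domain promises to honor and that consumers check automatically. The following is a minimal, hypothetical Python sketch, not a specific framework: the `DataContract` class, the `orders.v1` contract, and its fields are invented for illustration. It shows a domain declaring required fields, expected types, and non-null columns, and a validator that reports violations record by record.

```python
from dataclasses import dataclass, field


@dataclass
class DataContract:
    """Hypothetical data contract: the owning domain promises these fields and types."""
    name: str
    owner: str
    fields: dict[str, type]
    non_nullable: frozenset[str] = field(default_factory=frozenset)

    def violations(self, record: dict) -> list[str]:
        """Return the problems found in one record (an empty list means it conforms)."""
        problems = []
        for column, expected in self.fields.items():
            if column not in record:
                problems.append(f"missing field '{column}'")
                continue
            value = record[column]
            if value is None:
                if column in self.non_nullable:
                    problems.append(f"'{column}' must not be null")
            elif not isinstance(value, expected):
                problems.append(
                    f"'{column}' expected {expected.__name__}, got {type(value).__name__}"
                )
        return problems


# The (hypothetical) sales domain publishes and owns its orders contract.
orders_contract = DataContract(
    name="orders.v1",
    owner="sales-domain",
    fields={"order_id": str, "amount": float, "region": str},
    non_nullable=frozenset({"order_id", "amount"}),
)

print(orders_contract.violations({"order_id": "A-1", "amount": 19.99, "region": None}))
# []
print(orders_contract.violations({"order_id": None, "amount": "19.99"}))
# ["'order_id' must not be null", "'amount' expected float, got str", "missing field 'region'"]
```

Keeping the contract as data owned by the producing domain, rather than as checks buried in a central team’s pipelines, matches the facilitation model described above: the central team can supply the validator, while each domain owns and versions its own contract.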