Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favorite technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!
How Do We Bring Light to Dark Data? Commentary by: Dale Lutz, Co-Founder & Co-CEO of Safe Software
“The world has never collected more data than right now, with 90% of all data created in the last two years alone. Given the sheer volume of data being collected by billions of people every second, it’s becoming overwhelming for organizations to manage and make the most of it. Data that organizations collect but do not use is called ‘dark data’ – and it makes up the majority of the data enterprises collect.
Dark data has the potential to be transformed into incredibly useful information for enterprises. As we gain a deeper understanding of AI, we could be on the precipice of an exciting new frontier in the data economy. For example, emerging technology could filter and/or aggregate huge data volumes to provide value through more actionable and analyzable datasets. It could also surface patterns in dark data that organizations would typically overlook. For enterprises, this could include finding new markets, identifying outliers that foretell important risks or opportunities, assessing equipment failure potential, targeting potential customers, and/or preparing training data for machine learning and artificial intelligence use. Modern integration approaches can further extend the utility of otherwise dark data by joining it to other datasets, so that the whole is far more valuable than the sum of its parts.
It’s an exciting time in the data industry as new technologies like AI and modern data integration approaches hold the potential to shine light onto the underused and undervalued underside of the data iceberg.”
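As a minimal sketch of the integration idea above – joining an unused dataset to a reference dataset and aggregating it into an actionable signal – consider the following example. The data, column names, and the 0.8 threshold are all invented for illustration:

```python
import pandas as pd

# Hypothetical "dark data": raw sensor logs collected but never analyzed.
sensor_logs = pd.DataFrame({
    "asset_id": [1, 1, 2, 3],
    "vibration": [0.2, 0.9, 0.3, 1.1],
})

# A separate reference dataset the logs can be joined against.
asset_register = pd.DataFrame({
    "asset_id": [1, 2, 3],
    "location": ["plant-a", "plant-a", "plant-b"],
})

# Join the two sets; the combined view is more useful than either alone.
joined = sensor_logs.merge(asset_register, on="asset_id")

# Aggregate the raw volume into one actionable signal per asset.
risk = joined.groupby(["asset_id", "location"], as_index=False)["vibration"].max()
risk["at_risk"] = risk["vibration"] > 0.8  # assumed failure threshold
print(risk)
```

The point is not the specific columns but the pattern: a join plus an aggregation turns an ignored log into a dataset someone can act on.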
Data Science, Discipline, and Decision-Making: The Three D’s of Investment Management. Commentary by Paul Fahey, Head of Investment Data Science, Asset Servicing, Americas at Northern Trust, and Greg McCall, President and Co-Founder of Equity Data Science (EDS)
“Performance data for investment managers is available for their clients, and even the public, to see on a daily, quarterly or longer-term basis. When it comes to making investment decisions, however, asset managers are looking to data sources that are not so easy to find, to generate insights and gain an edge on the competition.
A 2023 survey of 150 global asset managers by Northern Trust and Coalition Greenwich found that managers are focusing on more quantifiable/disciplined investment processes as a key avenue to achieving alpha. In line with this approach comes a focus on data management, as many investment teams face challenges effectively managing their data to make better decisions. While Excel spreadsheets and Word documents have been go-to tools for decades (and still need to play a continued role), they lack advancements in workflow integration, analytics, and data management. As a result, investment teams often resort to fragmented workflows and storing critical intelligence in various systems such as Google Drive, Outlook, or Evernote. This decentralization leads to inefficiencies, increased risk, limited collaboration and missed opportunities.
This is where data science comes into play. Investment data science allows the consumption of large data sets from multiple providers and sources through cloud-based technology and enables investment teams to interrogate their data to gain meaningful insights. This can go beyond number-crunching of market and reference data to incorporate the manager’s proprietary data around investment process management – trading patterns, analyst research, buy-and-sell discipline, macro strategy and other information often stored in siloed or disparate locations.
While the computing power needed for data science has historically been available only to the largest asset managers, new cloud-based tools are democratizing the application of data science to the investment process for a broader audience. Small and mid-sized asset managers now have cost-effective access to deeper analytics, ensuring they can compete on investment expertise and not on the ability to invest heavily in technology. These platforms can enhance the investment process by bringing data into a central ecosystem, allowing for greater collaboration and accountability. From pre-trade to post-trade, data science can unlock insights into a manager’s decision-making, providing a holistic view of their processes. With each insight, managers can develop a more quantifiable, disciplined investment approach, giving them an edge in the ongoing battle for alpha.”
Why the “AI arms race” doesn’t exist. Commentary by Tal Shaked, Chief Machine Learning Fellow at Moloco
“The ‘AI arms race’ is all over the headlines in reference to the technologies being developed by companies like Google, Amazon, Apple, and more. This is not only a misnomer, it is doing a disservice to the public and the government – both trying to understand the technology’s capabilities and impact.
By definition, artificial intelligence draws a comparison to human intelligence, but the fact is, computers are wired differently from humans, and therefore the intelligence they display through ML isn’t exactly “artificial.” Rather, it is a different kind of intelligence – machine intelligence – uniquely enabled by nearly infinite storage and compute, in contrast to humans. The most valuable companies in the world have been innovating with machine intelligence for more than 20 years to develop better ways to interface with humans: to “organize the world’s information”, “be Earth’s most customer-centric company”, and “bring the best user experience to customers.” Advances in ML are enabling new types of machine intelligence that are fueling innovations for the world’s most valuable companies today as well as those to come. Leaders and businesses should be racing to build the best teams that understand and can leverage these technologies to build new products that will disrupt every business area.”
Thoughts on string of recent data breaches. Commentary by Zach Capers, Senior Analyst at Capterra and Gartner
“Data breaches are a top concern for data security teams given their financial and reputational ramifications. As evidenced by the breaches of Tesla and Discord, businesses must be aware of threats stemming from human factors. In these cases, it took a pair of disgruntled employees and one compromised customer support account to put the sensitive information of thousands at risk.
A robust data classification program helps organizations avoid costly breaches that put sensitive data at risk. But the process of identifying and labeling various types of data is often overwhelming—and overengineered. Businesses should focus on implementing three fundamental levels of data classification, leveraging automation for data management over manual methods where possible, and prioritizing security over compliance.”
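A three-level scheme like the one described above can be sketched in a few lines. The level names, regex patterns, and keywords below are assumptions for illustration, not a standard taxonomy:

```python
import re

# Assumed three-level scheme, most to least sensitive:
#   "restricted" > "internal" > "public"
RESTRICTED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # SSN-like identifier
    re.compile(r"\b\d{13,16}\b"),          # card-number-like digit run
]
INTERNAL_KEYWORDS = {"salary", "forecast", "roadmap"}

def classify(text: str) -> str:
    """Assign one of three classification levels to a piece of text."""
    if any(p.search(text) for p in RESTRICTED_PATTERNS):
        return "restricted"
    if any(k in text.lower() for k in INTERNAL_KEYWORDS):
        return "internal"
    return "public"

print(classify("Employee SSN: 123-45-6789"))
print(classify("Q3 revenue forecast draft"))
print(classify("Press release: new office"))
```

Running a rule set like this over stored documents is one way to automate classification rather than labeling everything by hand; real programs would use far richer detectors.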
Thoughts on string of recent data breaches. Commentary by Nikhil Girdhar, Senior Director of Data Security, Securiti
“The recent data breach involving Johnson & Johnson’s CarePath application underscores the pressing need for a tactical overhaul in healthcare data security. As the sector moves swiftly towards digitization, patient data becomes a prized asset for cybercriminals. This mandates a critical reassessment of Data Security Posture Management (DSPM) strategies across healthcare organizations.
In an environment where patient data is dispersed across multiple platforms, the challenge for security teams—often operating with finite resources—is to effectively pinpoint and secure vulnerable assets. A data-centric approach can optimize resource allocation by focusing on high-value assets. This enables more precise application of safeguards such as least-privilege access controls, data masking, and configuration management, particularly for key applications like CarePath.
The paradigm must also shift from an ‘if’ to a ‘when’ mindset regarding breaches. Prioritizing data encryption is not just advisable; it’s essential. Moreover, automating incident analysis can accelerate notifications to impacted parties, enabling them to take proactive measures to protect their information. When integrated, these steps forge a formidable defense against increasingly advanced cyber threats, offering security teams the tactical advantage they need.”
AI and Synthetic Content. Commentary by Tiago Cardoso, product manager at Hyland Software
“Training AI models on synthetic content already happens in many cases, replacing human feedback to scale training and fine-tuning, since there is no need for people to be in the loop. It is mostly used on smaller language models to improve performance, and the main implication is that it allows low-cost generative models with high performance.
Nevertheless, using synthetic content can inflate bias. The bias of the model producing the content is amplified the more that content is used to train new models or to improve the original model. This can lead to noisy generative AI that is less aligned with what a human would be expected to create, eventually raising hallucination levels – in short, cases where a model produces something that sounds plausible but is actually untrue – in specific pockets of knowledge.
Any work done now to centralize, automate, improve, and control human-aggregated data will save pain later for organizations that have designs on incorporating generative AI into their future plans. Given that these data will play a role in future AI models, organizations will reap the rewards of any preparation they do for those models now by determining what information is most important, analyzing those data for potential biases, and preparing them for training by AI models.
Organizations should embrace content services platforms that provide a central place to collect and store human-aggregated data without silos and with the potential of leveraging all organizational data in a manageable way. Also, investing in technologies like a data lakehouse would be ideal. Doing so as soon as possible will allow them to be prepared to build a proper AI strategy, specifically generative AI. Human-aggregated data are required to produce organizational-aligned AI, which is the best way to scale their value. In practice, this is a new way to monetize human content, and therefore, benefit comes directly from the volume of manageable human data.”
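The bias-amplification dynamic described above can be illustrated with a toy simulation (not a real training loop). Assume each model generation slightly over-produces one class, and the next generation is trained on that output, inheriting the skew before adding its own:

```python
# Toy illustration of compounding bias across model generations.
# Both numbers below are invented for the example.
bias_factor = 1.1  # each generation amplifies the skew by 10%
share = 0.5        # true share of the over-produced class

for generation in range(5):
    odds = share / (1 - share)
    odds *= bias_factor  # the new model inherits the skew and adds its own
    share = odds / (1 + odds)
    print(f"generation {generation + 1}: class share = {share:.3f}")
```

Even a small per-generation skew compounds: after five rounds the class share has drifted from 0.5 to roughly 0.62, which is the sense in which training on model output can amplify rather than average out bias.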
More businesses will need to tap the 80/20 rule when it comes to finding the value in their data. Commentary by Dennis DeGregor, VP, Global Experience Data Practice, Ogilvy/Verticurl
“In 2023, customer data is an abundant resource. However, volume is not synonymous with quality, and bad data can be just as destructive as no data at all. That’s why acquiring the necessary first-party and zero-party data is table stakes for effective digital-first marketing. At the same time, the Pareto Principle, or the 80/20 rule, should inform a brand’s approach to data collection and aggregation, understanding that 80 percent of data’s value often comes from just 20 percent of the volume. This is the key to unlocking data-driven personalization.”
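One simple way to test whether the 80/20 rule holds on a dataset is to sort records by value and find the smallest share of records that accounts for 80% of the total. The per-record values below are made up for illustration:

```python
# Hypothetical per-record value of customer data records.
values = [500, 320, 90, 40, 20, 10, 8, 6, 4, 2]

total = sum(values)
running, count = 0.0, 0
for v in sorted(values, reverse=True):
    running += v
    count += 1
    if running / total >= 0.8:  # stop once 80% of value is covered
        break

print(f"{count}/{len(values)} records ({count / len(values):.0%}) "
      f"hold {running / total:.0%} of the value")
```

In this contrived example, 2 of 10 records (20%) hold 82% of the value – exactly the kind of concentration the Pareto Principle predicts, and the high-value slice a brand would prioritize collecting well.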
Intelligent Reasoning. Commentary by Patrick Dougherty, CTO and co-founder at Rasgo
“There’s a key capability of foundational large language models that we’ve only scratched the surface of so far: intelligent reasoning. The “Copilot” approach to enabling knowledge workers with AI isn’t just for automation: we’re seeing our users become better data analysts, with less human bias in their work, because of the back and forth they can have with the large language model. This spans the entire workflow of analyzing data: finding the right tables; making a plan for how to analyze them; writing the code; and finally, explaining findings derived from the data. Imagine having a creative, unbiased assistant that never gets tired of your bad ideas and creatively problem solves to turn them into good ones. The companies getting the most advantage from AI today aren’t replacing their employees, they’re upleveling their performance and productivity with a Copilot tuned for their job.”
Why real-time data is a necessity to ensure future financial department success. Commentary by Lee An Schommer, Chief Product Officer at insightsoftware
“The CFO and finance team are responsible for communicating metrics and other financial information accurately and effectively to a range of stakeholders. With real-time data, these teams can not only be assured that they are presenting the most accurate data to their stakeholders, but they can also run predictive analytics based on what is happening in the economy that day, week, month, or even year. Ultimately, using tools that aggregate real-time data will allow for more accurate, timely, and agile reporting, enabling organizational decision-makers to make informed decisions based on the most current information.”
Increased Accessibility of Foundational Models Will Reinvent the Analytics Lifecycle. Commentary by Nicolás Àvila, Chief Technology Officer for North America, Globant
“Many companies are collecting massive amounts of data from a variety of sources without really knowing how to understand and derive value from it. Large language models (LLMs) are the tool they’ve been waiting for to help them benefit from their data.
LLMs are increasing in prevalence and accessibility, and are being used in a variety of ways to help make businesses’ data engineering and analytics processes more valuable, productive, and efficient. For example, these tools can create synthetic data, generate complex SQL statements, assist with data migration from on-prem relational platforms to cloud-based distributed frameworks, and even help create relevant and domain-specific training material for talent development. Generally, these tools are extremely useful in helping non-technical business users to more easily understand, access, and derive meaningful insights from data. Right now, these foundational models are continuing to develop and become more accessible. This accessibility will reinvent the analytics lifecycle as we know it, shrinking the time it takes businesses to derive insights from data and removing the need for intermediaries to help users make sense of it. This makes them a game changer for businesses, allowing them to make better decisions, improve operations, innovate more quickly, and boost stakeholder engagement at a scale and pace we’ve never seen before.”