Monte Carlo released a report this week finding that data engineers spend, on average, 40% of their workday evaluating or checking data quality.
For its 2022 State of Data Quality Survey, Monte Carlo joined Wakefield Research in asking 300 data professionals about how many data quality incidents they experience, how long they spend detecting and resolving them, and how those incidents impact their business.
Results revealed that the average organization deals with 61 data incidents per month, each requiring an average of 13 hours to identify and resolve, for a total of 793 hours per month. And those are just the known incidents: proprietary data from the Monte Carlo platform indicates that for every thousand tables in a company's data environment, about 70 incidents occur per year. To make matters worse, 58% of respondents said the total number of incidents has increased somewhat or greatly over the past year.
“In the mid-2010s, organizations were shocked to learn that their data scientists were spending about 60% of their time just getting data ready for analysis,” said Barr Moses, Monte Carlo CEO and co-founder. “Now, even with more mature data organizations and advanced stacks, data teams are still wasting 40% of their time troubleshooting data downtime. Not only is this wasting valuable engineering time, but it’s also costing precious revenue and diverting attention away from initiatives that move the needle for the business. These results validate that data reliability is one of the biggest and most urgent problems facing today’s data and analytics leaders.”
In addition to the time spent troubleshooting data quality issues, respondents reported that bad data affects 26% of their business revenue. Some issues go undetected, and almost half of those surveyed said they measure data quality most often by the number of complaints they receive, an ad hoc method that Monte Carlo says risks damaging a company's reputation. For data quality issues that go undiscovered, 47% said that company decision makers or stakeholders face the impacts all or most of the time.
Some may feel that testing is the answer. The survey results show that respondents who performed at least three different types of data tests for distribution, schema, volume, null, or freshness anomalies at least once a week dealt with only 46 incidents per month on average, compared to the 61 experienced by those with less stringent testing. Despite this, testing alone proved inadequate and did not significantly correlate with reducing the business impact of bad data quality.
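To make the five test categories named above concrete, here is a minimal sketch of what such checks might look like in practice. The table layout, column names, and thresholds are hypothetical examples, not Monte Carlo's implementation:

```python
from datetime import datetime, timedelta

import pandas as pd

# Hypothetical expectations for an example "transactions" table.
EXPECTED_COLUMNS = {"user_id", "amount", "updated_at"}

def check_schema(df: pd.DataFrame) -> bool:
    """Schema test: the table still has the columns downstream jobs expect."""
    return EXPECTED_COLUMNS.issubset(df.columns)

def check_volume(df: pd.DataFrame, min_rows: int = 3) -> bool:
    """Volume test: the row count has not dropped below a floor."""
    return len(df) >= min_rows

def check_nulls(df: pd.DataFrame, column: str = "user_id") -> bool:
    """Null test: a key column contains no missing values."""
    return bool(df[column].notna().all())

def check_distribution(df: pd.DataFrame, column: str = "amount",
                       lo: float = 0.0, hi: float = 10_000.0) -> bool:
    """Distribution test: values fall inside an expected range."""
    return bool(df[column].between(lo, hi).all())

def check_freshness(df: pd.DataFrame,
                    max_age: timedelta = timedelta(days=1)) -> bool:
    """Freshness test: the newest row is recent enough."""
    return datetime.utcnow() - df["updated_at"].max() <= max_age

# Example data that passes all five checks.
df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "amount": [19.99, 250.0, 42.5],
    "updated_at": [datetime.utcnow()] * 3,
})

results = {
    "schema": check_schema(df),
    "volume": check_volume(df),
    "nulls": check_nulls(df),
    "distribution": check_distribution(df),
    "freshness": check_freshness(df),
}
print(results)
```

Checks like these catch the failure modes they were written for, which is exactly the limitation the survey highlights: each test must be anticipated and maintained by hand.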
“Testing helps reduce data incidents, but no human being is capable of anticipating and writing a test for every way data pipelines can break. And if they could, it wouldn’t be possible to scale across their always changing environment,” said Lior Gavish, Monte Carlo CTO and co-founder. “Machine learning-powered anomaly monitoring and alerting through data observability can help teams close these coverage gaps and save data engineers’ time.”
Many companies are investing in solutions to their data quality problems. Monte Carlo's survey found that 88% of those surveyed are already investing in data quality solutions or plan to within the next six months. The company suggests that data observability is one such solution. Monte Carlo claims that data teams at JetBlue, Vimeo, and Affirm are leveraging its end-to-end data observability platform to detect, resolve, and prevent data incidents, reducing data downtime. As an example, advertising software vendor Choozle reportedly used Monte Carlo to reduce its data downtime by 88%.
The report also contains interesting insight into the lifestyle of data engineers, including their thoughts on remote work and on landing a job with one of the tech giants. It also features commentary from Monte Carlo's own data experts alongside that of the surveyed professionals.
Read the full report at this link.