Data governance used to be a side hustle that data engineers would tackle while doing their “real jobs”: building pipelines, correcting warehouse sizes, and maintaining indexes, views, raw zones, presentation layers, and data contracts. In between, they’d mask some data or throw in a row-level policy. But as data regulations have become stricter, more numerous, and more prominent, data governance has become a real job of its own, with data stewards or compliance teams focused on determining policies.
In addition, data users have proliferated across the enterprise. Now every “line of business” user must have access to data to improve results. This has led to a situation where data is moving from one end of the company to the other, but the rules around it are stuck in silos, with each team moving, touching or using data unaware of how what they’re doing fits into the whole.
Imagine a data engineer in the middle of this data flow, in charge of a warehouse where trucks keep showing up and dropping off data pallets. Where did the data come from? Who sent it? What kind of data is it? What are the requirements for storing and sharing it? Brick-and-mortar warehouses have this down to a science through their supply chains. Enterprises need to apply the same rigor to their data supply chain.
We’re working with clients and customers to build their integrated data ecosystems in real time and running into these conflicts. For example, data engineers still think that Snowflake is owned and operated solely by them because it’s their baby. They did all of it themselves, so shouldn’t they own all the policies and rules around the data stored there? What about ETL teams who bring in new temporary data tables with corresponding metadata – what data are they bringing in, and how should it be protected?
Here are three tips for avoiding these data supply chain pitfalls:
1. Make Your Data Governance Policies Visible
Snowflake developers can easily write a masking policy in a few minutes; writing code is what they do! But while this is a no-brainer for the here-and-now and can even work long-term when teams are small, once you’re enterprise-size and data is moving from one team to another, one-off policies become a dead end. In effect, you’ve applied a policy locally that only technical Snowflake developers can see.
To create consistency, you need to give everyone in your data supply chain, technical or not, visibility into all the existing masking policies and what those policies do. Data governance teams need to know what policies are in place, on what data, and where. End users need to know what data they can and cannot access. Masking policies aren’t just for Snowflake DBAs anymore.
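As a rough illustration of what that visibility could look like, here is a minimal Python sketch of a shared policy inventory that answers the two questions above: what is masked where, and what a given role can see. The `MaskingPolicy` record, its fields, and the sample table, column, and role names are all hypothetical; in practice the inventory would be populated from your warehouse or governance tool rather than typed in by hand.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MaskingPolicy:
    """Hypothetical inventory record for one masking policy on one column."""
    name: str
    table: str
    column: str
    unmasked_roles: Tuple[str, ...]  # roles that see the real value
    owner: str                       # team accountable for the policy


def describe(policy: MaskingPolicy) -> str:
    """Render a policy as a sentence a non-technical reader can follow."""
    roles = ", ".join(policy.unmasked_roles) or "no one"
    return (
        f"{policy.table}.{policy.column} is masked for everyone except "
        f"{roles} (policy '{policy.name}', owned by {policy.owner})"
    )


def can_see_unmasked(policies: List[MaskingPolicy], role: str,
                     table: str, column: str) -> bool:
    """Answer an end user's question: will this role see real values?"""
    for policy in policies:
        if policy.table == table and policy.column == column:
            return role in policy.unmasked_roles
    # Assumption: a column with no policy in the inventory is not masked.
    return True


# Illustrative data only.
policies = [
    MaskingPolicy("mask_salary", "hr.payroll", "salary", ("HR_ANALYST",), "HR"),
    MaskingPolicy("mask_ssn", "hr.employees", "ssn", (), "Compliance"),
]

for p in policies:
    print(describe(p))
```

The point of the sketch is the shape, not the storage: one list that governance teams can audit and end users can query, instead of policy logic that lives only inside the warehouse.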
2. Get Out of Your Silo
Today, we see a lot of the “right hand” not knowing what the “left hand” is doing across the data supply chain. Line of business (LoB) users who use the data are so far removed from the data stewards tasked with protecting it that they might as well be in different worlds. LoB users are busy figuring out how to shave costs; they’re not thinking about HIPAA or PCI regulations. So it’s critical for data middlemen to step out of their silos to understand how all business functions interact with data.
One of the first steps is to find out who is sending those truckloads of data: who’s the “ETL team” at your company? Why are they sending the data in the format it’s in? What are the rules around it? The next step is to talk to your data governance team or data stewards. We’ve seen companies purchase and implement a data catalog and fill it full of data governance policies…that never get applied to the actual data. That is data governance policy in a vacuum.
For example, the policy might say that no one outside HR should have access to payroll data. But that data is in the cloud data warehouse with no controls on it. How is that policy enforced? That leads to the next step: find out who’s using the data and why. Is it marketing, finance, or operations? Do they all need the same access to the same data? Are they all accessing it the same way – from Snowflake directly or through reports and dashboards in a BI (Business Intelligence) tool like Tableau? Can you enforce policy through to those end users?
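The payroll example can be turned into a simple gap check: compare what the written policy allows with who actually has access in the warehouse. The sketch below assumes both sides can be exported as plain mappings from dataset name to group names; the function name, the data shapes, and the sample values are hypothetical.

```python
from typing import Dict, List, Set


def find_policy_gaps(catalog_policies: Dict[str, Set[str]],
                     warehouse_access: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per dataset, the groups that have access but are not allowed.

    catalog_policies: dataset -> groups the written policy permits.
    warehouse_access: dataset -> groups that can actually read the data.
    Datasets with no written policy are skipped, since there is nothing
    to compare against.
    """
    gaps: Dict[str, List[str]] = {}
    for dataset, allowed in catalog_policies.items():
        actual = warehouse_access.get(dataset, set())
        excess = sorted(actual - allowed)
        if excess:
            gaps[dataset] = excess
    return gaps


# Illustrative data only: the policy says HR only, the warehouse says otherwise.
catalog_policies = {"payroll": {"HR"}, "web_traffic": {"MARKETING", "FINANCE"}}
warehouse_access = {
    "payroll": {"HR", "MARKETING", "FINANCE"},
    "web_traffic": {"MARKETING"},
}

gaps = find_policy_gaps(catalog_policies, warehouse_access)
print(gaps)
```

A report like this does not enforce anything by itself, but it shows where a policy exists only on paper.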
3. Look for Tools That Make Integrations Easy
Building a modern data supply chain may not be what you had in mind when you woke up this morning. “Hey, I’m just the data guy!” But if your company is buying a data catalog here and an ETL tool there and just crossing its fingers hoping they’ll all work together, that will quickly lead to headaches for you and your colleagues.
In the modern data stack, you want your data stewards to be able to set policy in the data catalog that is then automatically enforced in the CDW (Cloud Data Warehouse). You want your ETL tool to tokenize data from the database to the cloud without ever allowing access to unapproved users. And you want to make sure not just your masking policies but all of your access controls and governance policies scale to wherever data is consumed. It might be tempting to build your own solution, but like the one-off data policies mentioned above, BIY (build-it-yourself) doesn’t scale. It won’t integrate easily with other data ecosystem tools.
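To make the catalog-to-warehouse idea concrete, here is a small Python sketch that turns a catalog policy record into the SQL that would enforce it. The function, its parameters, and the sample names are hypothetical, and the generated statements follow Snowflake's `CREATE MASKING POLICY` and `ALTER TABLE ... SET MASKING POLICY` syntax as we understand it, so check them against the current Snowflake documentation before running anything. It is also exactly the kind of one-off glue that an integration should replace, which is why it only builds strings and never connects to a warehouse.

```python
import re
from typing import Iterable, List

# Conservative identifier check so names can be interpolated into SQL safely.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*")


def _check(name: str) -> str:
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def masking_policy_sql(policy_name: str, table: str, column: str,
                       allowed_roles: Iterable[str]) -> List[str]:
    """Build the two statements that define and attach a string masking policy.

    Assumes the column holds string data; other types would need a
    different signature in the generated policy.
    """
    roles = sorted(_check(role) for role in allowed_roles)
    if not roles:
        raise ValueError("at least one allowed role is required")
    role_list = ", ".join(f"'{role}'" for role in roles)
    create = (
        f"CREATE OR REPLACE MASKING POLICY {_check(policy_name)} "
        f"AS (val STRING) RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() IN ({role_list}) "
        f"THEN val ELSE '***MASKED***' END;"
    )
    attach = (
        f"ALTER TABLE {_check(table)} MODIFY COLUMN {_check(column)} "
        f"SET MASKING POLICY {policy_name};"
    )
    return [create, attach]


# Illustrative catalog entry: only HR may see payroll salaries.
statements = masking_policy_sql("mask_salary", "hr.payroll", "salary", ["HR"])
for statement in statements:
    print(statement)
```

Generating the statements from the catalog record, rather than writing them by hand in the warehouse, keeps the steward's policy as the single source of truth.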
Look for a tool that delivers free, open-source data ecosystem integrations that tightly and flexibly connect your data supply chain tools for end-to-end data governance controls.
The data warehouse is just one point in your company’s end-to-end data chain, and the decisions you and the rest of the organization make affect how data is treated up and down the supply chain. So ensure that you’re building a seamlessly integrated data supply chain that leads to a true data value chain for end users and your enterprise.
About the author: Part of the founding team at ALTR, Chris Struttmann was the original Chief Engineer and Architect of the ALTR platform and now leads the company’s strategic technology vision. Chris brings over 15 years of innovation experience in the enterprise and cloud computing markets, having held engineering roles at Dash Financial Technologies, Tastemaker Labs, Groome Technologies, and others. Chris attended the Florida Institute of Technology College of Engineering and is named ‘inventor’ on over 15 patents relating to ALTR’s product portfolio.