In this special guest feature, Christian Romming, CEO of Etleap, argues that there's too much at stake to allow an oversight to result in a security breach or downtime. Etleap is a managed data pipeline company serving enterprises like Moderna, PagerDuty, and Morningstar. Christian founded Etleap in 2013 after a career in data engineering. Originally from Norway, he holds computer science degrees from the University of Warwick and Stanford University.
Congratulations! You’ve done your research, evaluated various cloud data warehouse (CDW) options, gotten the right buy-in and approvals across your organization, and are ready to move forward on Snowflake, Redshift, Delta Lake, or another CDW. New business intelligence (BI), analytics, and machine learning opportunities are now clearly in sight. CDWs have advantages in speed, efficiency, and cost versus their on-premises predecessors. But there is still hard work ahead, and the choices you make now can mean the difference between achieving that success in days, months, or years.
While the world of extract, transform, and load (ETL) has evolved from its on-premises roots, building and managing the data pipelines that will deliver analytics-ready data to data consumers can still be very resource intensive. Below are five ways to cut those resource requirements and shorten your timeline to a successful CDW launch.
1. Find a connector for any data source
You may have a deep, strong team of data engineers that has written code for source integrations in the past. They may even like this work, though most are happy to leave it behind; it is often tedious. Regardless, this is one of the biggest opportunities to accelerate your data warehouse migration. Many teams before you have needed connectors for databases, files, apps, or event streams, and there are pre-built connector tools available that cover the majority of most organizations’ data sources. And sure, you probably also have some data sources that are unique to your industry or even your company. But just as with pre-built connectors, you’ll benefit from a vendor whose frameworks and tooling are built specifically for handling custom sources.
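To make the connector work concrete, here is a minimal sketch of the incremental-sync pattern most connectors implement: track a cursor (e.g. a last-seen `updated_at` value) and fetch only newer records each run. `FakeSource` and `IncrementalConnector` are hypothetical stand-ins for a real source API client and connector framework, and the in-memory records simulate a live source.

```python
from dataclasses import dataclass


@dataclass
class FakeSource:
    """Stand-in for a real source API; returns records newer than a cursor."""
    records: list

    def fetch_since(self, cursor):
        return [r for r in self.records if r["updated_at"] > cursor]


@dataclass
class IncrementalConnector:
    source: FakeSource
    cursor: int = 0  # last-seen updated_at; a real pipeline persists this

    def sync(self):
        batch = self.source.fetch_since(self.cursor)
        if batch:
            self.cursor = max(r["updated_at"] for r in batch)
        return batch


source = FakeSource(records=[
    {"id": 1, "updated_at": 10},
    {"id": 2, "updated_at": 20},
])
conn = IncrementalConnector(source)
first = conn.sync()   # both records on the first sync
source.records.append({"id": 3, "updated_at": 30})
second = conn.sync()  # only the new record afterwards
print(len(first), len(second))  # → 2 1
```

Real connectors add pagination, schema handling, and retries on top of this loop, which is exactly why buying or reusing them beats rewriting the same pattern per source.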
2. Automate the infrastructure
By moving to the cloud, you’ve left the world of managing physical servers in a data center, but you can still overwhelm your team with infrastructure work if you’re not careful. Managing the recurring movement and preparation of data requires scheduling tasks and their dependencies, provisioning compute clusters, optimizing for cost and performance, and more. There are different options to relieve your team of this engineering time, from open source orchestrators and serverless options to fully managed pipeline tools.
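The "scheduling tasks and their dependencies" work that orchestrators absorb can be sketched as dependency-ordered execution over a DAG. This toy example uses Python's standard-library `graphlib`; the task names are hypothetical, and real orchestrators add retries, backfills, and cluster provisioning on top of this core.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of upstream tasks it depends on.
dag = {
    "extract_orders": set(),
    "extract_users": set(),
    "transform_join": {"extract_orders", "extract_users"},
    "load_warehouse": {"transform_join"},
}

# An orchestrator's scheduler reduces to: run tasks in an order where
# every task's dependencies have already completed.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Both extracts run before the join, and the warehouse load runs last; hand-rolling this (plus failure handling and cost-aware cluster sizing) is the engineering time that managed or open source orchestrators save.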
3. Democratize data production
It’s common to think of data democratization mostly as an outcome of a successful CDW project. Providing dashboards and data sets to more data consumers is certainly key to a data-literate organization. But it’s equally important to enable the producers of the data, those most familiar with its meaning and history. Absent this, a central team is left responsible for selecting data and delivering it with meaning and value to data consumers. They’ll either spend countless hours researching each domain and data source or end up generating a CDW that users can’t understand and don’t trust. A better approach is to give the domain experts no-code tools to directly build pipelines and prepare data for analytics.
4. Don’t ignore troubleshooting time
As you plan a migration to a CDW, it’s easy to focus all your attention on the data engineering effort required to launch it. In practice, however, your data engineers can spend as much time troubleshooting as anything else. There are tools for monitoring, and you can write code for error alerting. Even more effective are fully managed pipeline offerings that provide these features out of the box and can resolve issues before they reach your team. All five of these tips will increase your CDW’s uptime, which is the ultimate time relief for your data engineers. It’s also key to reaching value, which depends on your data consumers’ trust and adoption.
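The "write code for error alerting" option mentioned above typically looks like the sketch below: retry a flaky pipeline step, and page someone only when retries are exhausted. `send_alert` and `flaky_load` are hypothetical stand-ins for a real notification integration and a real warehouse load step.

```python
import time


def send_alert(message):
    # Stand-in for a real paging/notification integration.
    print(f"ALERT: {message}")


def run_with_retries(step, name, retries=3, delay=0.0):
    """Run a pipeline step, retrying transient failures; alert on exhaustion."""
    for attempt in range(1, retries + 1):
        try:
            return step()
        except Exception as exc:
            if attempt == retries:
                send_alert(f"{name} failed after {retries} attempts: {exc}")
                raise
            time.sleep(delay)  # back off before retrying


calls = {"n": 0}


def flaky_load():
    # Simulates a step that fails twice with a transient error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient warehouse timeout")
    return "loaded"


result = run_with_retries(flaky_load, "load_warehouse")
print(result)  # → loaded
```

Managed pipeline products ship this retry-then-alert behavior (and much more) out of the box, which is why they can shrink the troubleshooting time this section warns about.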
5. Expect the unexpected
At this point you may be thinking that automation has everything solved and the ecosystem of tools has covered every conceivable case. The reality is that there is no single easy button, and you should be wary of black-box solutions that suggest pipelines can be 100% automated. Data sources and destinations will change. You may decide to integrate capabilities like a business catalog or a data quality workflow. Be sure you’ve invested in tools or services with the flexibility to handle your unique and changing environment. You may save hundreds of hours with rigid automation, only to give those savings back when rigid tools force workarounds for your edge cases.
Some organizations will take years to reach value from their on-prem to CDW migration. The goal of this article isn’t to cause despair but to ensure you move forward with open eyes. It can be helpful to assess where your organization should build vs. buy vs. partner. Many think they need to build in-house to maintain flexibility and control, and they’re willing to spend far more time to do so. But there are options available that make this a false tradeoff. Here’s one example where a proper analysis and the right tools cut two years off a large financial services firm’s deployment.