Big Data News Hubb

AWS Seeks an End to ETL

December 6, 2022


Extract, transform, and load. It’s a simple and ubiquitous thing in IT. And yet everybody seems to hate it. The latest company to pile on to ETL is AWS, which declared an effort to end ETL yesterday at re:Invent.

Adam Selipsky, the CEO of Amazon’s web services division, discussed the everlasting blight that is ETL during his re:Invent keynote Tuesday morning.

“Combining data from different data sources and different types of tools brings up a phrase that strikes dread into the hearts of even the sturdiest of data engineering teams. That’s right, I’m talking about ETL,” Selipsky said. “Just a few weeks ago, we got an email from a customer discussing ETL, and he literally used this phrase: ‘Thankless, unsustainable black hole.’”

Despite the pain and suffering that ETL has brought upon the computing world, few alternatives have appeared. Many companies have swapped the order of the "transform" and "load" steps, conducting the transformation inside the cloud data lake, data warehouse, or data lakehouse, an approach that has given rise to ELT. But reordering the steps hasn't changed the fundamental problem with ETL.
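The difference between the two acronyms is simply where the transform runs. A minimal Python sketch of the reordering, with plain lists standing in for the warehouse and made-up records for illustration:

```python
# Toy sketch of ETL vs. ELT. The "warehouse" is just a list; extract()
# and transform() are illustrative stand-ins for real pipeline stages.

def extract():
    # Raw operational records, e.g. pulled from a transactional database.
    return [{"amount": "99.50"}, {"amount": "15.00"}]

def transform(rows):
    # Cleansing/typing step: parse string fields into numbers.
    return [{"amount": float(r["amount"])} for r in rows]

# ETL: the transform runs in the pipeline, *before* the load.
etl_warehouse = []
etl_warehouse.extend(transform(extract()))

# ELT: raw rows are loaded first; the warehouse itself transforms them
# later (modeled here as the same function running "inside" the store).
elt_warehouse = []
elt_warehouse.extend(extract())           # load raw data as-is
elt_warehouse = transform(elt_warehouse)  # transform inside the warehouse

print(etl_warehouse)  # [{'amount': 99.5}, {'amount': 15.0}]
```

Either way the same pipeline code has to exist somewhere, which is why reordering alone does not remove the pain Selipsky describes.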

“The manual effort, complexity and undifferentiated heavy lifting involved in building ETL pipelines is painful,” Selipsky continued. “It requires writing … custom code. Then you have to deploy and manage the infrastructure to make sure the pipeline scales. Still, it can be days before the data is ready. And all the while, you’ve got eager analysts pinging you again and again to see if their data is available. And when something changes, you get to do it all over again.”

[Image: AWS customer quote on the sheer joy and happiness of managing ETL data pipelines]

Sound familiar? If ETL isn’t the bane of data engineering, it’s hard to say what is.

There have been various approaches to deal with ETL over the years. One of the most popular techniques is to just leave the data where it is and push the queries over the wire. AWS has enabled this federated query approach with its analytics and machine learning tools.
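The federated idea can be sketched with Python's stdlib sqlite3, where an attached second database stands in for a remote data source; the tables and values here are made up for illustration:

```python
import sqlite3

# Federated-query sketch: leave data where it lives and send the query
# to it. An attached, independent database plays the "remote" source,
# the way Redshift or Athena federated queries reach operational stores.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 10, 99.5), (2, 11, 15.0)])

# Attach a second database and populate it to act as the remote source.
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(10, "Acme"), (11, "Globex")])

# One query joins across both sources: no extract, transform, or load step.
rows = conn.execute(
    "SELECT c.name, SUM(o.amount) FROM orders o "
    "JOIN crm.customers c ON c.id = o.customer_id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
print(rows)  # [('Acme', 99.5), ('Globex', 15.0)]
```

The trade-off is that the query now pays the cost of reaching the remote source at read time, which is why federation complements rather than fully replaces moving data.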

“We’ve integrated SageMaker with Redshift and Aurora to enable anyone with SQL skills to operate machine learning models to make predictions, also without having to move data around,” Selipsky said. “These integrations eliminate the need to move data around for some important use cases.

“But what if we could do more? What if we could eliminate ETL entirely?” the CEO said. “This is our vision, what we’re calling a zero-ETL future. And in this future, data integration is no longer a manual effort.”

To that end, AWS unveiled two new solutions that it claims help eliminate the need for ETL with Redshift, the company’s flagship big data analytics database.

The first zero-ETL solution is a new integration between Aurora and Redshift. According to AWS, once transactional data lands in Aurora, it is continuously replicated to Redshift, where it becomes available for analysis within seconds. Aurora, of course, is AWS’s relational database offering that’s compatible with PostgreSQL and MySQL.

“This integration brings together transactional data with analytics capabilities, eliminating all the work of building and managing custom data pipelines between Aurora and Redshift,” Selipsky said. “It’s incredibly easy. You just choose the Aurora tables containing the data that you want to get into Redshift. As data comes into Aurora, seconds later it is seamlessly made available inside Redshift.”

The new feature also gives customers the ability to move data from multiple Aurora databases into a single Redshift instance, AWS said. The new data integration is serverless, the company said, and scales up and down automatically based on data volume.

The second new capability in AWS’s “zero-ETL” future is a new integration between Redshift and Apache Spark, the popular big data processing framework used in Amazon EMR, AWS Glue, and Amazon SageMaker.

Customers often want to analyze data in Redshift using these Spark-based services, but previously that required either manually moving the data, building an ETL pipeline, or obtaining and implementing certified data connectors that could facilitate that data movement.

With the new Redshift integration for Spark unveiled by AWS this week, there is no longer a need to obtain those third-party connectors. Instead, the integration is built into the products.

“Now it’s incredibly easy to run Apache Spark applications on Redshift data from AWS analytics services,” Selipsky said. “You can do a simple Spark job on Jupyter notebooks in AWS services like EMR, Glue, and SageMaker to connect to Redshift to run read/write queries against Redshift tables. No more need to move any data. No need to build or manage any connections.”
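In code terms, the built-in integration is meant to replace connector plumbing like the following. This is a hedged sketch of a read via the open-source spark-redshift community connector that previously filled this gap; the cluster endpoint, S3 bucket, and IAM role below are all hypothetical placeholders, and an actual run would need a live SparkSession on EMR or Glue:

```python
# Hedged sketch of the Spark-to-Redshift plumbing the new integration
# removes. Every name here (endpoint, bucket, role ARN) is hypothetical.
redshift_options = {
    "url": "jdbc:redshift://example-cluster.abc.us-east-1.redshift.amazonaws.com:5439/dev",
    "dbtable": "public.orders",
    "tempdir": "s3://example-bucket/redshift-tmp/",  # staging area the connector uses
    "aws_iam_role": "arn:aws:iam::123456789012:role/example-redshift-role",
}

# With a live SparkSession `spark`, the read would look like:
#   df = (spark.read
#              .format("io.github.spark_redshift_community.spark.redshift")
#              .options(**redshift_options)
#              .load())
# The built-in integration ships this capability with EMR, Glue, and
# SageMaker, so there is no connector to obtain or configure yourself.
print(sorted(redshift_options))
```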

Both of these Redshift integrations, for Aurora and Spark, make it easier to generate insights without having to build ETL pipelines or manually move data, Selipsky said. “These are two more steps forward towards our zero-ETL vision,” he said. “We’re going to keep on innovating here and finding new ways to make it easier for you to access and analyze data across all of your data stores.”

Related Items:

AWS Unleashes the DataZone

AWS Introduces a Flurry of New EC2 Instances at re:Invent

Can We Stop Doing ETL Yet?

The post AWS Seeks an End to ETL appeared first on Datanami.




© 2022 Big Data News Hubb All rights reserved.
