Big Data News Hubb

IBM Embraces Iceberg, Presto in New Watsonx Data Lakehouse

by admin
May 16, 2023
in News


(Francesco Scatena/Shutterstock)

IBM yesterday unveiled watsonx.data, a new data lakehouse offering for cloud and on-prem that will use object storage and Apache Iceberg, an open table format. Big Blue launched two other offerings in the new watsonx family at its annual THINK conference: watsonx.ai and watsonx.governance. Together, the three watsonx components represent IBM’s latest push into the enterprise AI market.

Lakehouses have proliferated in recent years as companies look to combine the massive scalability of cloud-based object storage with the proven data management and governance capabilities of traditional data warehouses running on analytics databases. Instead of ungovernable data swamps, the lakehouse is designed to bring order to data, but without the storage limitations posed by data warehouses.

When it becomes generally available in July, IBM’s new watsonx.data lakehouse will run on-prem, in the IBM Cloud, and on AWS. While IBM didn’t specify in its announcement, the offering is assumed to use IBM’s own flavor of object storage, which it obtained with its 2015 acquisition of Cleversafe for $1.5 billion.

Watsonx.data will also incorporate Apache Iceberg, the increasingly popular open table format that emerged from Netflix and Apple to address data consistency and correctness issues that arose with the reliance on Apache Hive in the early days of Hadoop-based data lakes. By bringing support for ACID transactions to data, Iceberg enables customers to bring multiple compute engines to bear on data residing in a lake or lakehouse.
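
To illustrate why atomic commits let several engines share one table, here is a toy sketch in Python. It is not Iceberg's API and not IBM's implementation: the names ToyCatalog, Snapshot, and append_files are hypothetical, and the in-memory lock stands in for whatever atomic operation a real catalog provides. The idea it models is optimistic concurrency. Each writer builds a new immutable snapshot from the one it read, then commits only if the table's current-snapshot pointer has not moved in the meantime.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of a table: an id plus the data files it contains."""
    snapshot_id: int
    data_files: tuple


class ToyCatalog:
    """Hypothetical catalog holding one table's current-snapshot pointer."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = Snapshot(0, ())

    def current(self):
        return self._current

    def compare_and_swap(self, expected, new):
        # Commit succeeds only if nobody else committed since `expected` was read.
        with self._lock:
            if self._current is not expected:
                return False
            self._current = new
            return True


def append_files(catalog, new_files, max_retries=5):
    """Optimistic append: re-read the table and retry if another writer won."""
    for _ in range(max_retries):
        base = catalog.current()
        candidate = Snapshot(base.snapshot_id + 1,
                             base.data_files + tuple(new_files))
        if catalog.compare_and_swap(base, candidate):
            return candidate
    raise RuntimeError("commit failed after retries")
```

In this sketch, a reader that grabbed a snapshot before a commit keeps seeing exactly the files in that snapshot, while a writer holding a stale base has its commit rejected and must retry against the newer state. That is the property that makes it safe for, say, one engine to write while another queries.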

To that end, IBM foresees Presto and Apache Spark being two of the first data engines to run in its watsonx.data lakehouse. IBM has been a big supporter of Spark for years, both in terms of running it on behalf of customers and making upstream code changes to the project.

But IBM also has a sizable investment in Presto, the distributed query engine that came out of Facebook last decade as the replacement for Apache Hive (which Facebook also created). With its capability to read data from multiple data stores and serve up fast ad-hoc queries, Presto has emerged as one of the leading processing engines for the modern data stack.

IBM moved into the Presto business last month with its acquisition of Ahana, a Silicon Valley startup that had raised $32 million to build a cloud-based Presto business. Ahana competes with Trino backer Starburst (Trino is a fork of Presto) and Amazon Athena, the serverless AWS analytics service that uses Presto and Trino.

IBM says that, in the future, watsonx.data will incorporate its Storage Fusion technology “to enhance data caching across remote sources as well as semantic automation capabilities built on IBM Research’s foundation models to automate data discovery, exploration, and enrichment through conversational user experiences.”

Watsonx.data will feature built-in governance capabilities for data housed in the lake. The company also launched watsonx.governance to help provide guardrails and transparency for AI and machine learning models developed in watsonx.ai, which is another new offering unveiled by IBM. Specifically, IBM says watsonx.governance will “provide the mechanisms to protect customer privacy, proactively detect model bias and drift, and help organizations meet their ethics standards.”

Watsonx.ai, meanwhile, will function as a new development studio for building AI applications. The offering will include a library of “foundation models” upon which customers can build AI applications. In addition to language models, the library will include models designed to work with code, time-series data, tabular data, geospatial data, and IT events data, IBM says.

Among the models that will be included in watsonx.ai are: fm.code, which automatically generates code for developers through a natural-language interface; fm.NLP, a collection of large language models (LLMs) for specific and industry-specific domains; and fm.geospatial, a model built on climate and remote sensing data to help organizations understand and plan for changes in natural disaster patterns, biodiversity, land use, and other geophysical processes, IBM says. IBM will also incorporate into watsonx.ai thousands of natural language processing (NLP) models developed by Hugging Face.

The new watsonx line of offerings will give customers the tools they need for building next-gen AI models while retaining governance and control, says Arvind Krishna, IBM chairman and CEO.

“With the development of foundation models, AI for business is more powerful than ever,” Krishna said in a press release. “Foundation models make deploying AI significantly more scalable, affordable, and efficient. We built IBM watsonx for the needs of enterprises, so that clients can be more than just users, they can become AI advantaged. With IBM watsonx, clients can quickly train and deploy custom AI capabilities across their entire business, all while retaining full control of their data.”

Related Items:

IBM Joins the Presto Foundation with Acquisition of Ahana

Open Table Formats Square Off in Lakehouse Data Smackdown

Snowflake, AWS Warm Up to Apache Iceberg

Editor’s note: This article has been corrected. The headline was changed to reflect IBM’s focus on Presto, not Trino. Datanami regrets the error.



© Big Data News Hubb All rights reserved.
