Big Data News Hubb

Gaining Control of Your CDP Environment

By admin, April 24, 2023, in Big Data

Posted in Technical | 3 min read

Unwelcome…

… are platform instability, downtime, hardware failure, poor performance, cluster resource contention, repeated process failures, runaway live queries, critical service alarms, no visibility through the alarm cacophony… the list goes on. If those are ailments you would like to remedy …

Welcome!

To this six-part series, where we’ll look at how to get control of the health of your Cloudera Data Platform (CDP) environment. Out of the box, CDP performs superbly, but over time, if data architecture, data engineering, and DevOps best practices are not maintained, the Data City you’ve erected atop a solid CDP bedrock can become the Wild, Wild West. Perhaps it’s time for some law and order to prevent further crimes against the tech.

This series is more than a case study: we’ve interwoven best practices gleaned from multiple configurations and client sites into a comprehensive, easy-to-understand set of instructions for diagnosing and resolving many of the issues that adversely impact CDP environmental health.

In each blog post, we’ll outline the symptoms and root causes of common environmental health challenges and prescribe solutions. Where we can, we’ll include links to step-by-step instructions to guide you through successful implementation. When we conclude the series, we’ll share a homegrown tool, an environmental health scorecard, to monitor and manage the health of your environment.

There are many, many reasons that an environment may perform poorly, and certainly some resolutions take time and effort, but there is also some succulent low-hanging fruit. Our great hope is that you find impactful quick wins that inspire you to pursue multiple avenues of health improvement. You may also decide to partner with our Cloudera Professional Services team, which more than doubled a customer’s health score in two short quarters.

Categories of CDP Environmental Health

We’ve categorized aspects of environmental health for this series.

Visibility and Transparency

Into the cluster, platform, services, and processes. We won’t be able to make much progress if we do not have proper visibility into the problems. That’s observability. In this blog, we provide instructions and tools for gaining visibility, suppressing alarm noise, finding and analyzing the root causes of the most significant opportunities, and proactively notifying your users when incidents occur.
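To make "suppress alarm noise" a little more concrete, here is a minimal sketch of one common approach: dropping repeats of the same alert that fire within a quiet window. It is a plain-Python illustration, not a Cloudera Manager feature or API. The `Alert` record, the `suppress_repeats` function, and the 15-minute window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record; field names are assumptions for this sketch."""
    timestamp: datetime
    service: str
    name: str


def suppress_repeats(alerts, window=timedelta(minutes=15)):
    """Keep an alert only if the same (service, name) pair was not kept within `window`."""
    last_kept = {}  # (service, name) -> timestamp of the last alert we let through
    kept = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.service, alert.name)
        previous = last_kept.get(key)
        if previous is None or alert.timestamp - previous >= window:
            kept.append(alert)
            last_kept[key] = alert.timestamp
    return kept


if __name__ == "__main__":
    t0 = datetime(2023, 4, 24, 9, 0)
    noisy = [
        Alert(t0, "impala", "DAEMON_DOWN"),
        Alert(t0 + timedelta(minutes=5), "impala", "DAEMON_DOWN"),   # repeat, suppressed
        Alert(t0 + timedelta(minutes=20), "impala", "DAEMON_DOWN"),  # outside window, kept
    ]
    for alert in suppress_repeats(noisy):
        print(alert.timestamp, alert.service, alert.name)
```

The window length is a trade-off: too short and the noise returns, too long and a recurring incident may go unnoticed.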

Data Asset Standardization

Of common datasets, pipelines, processes, and reports. Admittedly, data asset standardization is a multiyear journey; nevertheless, addressing only your most problematic and resource-intensive processes and assets may yield more environmental health improvement than any other category. We’ll share best practices on how to locate and capitalize on those opportunities.

Platform Health

Includes hardware and services settings and configurations. Cloudera Data Platform (CDP) must be configured properly to function well with high performance. Furthermore, as business needs continually change, so will your use of the platform, and that will necessitate re-tuning. To help you on that journey, we’ll list some common symptoms, link them to root cause analysis steps, provide proper configuration guidelines, and outline the steps to properly tune your environment.

The Right Tool for the Job

Includes the proper use of Impala, CDSW, Airflow, NiFi, and CM. You might be surprised at the adverse environmental impact of using CDSW as an ETL pipeline tool or using Impala to write unwieldy queries with an embarrassing number of joins. We’ve done it too. We confess. We’ll highlight the advantages of using Airflow to manage complex data pipelines with its ability to divide a workflow into small, independent tasks. We’ll list other do’s and don’ts.
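The idea of dividing a workflow into small, independent tasks with declared dependencies can be sketched in plain Python. This is only an illustration of the concept using the standard library's `graphlib`; it is not Airflow's API, and the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task after its upstream tasks, passing upstream results as arguments.

    tasks:        task name -> callable
    dependencies: task name -> list of upstream task names
    """
    results = {}
    # static_order() yields every task after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = [results[dep] for dep in dependencies.get(name, [])]
        results[name] = tasks[name](*upstream)
    return results


if __name__ == "__main__":
    # Three small tasks instead of one monolithic script (all hypothetical).
    tasks = {
        "extract": lambda: [3, None, 1],
        "clean": lambda rows: [r for r in rows if r is not None],
        "load": lambda rows: len(rows),
    }
    dependencies = {"extract": [], "clean": ["extract"], "load": ["clean"]}
    print(run_pipeline(tasks, dependencies)["load"])
```

Because each task is small and its inputs are explicit, a failure can be retried at the task that failed rather than by rerunning the whole pipeline, which is the property a scheduler such as Airflow builds on.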

Environmental Health Scoring

Brings it all together by demonstrating how to measure, score, monitor, and control environmental health through dashboards that we provide for you along with instructions to hook them up to your logs.
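As a rough illustration of what "scoring" could mean, the sketch below combines per-category scores into one weighted number. It is not the scorecard this series will share: the category keys, the 0 to 100 scale, the weights, and the `health_score` function are all assumptions made for the example.

```python
def health_score(category_scores, weights):
    """Weighted average of per-category scores (assumed to be on a 0-100 scale)."""
    total_weight = sum(weights[name] for name in category_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(score * weights[name] for name, score in category_scores.items())
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical scores and weights for the categories named in this post.
    scores = {
        "visibility": 80,
        "data_asset_standardization": 55,
        "platform_health": 70,
        "right_tool": 65,
    }
    weights = {
        "visibility": 1,
        "data_asset_standardization": 2,
        "platform_health": 2,
        "right_tool": 1,
    }
    print(round(health_score(scores, weights), 1))
```

Tracking the same number over time matters more than its absolute value, since the weights are a judgment call.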

If you’ve got the symptoms, the doctors are in. Let the healing begin!


