Big Data News Hubb

Gaining Control of Your CDP Environment

By admin | April 24, 2023 | Posted in Big Data, Technical | 3 min read

Unwelcome…

… are platform instability, downtime, hardware failure, poor performance, cluster resource contention, repeated process failures, runaway live queries, critical service alarms, no visibility through the alarm cacophony… the list goes on. If those are ailments you would like to remedy …

Welcome!

To this six-part series, where we’ll look at how to get control of the health of your Cloudera Data Platform (CDP) environment. Out of the box, CDP performs superbly, but over time, if data architecture, data engineering, and DevOps best practices are not maintained, the Data City you’ve erected atop a solid CDP bedrock can become the Wild, Wild West. Perhaps it’s time for some law and order to prevent further crimes against the tech.

More than a case study, this series interweaves best practices gleaned from multiple configurations and client sites into a comprehensive, easy-to-understand set of instructions for diagnosing and resolving many of the issues that adversely impact CDP environmental health.

In each post we’ll outline the symptoms and root causes of common environmental health challenges and prescribe solutions. Where we can, we’ll include links to step-by-step instructions to guide you through implementation. When we conclude the series, we’ll share a homegrown tool, an environmental health scorecard, to monitor and manage the health of your environment.

There are many, many reasons that an environment may perform poorly, and certainly some resolutions take time and effort, but there is also some succulent low-hanging fruit. Our great hope is that you find impactful quick wins that inspire you to pursue multiple avenues of health improvement. You may also decide to partner with our Cloudera Professional Services team, which more than doubled a customer’s health score in two short quarters.

Categories of CDP Environmental Health

We’ve categorized aspects of environmental health for this series.

Visibility and Transparency

Into the cluster, platform, services, and processes. We won’t be able to make much progress if we do not have proper visibility into the problems. That’s observability. In this blog we provide instructions and tools on how to gain visibility, suppress alarm noise, find and analyze the root causes of the most significant opportunities, and proactively notify your users when incidents occur.
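To make "suppress alarm noise" a little more concrete, here is a minimal Python sketch of one common approach: keep the first alert for a given service and alarm name, and drop repeats that arrive within a quiet window. This is an illustration under stated assumptions, not the tooling the series provides. The alert record layout, the alarm names, and the 10-minute window are all hypothetical.

```python
from datetime import datetime, timedelta


def suppress_repeats(alerts, window=timedelta(minutes=10)):
    """Keep the first alert per (service, name); drop repeats inside `window`."""
    last_kept = {}  # (service, name) -> timestamp of the last alert we kept
    kept = []
    for alert in sorted(alerts, key=lambda a: a["time"]):
        key = (alert["service"], alert["name"])
        previous = last_kept.get(key)
        if previous is None or alert["time"] - previous >= window:
            kept.append(alert)
            last_kept[key] = alert["time"]
    return kept


# Hypothetical sample alerts; the field names and alarm names are assumptions.
t0 = datetime(2023, 4, 24, 9, 0)
alerts = [
    {"time": t0, "service": "impala", "name": "QUERY_RUNAWAY"},
    {"time": t0 + timedelta(minutes=2), "service": "impala", "name": "QUERY_RUNAWAY"},
    {"time": t0 + timedelta(minutes=3), "service": "hdfs", "name": "DISK_FULL"},
    {"time": t0 + timedelta(minutes=15), "service": "impala", "name": "QUERY_RUNAWAY"},
]
quiet = suppress_repeats(alerts)
for alert in quiet:
    print(alert["time"], alert["service"], alert["name"])
```

The window is measured from the last alert that was kept, so a steady stream of repeats still surfaces once per window rather than disappearing entirely.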

Data Asset Standardization

Of common datasets, pipelines, processes, and reports. Admittedly, data asset standardization is a multiyear journey; notwithstanding, addressing only your most problematic and resource-intensive processes and assets may yield more environmental health improvement than any other category. We’ll share best practices on how to locate and capitalize on those opportunities.

Platform Health

Includes hardware and service settings and configurations. Cloudera Data Platform (CDP) must be configured properly to perform well. Furthermore, as business needs continually change, so will your use of the platform, and that will necessitate re-tuning. To help you on that journey, we’ll list some common symptoms, link them to root cause analysis steps, provide proper configuration guidelines, and outline the steps to properly tune your environment.

The Right Tool for the Job

Includes the proper use of Impala, CDSW, Airflow, NiFi, and Cloudera Manager (CM). You might be surprised at the adverse environmental impact of using CDSW as an ETL pipeline tool or using Impala to write unwieldy queries with an embarrassing number of joins. We’ve done it too. We confess. We’ll highlight the advantages of using Airflow to manage complex data pipelines with its ability to divide a workflow into small, independent tasks. We’ll list other do’s and don’ts.
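The idea of dividing a workflow into small, independent tasks can be sketched in plain Python. The example below does not use Airflow's API. It only uses the standard library's `graphlib` module (Python 3.9+) to show the underlying notion of a task graph, where each task declares what it depends on and a scheduler derives a valid run order. The task names and the pipeline itself are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each key is a task, each value is the set of
# tasks that must finish before it can start.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join_orders_customers": {"clean_orders", "extract_customers"},
    "publish_report": {"join_orders_customers"},
}


def run_order(dependencies):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(pipeline)
print(order)
```

Because each task is small and declares its own dependencies, a failed step can be retried on its own, and tasks with no dependency between them (the two extract steps here) are free to run in parallel.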

Environmental Health Scoring

Brings it all together by demonstrating how to measure, score, monitor, and control environmental health through dashboards that we provide for you along with instructions to hook them up to your logs.
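As a rough illustration of what scoring might look like, the Python sketch below combines one score per category into a single number using a weighted average. This is not the scorecard the series will share. The category keys, the equal weights, and the 0 to 100 scale are all assumptions made for the example.

```python
# Hypothetical category weights; equal weighting is an assumption.
WEIGHTS = {
    "visibility": 0.25,
    "data_asset_standardization": 0.25,
    "platform_health": 0.25,
    "right_tool": 0.25,
}


def health_score(category_scores, weights=WEIGHTS):
    """Weighted average of per-category scores, each expected in 0-100."""
    missing = set(weights) - set(category_scores)
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    weighted = sum(category_scores[name] * w for name, w in weights.items())
    return weighted / total_weight


# Hypothetical per-category scores for one environment.
scores = {
    "visibility": 40,
    "data_asset_standardization": 60,
    "platform_health": 80,
    "right_tool": 60,
}
overall = health_score(scores)
print(f"overall health score: {overall:.1f}")
```

Tracking the same calculation over time is what makes a claim like "doubled the health score in two quarters" measurable.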

If you’ve got the symptoms, the doctors are in. Let the healing begin!


