Root Cause Changes (RCC)

BigPanda’s Root Cause Changes (RCC) feature highlights changes related to incidents, helping you find and fix change-related incidents faster.

Identifying the root cause of an outage or a poorly performing application is one of the biggest challenges that IT organizations face today.

Most enterprises experience thousands of incidents every week. IT Ops, NOC and DevOps teams must be able to quickly understand how each incident is impacting the business and prioritize response accordingly, before users and customers are affected. However, because operational and business context is often missing from monitoring data, operators must manually search for context before taking action, a process that wastes precious time and is prone to human error.

  • Modern IT environments can experience thousands of changes every week
  • “Over 85% of outages impacting mission-critical services can be traced back to changes” - Gartner
  • Code or configuration changes can account for over 50% of an organization’s incidents
  • The relationship between a change and its effect is often indirect, so even domain experts may only be able to guess the right cause
  • Manually investigating changes related to an incident is often the longest step in detecting the root cause of an incident

To help combat this struggle, BigPanda provides a single pane of glass for Ops teams to view, manage, and triage incidents, complete with in-line root cause change suggestions.

BigPanda’s Root Cause Changes capability uses Open Box Machine Learning to flag likely suspects in the UI, helping teams identify the infrastructure and application changes behind an incident. By pinpointing the root cause of incidents and outages in real time, BigPanda helps enterprises and their IT Ops, NOC, and DevOps/SRE teams rapidly investigate and resolve those incidents and outages.

The Changes Tab

Key Features

  • Integration - Funnel all your change integrations into BigPanda's Open Integrations Hub to see all your changes organized and correlated in one place.
  • Visualization - See a consolidated list of all the system changes related to each incident.
  • Correlation - Let BigPanda's Open Box Machine Learning (OBML) correlate changes to incidents, or correlate them manually, to enable Root Cause Analysis.
  • Collaboration - Collaborate with other users to investigate which change is the Root Cause of the incident.

BigPanda’s Root Cause Changes feature streamlines the cause investigation process for your team, shortening the troubleshooting phase of incident resolution. By giving your team instant visibility and easy collaboration, RCC dramatically reduces MTTR for change-related incidents.

How It Works

BigPanda integrates with your change feeds, such as CI/CD pipelines, Change Management tools, auditing systems, and orchestration tools, to collect change data. Change data such as managed changes, code deployments, software updates, configuration changes, upgrades, and more is stored and organized into the Related Changes table within the Incidents tab. Changes and incidents are updated systematically so that the changes related to each incident remain current.

Change data is normalized with searchable, correlatable tags. By bringing change data together from across the different layers of your environment, BigPanda helps your Ops teams get visibility on the system as a whole.
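As a rough illustration of what normalizing change data into shared tags can look like, the sketch below maps source-specific field names onto common tag names and cleans up the values. The alias table, the field names, and the `normalize_change` helper are all hypothetical and are not taken from BigPanda's product or API.

```python
from typing import Any

# Hypothetical mapping from source-specific field names to shared tag names.
FIELD_ALIASES = {
    "ci": "host",
    "cmdb_ci": "host",
    "hostname": "host",
    "service_name": "service",
    "app": "service",
}


def normalize_change(raw: dict[str, Any]) -> dict[str, str]:
    """Return a flat tag dictionary with lowercase keys and trimmed, lowercase values."""
    tags: dict[str, str] = {}
    for key, value in raw.items():
        if value is None:
            continue  # skip empty fields rather than storing the string "None"
        name = FIELD_ALIASES.get(key.lower(), key.lower())
        tags[name] = str(value).strip().lower()
    return tags


# Two made-up change records, one from a pipeline and one from a ticketing tool.
pipeline_change = {"hostname": "Web-01 ", "app": "Checkout", "build": 512}
ticket_change = {"cmdb_ci": "web-01", "service_name": "checkout", "number": "CHG0001"}

pipeline_tags = normalize_change(pipeline_change)
ticket_tags = normalize_change(ticket_change)
```

After normalization, both records carry the same `host` and `service` tags, which is what makes them searchable and comparable against incident tags.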

Once integrated with all your change feeds and tools, BigPanda's OBML (Open Box Machine Learning) algorithms detect connections between changes made to the system and incidents in real time, identifying changes that may have caused the outage.
Changes that are correlated strongly enough to suggest a causal link are floated up onto the Incident Overview as suggested related changes, with a comment from BigPanda explaining why the change was suggested.

Team members can review change data and investigate root cause right in BigPanda, marking changes as matches and collaborating with their team using BigPanda’s deep integrations and sharing capabilities.

By automatically flagging suspect changes, RCC spares operators from manually sifting through changes to guess at the root cause. This helps enterprises and their IT Ops, NOC, and DevOps/SRE teams rapidly investigate and resolve incidents and outages, and reduces MTTR.

BigPanda Machine Learning

At its core, BigPanda’s Root Cause Analysis relies on pattern recognition.

The Root Cause Analysis algorithm runs calculations on key connections between incidents and changes, including:

  • Categories: The machine learning engine sorts alerts and changes into matching categories based on specific keys and values
  • Time Factor: Each change is evaluated on whether it occurred before or at the start of an incident, and how closely the timelines match
  • Alerts Coverage: All of the alerts in an incident are weighed to see how many of the alerts match the change
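The three factors above can be sketched as a toy scoring function. This is only an illustration under stated assumptions: the data classes, the 24-hour window, the linear time decay, and the multiplication of the two scores are all invented here, and BigPanda's actual algorithm and weights are not described in this document.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    tags: dict[str, str]
    start: datetime


@dataclass
class Incident:
    alerts: list[dict[str, str]]  # one tag dictionary per alert
    start: datetime


def alert_matches(alert: dict[str, str], change: Change) -> bool:
    """Categories: an alert matches if it shares at least one key/value pair with the change."""
    return any(change.tags.get(key) == value for key, value in alert.items())


def time_factor(change: Change, incident: Incident,
                window: timedelta = timedelta(hours=24)) -> float:
    """Time factor: 1.0 for a change at incident start, decaying linearly to 0.0 at the window edge."""
    gap = incident.start - change.start
    if gap < timedelta(0) or gap > window:
        return 0.0  # change happened after the incident started, or too long before
    return 1.0 - gap / window


def alerts_coverage(change: Change, incident: Incident) -> float:
    """Alerts coverage: fraction of the incident's alerts that match the change."""
    if not incident.alerts:
        return 0.0
    matched = sum(alert_matches(alert, change) for alert in incident.alerts)
    return matched / len(incident.alerts)


def suspicion_score(change: Change, incident: Incident) -> float:
    """Combine the factors; multiplying them is an arbitrary choice for this sketch."""
    return time_factor(change, incident) * alerts_coverage(change, incident)


incident = Incident(
    alerts=[{"host": "web-01", "check": "cpu"}, {"host": "web-02", "check": "cpu"}],
    start=datetime(2024, 1, 1, 12, 0),
)
change = Change(tags={"host": "web-01"}, start=datetime(2024, 1, 1, 6, 0))
score = suspicion_score(change, incident)
```

In this made-up example the change lands six hours before the incident and matches one of its two alerts, so it receives a middling score. A change made after the incident started, or one matching no alerts, would score zero.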

In complex modern systems, the root cause may be tied to unexpected systems or architecture, so BigPanda uses a two-pronged algorithm to help you spot even the most unusual root cause changes.

Examples of Suggested Related Changes

Text-Based Suggestion

Deep enrichment allows BigPanda to pull dozens of alert tags and metadata together into one incident. For a human operator, finding a single matching value between changes and incidents is a time-consuming and tedious process.

This is where the text-based algorithm comes in.

Sample Text-Based Suggestion

In this example, the incident was enriched with Configuration Item (CI) data from the CMDB. This CI value was also found in the change information, meaning the change affected the exact configuration item that is now encountering trouble.

As changes occurring on the same item or system are likely linked, the algorithm highlighted this as a suspected change.
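The idea behind the example above, finding an incident tag value inside a change's text, can be sketched as a simple substring search. This is a minimal illustration only: the `text_based_matches` helper, the minimum value length, and the sample data are hypothetical and do not represent BigPanda's text-based algorithm.

```python
def text_based_matches(incident_tags: dict[str, str], change_text: str,
                       min_len: int = 4) -> list[str]:
    """Return the names of incident tags whose values appear in the change text.

    Very short values are skipped (an assumption of this sketch) because they
    tend to match by accident.
    """
    haystack = change_text.lower()
    return sorted(
        name
        for name, value in incident_tags.items()
        if len(value) >= min_len and value.lower() in haystack
    )


# Made-up incident tags, including a CI value enriched from a CMDB.
incident_tags = {"cmdb_ci": "db-prod-07", "check": "cpu", "host": "app-12"}
change_text = "Patch kernel on DB-PROD-07 during maintenance window"

matches = text_based_matches(incident_tags, change_text)
```

Here only the `cmdb_ci` tag matches, which mirrors the case described above: the same configuration item appears in both the incident and the change.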

To learn more about how to use BigPanda’s Root Cause Changes feature, see the Correlating Changes With Incidents documentation.

Recommended Reading

To learn more about working with the Related Changes section and to see relevant integrations, see: