Root Cause Changes

Identify and review system changes that could be at the root of incidents.

📘

Advanced Insights Module

This feature is part of the Advanced Insights Module. If your organization has not purchased this module, you may not have access to the feature.

If you are interested in upgrading to the Advanced Insights Module, contact your BigPanda account team.

Root Cause Changes (RCC) dramatically speeds up the process of identifying the changes that cause outages and incidents in your environments.

When you integrate your CI/CD and change management tools with BigPanda, you can normalize and aggregate change data alongside incidents. This comprehensive enrichment gives you deep insights into the changes that may have triggered an issue.

BigPanda analyzes each change against active incidents in real time, so your teams don’t have to manually dig through hundreds or thousands of potentially related change events. Changes suspected of being a root cause are flagged and added to the incident details.

Change Suspects in the Change Table

RCC uses an algorithm based on natural language processing and vector space models to compare changes to active incidents. This allows BigPanda to identify the connections between alerts and change data with confidence that suspected changes are statistically relevant.

Key Features

  • Connect a variety of change tools to BigPanda using standard and custom integrations.
  • View changes that occurred before and during an incident to identify those that may be related to it.
  • Visualize metrics related to change data using dashboards in Unified Analytics.

🚧

Changes (RCC) API

Root Cause Changes can also be viewed and managed with the Changes (RCC) API.

Integrate Changes with BigPanda

BigPanda includes several standard integrations ready to connect your change feeds to BigPanda. You can also build custom integrations with the Root Cause Changes (RCC) REST API. These integrations collect and normalize data from your various tools and bring them together into BigPanda’s single pane of glass.

Change integrations give your Operations teams deeper insights into the system changes that may be triggering system events and outages. This gives Operations teams clear visibility into changes pushed by Developers, empowering the two teams to collaborate more proactively and effectively.

Learn more about integrating your change tools with BigPanda in the Integrate with BigPanda documentation.

Changes in the BigPanda Incident Console

Changes that occurred shortly before or during the incident are displayed in the Changes tab of the Incident Details pane. Here, operators can see vital information about each change, including status, summary, and start time.

Change Details in BigPanda

BigPanda automatically compares change data to incoming incidents, looking for potential causes. If a change is highly correlated with an incident, it appears as a Suspect in the Incident Details pane. Each RCC suspect includes details about why BigPanda considers the change related to the ongoing incident. Although BigPanda is configured to suggest up to 5 related changes, only highly correlated changes are suggested.

Operators can then search the table, dig into change details, and mark whether a change should be matched to the incident. By marking the results of change investigation in the console, teams can collaborate to identify the real cause.

📘

Use the Show potential RCC only toggle to limit the change table to only show changes BigPanda has identified as RCC suspects.

Learn more about how operators can leverage changes in the BigPanda console in the Remediate Incidents documentation.

Automatic Root Cause Changes Suspects

Root Cause Changes (RCC) leverages an algorithm based on natural language processing and vector space models. BigPanda intuitively compares the complex and discordant data from monitoring and change tools, while considering the context and timing of causal relationships.
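
To make the vector space idea concrete, the sketch below compares a change summary with alert text using cosine similarity over simple term-count vectors. It is an illustration of the general technique only, not BigPanda's actual algorithm; the function names and sample strings are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the term-count vectors of two texts."""
    vec_a, vec_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    # Dot product over the terms of vec_a; Counter returns 0 for missing terms.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty text has no direction in the vector space.
    return dot / (norm_a * norm_b)


# Hypothetical sample data, for illustration only.
change_summary = "Deploy payments-api v2.3 to prod-db-01"
related_alert = "High latency on payments-api host prod-db-01"
unrelated_alert = "Disk full on build agent"

print(cosine_similarity(change_summary, related_alert))
print(cosine_similarity(change_summary, unrelated_alert))
```

Texts that share more terms point in closer directions and score nearer to 1, while texts with no terms in common score 0.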

Explanation for a Suspected Change

RCC runs calculations on key connections between incidents and changes, including:

  • Time Frame - how close in time the change and the incident occurred
  • Alerts Coverage - how many of the incident's alerts match properties in the change
  • Categories - groups of specific details defined for weighting and parsing matches

Each incident-change match is given a causation score based on these calculations, with a higher score indicating a more likely suspect. Changes with a high causation score are surfaced in the Incident Details pane as RCC Suspects.

📘

Score Calculation

Change suspect causation scores do not have an upper limit. Each match point between the incident and the change is added to the score. Higher scores indicate a stronger match and can help determine the most likely suspect.

Scores only appear for matches that have met the threshold in your RCC configuration. To adjust your RCC configuration, contact BigPanda Support.

To establish a baseline for your scores and monitor trends, see the Suspected Changes Analysis dashboard in Unified Analytics.
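
The scoring behavior described above (additive points, no upper limit, a configured threshold, and up to 5 suggestions) can be sketched as follows. The class, function, point values, and threshold are hypothetical and chosen only for illustration; the real values live in your RCC configuration.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeMatch:
    """A candidate change and the match points it earned against an incident."""
    change_id: str
    match_points: list[float] = field(default_factory=list)

    @property
    def score(self) -> float:
        # Additive: each match point raises the score, with no upper bound.
        return sum(self.match_points)


def select_suspects(matches, threshold, max_suspects=5):
    """Keep matches that meet the threshold, highest score first."""
    qualifying = [m for m in matches if m.score >= threshold]
    ranked = sorted(qualifying, key=lambda m: m.score, reverse=True)
    return ranked[:max_suspects]


# Hypothetical data: the threshold of 3.0 is an assumption, not a BigPanda default.
matches = [
    ChangeMatch("CHG-1", [2.0, 3.5]),
    ChangeMatch("CHG-2", [1.0]),
    ChangeMatch("CHG-3", [4.0, 4.0, 4.0]),
]
for suspect in select_suspects(matches, threshold=3.0):
    print(suspect.change_id, suspect.score)
```

Because the score is a plain sum, it is meaningful mainly in comparison with other scores from the same configuration, which is why a baseline is useful.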

Time Frame

RCC is focused on finding causation, not correlation. Only two kinds of changes are considered potential causes: changes that have been in place long enough to create a system event, and scheduled change windows that have recently started.

Changes that happened too far before the incident are excluded, because incidents usually happen shortly after system changes.
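
A minimal sketch of this kind of time window check is shown below. The 5 minute minimum lead and 24 hour lookback are assumed values picked for the example, not BigPanda defaults, and the sketch ignores the special handling of scheduled change windows.

```python
from datetime import datetime, timedelta

# Assumed limits, for illustration only.
MIN_LEAD = timedelta(minutes=5)      # time a change needs before it can cause an event
MAX_LOOKBACK = timedelta(hours=24)   # changes older than this are treated as too distant


def in_time_frame(
    change_start: datetime,
    incident_start: datetime,
    min_lead: timedelta = MIN_LEAD,
    max_lookback: timedelta = MAX_LOOKBACK,
) -> bool:
    """Return True if the change started within the candidate window before the incident."""
    elapsed = incident_start - change_start
    # A change that starts after the incident gives a negative elapsed time and is rejected.
    return min_lead <= elapsed <= max_lookback


incident = datetime(2024, 1, 1, 12, 0)
print(in_time_frame(incident - timedelta(hours=1), incident))
print(in_time_frame(incident - timedelta(days=3), incident))
```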

Alerts Coverage

Many incidents will have at least one alert that matches the data for recent changes. When only a single alert matches, the relationship between the incident and change may not be causal, especially in complex incidents with multiple downstream impacts.

To help identify strong causal relationships, RCC considers the percentage of alerts in an incident that align with the change details. Higher percentages indicate a closer connection between the incident and change.
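
As a rough illustration, coverage can be computed as the share of an incident's alerts that have at least one tag value in common with the change. The tag names and the exact matching rule here are assumptions for the example; RCC's real matching is driven by your category and parsing configuration.

```python
def alerts_coverage(alerts: list[dict[str, str]], change_values: set[str]) -> float:
    """Percentage of alerts with at least one tag value that appears in the change."""
    if not alerts:
        return 0.0
    matching = sum(1 for alert in alerts if change_values & set(alert.values()))
    return 100.0 * matching / len(alerts)


# Hypothetical incident with four alerts, two of which mention the changed host.
alerts = [
    {"host": "prod-db-01", "check": "cpu"},
    {"host": "prod-db-01", "check": "latency"},
    {"host": "web-07", "check": "http"},
    {"host": "cache-02", "check": "memory"},
]
change_values = {"prod-db-01"}

print(alerts_coverage(alerts, change_values))  # 50.0
```

A change that matches 2 of 4 alerts is a stronger candidate than one that matches 1 of 20, even though both have at least one matching alert.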

Categories

RCC uses change details, alert tags, and incident metadata to find common values between incidents and changes.

However, not every match implies a strong connection or a potential cause.

To consider the context and relationships between data, RCC breaks incident data into a hierarchy of categories weighted by importance, based on expected incident and change alignment. Different weights and parsing rules apply to tag matches in each category, making sure that matches reflect the relationship of shared system attributes and resources.

Your default RCC category configuration is built on common industry practices, system topology, and tags and processes unique to your organization.
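
The effect of category weighting can be sketched as below: a shared value counts for more in a high-weight category than in a low-weight one. The category names and weights are hypothetical and do not reflect any actual RCC configuration.

```python
# Hypothetical weights: a shared host is treated as stronger evidence than a shared environment.
CATEGORY_WEIGHTS = {"host": 3.0, "application": 2.0, "environment": 0.5}


def category_score(
    incident_tags: dict[str, set[str]],
    change_tags: dict[str, set[str]],
    weights: dict[str, float] = CATEGORY_WEIGHTS,
) -> float:
    """Sum the weight of every value shared by the incident and the change, per category."""
    score = 0.0
    for category, weight in weights.items():
        shared = incident_tags.get(category, set()) & change_tags.get(category, set())
        score += weight * len(shared)
    return score


incident_tags = {"host": {"prod-db-01", "web-07"}, "environment": {"production"}}
change_tags = {
    "host": {"prod-db-01"},
    "environment": {"production"},
    "application": {"payments"},
}

print(category_score(incident_tags, change_tags))  # 3.5
```

In this example the shared host contributes 3.0 and the shared environment only 0.5, so a change that merely ran in the same environment scores far lower than one that touched an affected host.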

Improving Root Cause Changes Results

RCC works best when it has rich data and meaningful relationships identified for your organization.

The more standardized information available in incoming tags and description fields, the easier it is for BigPanda to accurately spot causality. If you’d like to improve your RCC results, high-quality enrichment and tag normalization are an important start.

For even more refinement of results, you can request modifications to your RCC category and parsing configuration. This is a complex back-end process requiring close coordination with BigPanda support. Reach out to us at [email protected] if you are interested in adjusting your RCC configuration.

Next Steps

Learn more about BigPanda's Incident Intelligence

Dig deeper into Correlating Changes with Incidents

Begin integrating Change Integrations