Managing Root Cause Changes

Root Cause Changes features, such as change integrations and the root cause analysis algorithms, can be customized to fit the needs of your organization.

BigPanda’s Root Cause Changes feature collects change information through inbound change integrations and correlates changes to potentially related incidents.

Changes that correlate strongly enough are surfaced in the Incident Overview section of the incident details as suggested related changes.

You can add change integrations and adjust what change data is sent to BigPanda to make sure your team has the information they need in the incident details pane.

Additionally, the text-similarity algorithm can be adjusted to better match the patterns within your system.

Relevant Permissions

Roles with the following permissions can manage the Related Changes section of the incident details:

Changes_Full_Access

Mark changes as Suspect/Match and edit changes marked by other users

Roles with the following permissions can manage the change integrations for your organization:

Integrations_Read

Read-only - view BigPanda Integrations in the BigPanda Integrations tab

Integrations_Full_Access

Full access - view, install, uninstall and/or work with integrations in the BigPanda Integrations tab

To learn more about how BigPanda's permissions work, see the RBAC - Role Based Access Control documentation.

Adding Change Integrations

BigPanda includes several Out of the Box (OOTB) integrations ready to connect your change feeds. You can also create custom integrations using the Changes REST API.

OOTB integrations collect and normalize change data from change feeds such as CI/CD pipelines, change management tools, auditing systems, and orchestration tools.

Connecting each of your change feeds to BigPanda gives your Ops teams deeper insight into the system changes that may be triggering system events and outages. Change integrations in particular can dramatically reduce MTTR and the number of bridge calls by encouraging collaboration between Ops and Dev teams.

When configuring change integrations, include as much relevant information in the payload as possible. In addition to giving your teams vital information, change data is key to quality related change suggestions. The more details available, the better BigPanda’s RCC algorithms can work.

  • Ensure that the full change description and details are available through either the description field or custom tags.
  • RCC uses free text when looking for matches. The data doesn’t need to be in any particular format as long as it is included in the payload sent to BigPanda.
  • Each integration is configured to pull specific information from the change management tool into BigPanda. If useful details are not currently included, consider using a custom integration or the Changes REST API.
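
As a rough illustration of the points above, the following sketch assembles a change payload that carries a full free-text description plus custom tags. The field names (identifier, summary, status, start, end, tags) and all example values are assumptions made for illustration. Check the Changes REST API documentation for the exact schema your integration expects.

```python
import json


def build_change_payload(identifier, summary, description, tags, status="Planned",
                         start=None, end=None):
    """Assemble a change payload as a plain dict (field names are assumptions)."""
    payload = {
        "identifier": identifier,
        "summary": summary,
        "status": status,
        # Keep the full free-text description alongside the custom tags so
        # RCC has as much text as possible to match against.
        "tags": {"description": description, **tags},
    }
    # Timestamps are included only when the caller supplies them.
    if start is not None:
        payload["start"] = int(start)
    if end is not None:
        payload["end"] = int(end)
    return payload


change = build_change_payload(
    identifier="CHG-0001",  # hypothetical change ID
    summary="Upgrade nginx on prod web tier",
    description="Rolling upgrade of nginx on hosts web-01 and web-02 in us-east-1",
    tags={"host": "web-01", "environment": "prod", "team": "platform"},
    start=1700000000,
)
print(json.dumps(change, indent=2))
```

The sketch only builds the payload. How it is sent depends on the integration you choose.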

🚧

Change Rate Limitations

To maintain optimal system health, Root Cause Changes is limited to a maximum of 400 changes per minute and 50K changes per week.
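
If a custom integration might exceed these limits, a client-side guard can hold changes back before they are sent. The sketch below is one possible approach, assuming sliding one-minute and one-week windows. How BigPanda actually measures the windows is not stated here, and the class and method names are hypothetical.

```python
from collections import deque

PER_MINUTE_LIMIT = 400   # limit stated above
PER_WEEK_LIMIT = 50_000  # limit stated above
MINUTE = 60
WEEK = 7 * 24 * 60 * 60


class ChangeRateGuard:
    """Sliding-window counter that decides whether another change may be sent."""

    def __init__(self, per_minute=PER_MINUTE_LIMIT, per_week=PER_WEEK_LIMIT):
        self.per_minute = per_minute
        self.per_week = per_week
        self._sent = deque()  # send times in seconds, oldest first

    def _sent_in_last_minute(self, now):
        count = 0
        # Walk backwards from the newest entry and stop at the first old one.
        for sent_at in reversed(self._sent):
            if sent_at <= now - MINUTE:
                break
            count += 1
        return count

    def allow(self, now):
        """Record and allow a send at time `now`, or return False if over a limit."""
        # Entries older than one week no longer count toward either limit.
        while self._sent and self._sent[0] <= now - WEEK:
            self._sent.popleft()
        if len(self._sent) >= self.per_week:
            return False
        if self._sent_in_last_minute(now) >= self.per_minute:
            return False
        self._sent.append(now)
        return True


guard = ChangeRateGuard()
results = [guard.allow(now=1000.0) for _ in range(401)]
print(results.count(True))  # 400: the 401st change in the same minute is held back
```

A change that is held back can be queued and retried once the window has moved on.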

Integrating a Change Management Tool

To integrate a change management tool with BigPanda:

  1. In the Integrations tab of the BigPanda interface, click New Integration.
  2. In the Create a New Integration selection window, select the desired integration. Use the Changes tab to narrow the list to only change management integrations, or search for an integration using the search bar on the right. If you do not find the integration you need, you can select the Changes REST API integration to create a custom solution, or reach out to us at [email protected] for assistance.
  3. Follow the in-product instructions to integrate your change management tool with BigPanda.

For more information on available OOTB solutions and integrating change feeds with BigPanda, see the Changes integration documentation.

👍

Once the integration is configured, we recommend sending a test change to ensure that the change data entering BigPanda fits your team’s needs.

Sending Custom Tags

Each integration includes optional fields that can be configured by following the integration’s instructions. When configured, each of these optional fields appears as a new column in the change table.

For many integrations, the start timestamp is optional, which means that changes can be sent to BigPanda without a start time. However, only changes that reach BigPanda with a start timestamp appear in the UI. Most integrations insert the current timestamp for changes that are missing one, but this may vary depending on the integration configuration.
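
For a custom integration, one way to make sure changes are not dropped from the UI is to fill in a missing start time before sending. The sketch below assumes the change is a dict with a `start` field holding epoch seconds. Both the field name and the unit are assumptions.

```python
import time


def with_start_timestamp(change, now=None):
    """Return a copy of the change that is guaranteed to carry a start time."""
    stamped = dict(change)
    if stamped.get("start") is None:
        # Fall back to the current time, mirroring what most integrations do.
        stamped["start"] = int(now if now is not None else time.time())
    return stamped


missing = {"identifier": "CHG-0002", "summary": "Rotate TLS certificates"}
print(with_start_timestamp(missing, now=1700000000)["start"])  # 1700000000
```

Changes that already carry a start time pass through unchanged.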

🚧

The status field is taken from the payload of a change and is NOT updated according to the start/end timestamps.

🚧

Column Rate Limitation

To optimize change table performance, a maximum of 30 columns are displayed in the table at a time. Click any change in the table to see a pop-up with the full list of tags associated with the change.

Managing the Root Cause Analysis Algorithms

BigPanda’s Root Cause Changes feature leverages two separate algorithms to identify changes potentially related to incidents: a text similarity algorithm and a causal machine learning (ML) algorithm.

👍

The causal algorithm is not enabled by default. Contact us at [email protected] to learn about pricing and RCC options for your organization.

Both algorithms run calculations on key connections between incidents and changes, including:

  • Categories - tag key/value matches
  • Time Factor - how close in time the change and the incident occurred
  • Alerts Coverage - how many of the alerts match properties in the change

BigPanda is configured to suggest up to 2 related changes per algorithm, but only changes that are highly correlated will be suggested. An incident may have 0-4 recommendations at any time.
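
The cap described above can be pictured as a filter followed by a top-2 pick per algorithm. This is a minimal sketch, not BigPanda's implementation: the candidate tuples, the score scale, and the `min_score` cutoff are all hypothetical.

```python
from collections import defaultdict

MAX_PER_ALGORITHM = 2  # cap stated above


def select_suggestions(candidates, min_score):
    """Pick the strongest changes per algorithm.

    candidates: iterable of (algorithm, change_id, score) tuples.
    min_score: hypothetical cutoff standing in for "highly correlated".
    """
    by_algorithm = defaultdict(list)
    for algorithm, change_id, score in candidates:
        if score >= min_score:
            by_algorithm[algorithm].append((score, change_id))

    selected = {}
    for algorithm, scored in by_algorithm.items():
        scored.sort(reverse=True)  # highest score first
        selected[algorithm] = [change_id for _, change_id in scored[:MAX_PER_ALGORITHM]]
    return selected


candidates = [
    ("text_similarity", "CHG-1", 82),
    ("text_similarity", "CHG-2", 64),
    ("text_similarity", "CHG-3", 91),
    ("causal", "CHG-4", 40),
    ("causal", "CHG-5", 77),
]
picked = select_suggestions(candidates, min_score=60)
print(picked)  # {'text_similarity': ['CHG-3', 'CHG-1'], 'causal': ['CHG-5']}
```

With two algorithms and a cap of two each, the total never exceeds four, and it drops to zero when no change clears the cutoff.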

Administrators can manage the two algorithms to improve the quality and frequency of suggested root cause changes.

👍

Both Root Cause Analysis Algorithms are focused on finding causation, not correlation. The causation time factor is configured to consider only changes that could have affected a system long enough to create a system event.

Text-Similarity Algorithm

The text-similarity algorithm runs a calculation between incidents and changes to identify changes that may be related. This algorithm uses alert tags, details, and incident metadata to find common values between incidents and changes, automating a process that normally requires many hours of manual work.

Once matches are found, the algorithm weighs the match to see if it is potentially a sign of root cause. To do so, the algorithm breaks incident data out into categories, or types of connection. Each category has a particular weight assigned to it, reflecting the type of relationship between changes and incidents, as not all text-matches point to shared system attributes or resources.

For example, the “IP” category of tags, which holds data on IP addresses, is generally assigned a high weight, as changes and incidents occurring at the same IP address are likely to be connected.

The algorithm then calculates a causation score based on the time frames of the incident and change, and the strength and number of matches between the two. Changes with a high causation score are surfaced as suggested related changes in the Incident Details pane.
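
To make the idea concrete, here is a toy scoring function that combines category weights with a time factor. The formula is purely illustrative and is not BigPanda's actual calculation. The weights are the defaults listed later in this document, the one-day window is the default change timeframe, and the linear decay is an assumption.

```python
# Default weights taken from the category tables later in this document.
CATEGORY_WEIGHTS = {"ip": 7, "configuration_item": 5, "host": 4, "environment": 2}
MAX_GAP_SECONDS = 24 * 60 * 60  # default change timeframe of 1 day


def time_factor(change_end, incident_start, max_gap=MAX_GAP_SECONDS):
    """Scale from 1.0 (change just ended) down to 0.0 (gap reaches max_gap)."""
    gap = incident_start - change_end
    # Simplification: ignore changes that end after the incident starts
    # or that ended longer ago than the allowed timeframe.
    if gap < 0 or gap > max_gap:
        return 0.0
    return 1.0 - gap / max_gap


def causation_score(matched_categories, change_end, incident_start):
    """Sum the weights of matched categories, then apply the time factor."""
    match_strength = sum(CATEGORY_WEIGHTS.get(c, 0) for c in matched_categories)
    return match_strength * time_factor(change_end, incident_start)


# A change that matched on IP and host and ended 6 hours before the incident.
print(causation_score(["ip", "host"], change_end=0, incident_start=6 * 3600))  # 8.25
```

In this toy model, a match on a heavily weighted category such as IP counts for more than several weak matches, and any match loses value as the gap between change and incident grows.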

[Figure: Sample Text-Similarity Suggestion]

Advantages of the text-similarity algorithm:

  • Day 1 Results
  • Configurable
  • Easy to see and understand reasoning

Disadvantages of the text-similarity algorithm:

  • Only as good as the incoming data
  • Needs upkeep to adjust to changing systems
  • No additional insights

Configuring the Text-Similarity Algorithm

🚧

Adjusting the text-similarity algorithm is a complex back-end process requiring close coordination with BigPanda support. Reach out to us at [email protected] if you are interested in configuring your text-similarity algorithm.

The text-similarity algorithm can be adjusted to better reflect the relationship between changes and incidents. You can change categories, adjust the weight given to specific types of matches, or remove specific categories entirely. You can also use score reports to test algorithm changes before activating them.

👍

Rich alert and change data is vital to the success of the text-similarity algorithm. The more information available in incoming tags and description fields, the easier it is for the algorithm to spot matches.

Default Text-Similarity Settings

Main Default Categories

  • Configuration Item
    tag_keys: "cmdb_ci", "yp_service_id", "ci", "configuration_item"
    Default Weight: 5
  • AWS Region
    tag_values: ["ue1", "uw1", "uw2", "ew1", "ec1", "an1", "an2", "as1", "as2", "se1","us-east-2","us-east-1","us-west-1","us-west-2","ap-east-1",...]
    Default Weight: 1
  • Environment
    tag_keys: "environment","environments","env","envs","tier","tiers","stage"
    Default Weight: 2
  • Team
    tag_keys: "responsible_group", "teams", "owners", "assignment_group", "groups", "team", "owner", "group"
    Default Weight: 3
  • Application
    tag_keys: "application", "services", "applications", "service", "business_service", "app", "apps"
    Default Weight: 4
  • Host
    tag_keys: "hostnames", "server", "hostname", "host", "instances", "host_name", "instance", "servers", "object", "hosts", "nodes", "host_names", "node", "objects", "device", "devices"
    Default Weight: 4
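
The key-based categories above can be read as a lookup from tag key to category and weight. The sketch below shows that lookup for a few of the defaults. Case-insensitive matching is an assumption, the key lists are abbreviated, and AWS Region is left out because it matches on tag values rather than keys.

```python
# Subset of the default categories listed above (key lists abbreviated).
MAIN_CATEGORIES = {
    "Configuration Item": {
        "weight": 5,
        "tag_keys": {"cmdb_ci", "yp_service_id", "ci", "configuration_item"},
    },
    "Environment": {
        "weight": 2,
        "tag_keys": {"environment", "environments", "env", "envs", "tier", "tiers", "stage"},
    },
    "Team": {
        "weight": 3,
        "tag_keys": {"responsible_group", "teams", "owners", "assignment_group",
                     "groups", "team", "owner", "group"},
    },
    "Host": {
        "weight": 4,
        "tag_keys": {"hostnames", "server", "hostname", "host", "instances",
                     "host_name", "instance", "servers", "hosts", "nodes", "node"},
    },
}


def category_for_tag_key(tag_key):
    """Return (category name, default weight) for a tag key, or None."""
    key = tag_key.strip().lower()  # assumption: keys are compared case-insensitively
    for name, spec in MAIN_CATEGORIES.items():
        if key in spec["tag_keys"]:
            return name, spec["weight"]
    return None


print(category_for_tag_key("hostname"))  # ('Host', 4)
print(category_for_tag_key("color"))     # None
```

A tag key that falls outside every list, like `color` here, would need a custom category before it could contribute weight to a match.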

Additional Default Categories

  • IP
    Regex: (?:[0-9]{1,3}.){3}[0-9]{1,3}
    Default Weight: 7
  • Email
    Regex: (\[email protected]\S+)
    Default Weight: 2
  • URL
    Regex: ^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]
    Default Weight: 4
  • MAC
    Regex: (([0-9A-F]{2}[:-]){5}([0-9A-F]{2}))
    Default Weight: 5
  • Port
    Regex: number between 1 and 65535
    tag_keys: must contain ‘port’
    Default Weight: 4
  • PrimaryKey
    Default Weight: 3
  • SecondaryKey
    Default Weight: 2
  • SourceSystem
    Default Weight: 3
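
The value-based categories can be tried out with Python's `re` module. The sketch below uses the URL and MAC patterns as listed above. For the IP pattern it escapes the dot so that only a literal period separates the octets, which is an assumption about the intended expression. The Email pattern is omitted, and the check order (URL first) is a choice made for this example only.

```python
import re

# (compiled pattern, default weight) per category, checked in this order.
VALUE_PATTERNS = {
    "URL": (re.compile(r"^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]"), 4),
    # Dot escaped here (assumption) so "1234567" is not mistaken for an IP.
    "IP": (re.compile(r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}"), 7),
    "MAC": (re.compile(r"(([0-9A-F]{2}[:-]){5}([0-9A-F]{2}))"), 5),
}


def classify_value(tag_key, value):
    """Return (category, default weight) for the first rule that matches, else None."""
    for category, (pattern, weight) in VALUE_PATTERNS.items():
        if pattern.search(value):
            return category, weight
    # Port rule: the tag key must contain 'port' and the value must be 1-65535.
    if "port" in tag_key.lower() and value.isascii() and value.isdigit():
        if 1 <= int(value) <= 65535:
            return "Port", 4
    return None


print(classify_value("source", "10.20.30.40"))       # ('IP', 7)
print(classify_value("link", "https://example.com/deploy"))  # ('URL', 4)
print(classify_value("nic", "00:1A:2B:3C:4D:5E"))    # ('MAC', 5)
print(classify_value("dst_port", "8443"))            # ('Port', 4)
```

Note that the same number is only treated as a port when the tag key contains `port`, which keeps ordinary counters and IDs out of the Port category.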

Other RCC Input

  • Change Timeframe
    Description: Maximum possible difference between alert time and change end time
    Adjustment Examples: For an org that follows business hours, increasing to 3 days to capture weekends
    Default Setting: 1 Day
  • Change Fields
    Description: Which change fields are included in the calculation
    Adjustment Examples: Deselect build-notes that might muddle the actual change impact
    Default Setting: All Tags
  • Change Limit
    Description: Maximum number of changes that can be Suspect for one incident
    Adjustment Examples: Decrease limit if too many faulty suggestions are slipping through
    Default Setting: 2
  • Excluded Change Statuses
    Description: Exclude all changes with a specific status
    Adjustment Examples: Excluding In Development, Unscheduled, or other non-actionable statuses
    Default Setting: Canceled
  • Minimum Score Threshold
    Description: Lowest possible score that can appear as a root cause subject
    Adjustment Examples: Use this score and custom weights to lock categories together (e.g. a business unit category and the minimum score are both 30, so only changes that match the business unit will be suggested)
    Default Setting: 0
  • Splitters
    Description: Delimiters to split text (should prod-correlation-12 be treated as 1 whole or 3 separate)
    Adjustment Examples: Remove the hyphen to treat “prod-correlation-12” as a single option and narrow results
    Default Setting: [ " ", "\n", "[", "]", "(", ")", """, "'", "*", ",", "::" ]
  • Stop Words
    Description: Words to exclude from correlation
    Adjustment Examples: Add words that are used commonly across an org to keep them from muddying results
    Default Setting: A list of 900 words
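
To see how splitters and stop words shape what the algorithm compares, consider this small tokenizer sketch. It is an illustration only. The splitter list follows the default shown above, reading the `"""` entry as a double quote (an assumption). The stop-word set is a tiny hypothetical stand-in for the real list, and lowercasing is also an assumption.

```python
import re

DEFAULT_SPLITTERS = [" ", "\n", "[", "]", "(", ")", '"', "'", "*", ",", "::"]
STOP_WORDS = {"the", "on", "for", "a"}  # hypothetical stand-in for the real list


def tokenize(text, splitters=DEFAULT_SPLITTERS, stop_words=STOP_WORDS):
    """Split text on the given delimiters and drop stop words."""
    # Try longer splitters first so "::" wins over any single-character splitter.
    ordered = sorted(splitters, key=len, reverse=True)
    pattern = "|".join(re.escape(s) for s in ordered)
    tokens = [t.lower() for t in re.split(pattern, text) if t]
    return [t for t in tokens if t not in stop_words]


text = "Restart the service on prod-correlation-12 (db::primary)"

print(tokenize(text))
# ['restart', 'service', 'prod-correlation-12', 'db', 'primary']

print(tokenize(text, splitters=DEFAULT_SPLITTERS + ["-"]))
# ['restart', 'service', 'prod', 'correlation', '12', 'db', 'primary']
```

Without a hyphen splitter, `prod-correlation-12` stays a single token and only matches changes that mention it exactly. With the hyphen added, the pieces `prod`, `correlation`, and `12` can each match on their own, which widens results.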

Causal Machine Learning Algorithm

The causal machine learning algorithm is a true deep learning tool that uses complex calculations to find possible root causes based on historical data within your BigPanda instance.

By looking past simple correlations and surface similarities, the causal algorithm can go well above and beyond human or text-similarity understanding to find unexpected connections.

Causal ML builds a predictability index, or an understanding of relationships between incidents and changes, using variable weighted word categories. Each new incident that enters BigPanda is compared to system changes and assigned a causation score. When a change scores above a threshold on the predictability index, the causal algorithm flags it as a potential root cause.

The causal algorithm improves and learns over time as new incidents and changes enter BigPanda. When new incidents and changes are sent to BigPanda, or RCC matches are marked in your system, the causal algorithm adjusts the predictability index, adapting and learning based on the new data. This learning happens at set times: once a month, the algorithm reviews the last 3 months of data and updates the predictability index.

The predictability index is based on your organization’s data, meaning that the algorithm is unique to your instance and based on the data you have in BigPanda. This also means that historical data is required for the causal algorithm to find meaningful suggestions.

[Figure: Sample Causal Suggestion]

Advantages of the causal algorithm:

  • Functions even with poorly enriched events
  • Cumulative functionality improving over time
  • Deep insights into system relationships

Disadvantages of the causal algorithm:

  • Slower results, no immediate suggestions
  • Not Configurable
  • Hard to see and understand reasoning

Improving the Causal Algorithm

Because the causal algorithm uses deep machine learning to build the predictability index, the algorithm is not configurable. However, by improving the data available to the causal algorithm, you can improve the quality of causal suggestions.

To improve the quality of your causal algorithm suggestions:

  • Actively mark the RCC status of changes for both active and resolved incidents. Marking suggested changes as Suspect, Match, or None is important feedback that helps the algorithm refine itself. Even marking non-matches helps the algorithm understand which connections are probable.
  • Add and configure change integrations to ensure that the causal algorithm has rich data to pull from. Including complete change details in the description field or custom tags enriches the algorithm’s ability to spot connections.
  • Configure the Text-Similarity algorithm to better reflect your team’s understanding of existing relationships between incidents and changes. Better suggested changes will improve the feedback you are able to give the algorithm.

Reporting on Root Cause Changes

If your organization has already enabled root cause changes, the BigPanda team is able to pull a change correlation report on past algorithm suggestions.

The report includes information on:

  • Incident time
  • Incident id
  • Change id
  • Entity id
  • What value triggered the match
  • Match type (text-similarity category)
  • Incident tag key
  • Change tag key
  • Time and total correlation score
  • Whether or not the suggested match was manually marked as Suspect, Match, or None

Contact us at [email protected] if you would like to access the change correlation report.