Manage the Root Cause Changes Configuration
Adjust the Root Cause Changes default configuration to improve match suggestions.
Advanced Insight Module
This feature is part of the Advanced Insight Module. If your organization has not purchased this module, you may not have access to the feature.
If you are interested in upgrading to the Advanced Insight Module, contact your BigPanda account team.
BigPanda’s Root Cause Changes (RCC) feature compares your changes with active incidents in real time, so your team doesn’t have to dig through hundreds or thousands of potentially related change events.
Changes that are suspected as a potential root cause are flagged and added to the incident details. By default, BigPanda will suggest up to 5 related change suspects in the incident console.
RCC uses an algorithm based on natural language processing and vector space models to compare change data to active incidents. This allows BigPanda to confidently identify connections between alerts and changes, making sure that suspected changes are statistically relevant.
RCC works best when it has rich data and meaningful relationships identified for your organization. The first step in improving your RCC suspects is to improve the quality and consistency of the data being considered.
Before making any changes to RCC, review your tags configuration to make sure that you’re sending in rich contextual tags and that incoming tags are as standardized as possible. This provides RCC with high quality enrichment and tag normalization, which will make it easier for BigPanda to accurately spot causality.
BigPanda University Advanced Insight Module course
You can learn more in the BigPanda University Advanced Insight Module course.
By the end of the course you will be able to:
- Generate and review AI Analysis reports for your incidents that will help you reduce MTTR.
- Review root cause change suspects and identify changes that are the root cause of incidents.
- Compare incidents with similar characteristics to enhance context and resolve incidents faster.
Refining results
For even more refined results, you can request modifications to your RCC category and parsing configuration. This is a complex back-end process requiring close coordination with BigPanda support. Reach out to BigPanda support if you are interested in adjusting your RCC configuration.
Tips for Adjusting the RCC Configuration
Root Cause Changes (RCC) is a complex calculation, dependent on many external factors. Careful preparation is key to seeing accurate change data in your results.
When adjusting the RCC configuration, take the following steps to ensure you receive high-quality results:
- Set measurable goals - Consider whether you want to see more suggestions, additional incident coverage, or more accurate results.
- View available analytics - Use the Suspected Changes Analysis dashboard in Unified Analytics to see how RCC is performing. You can also compare data within your change tools to get a more complete picture.
- Identify problem areas - Look for missing enrichment or differences in the way change and alert tags are used. You can review historical data to find changes that should have been marked suspect, but weren’t.
- Implement improvements in small, manageable chunks - It’s easier to understand impact when it’s isolated. This also makes setting a backup and rollback plan easier. Rather than making sweeping changes in your systems and BigPanda, plan smaller modifications with clear goals.
Relationships between changes and incidents are not always straightforward. An incident can be tied to multiple changes, a single change can cause multiple incidents, or there may be an indirect or unintuitive connection between changes and incident root cause. RCC navigates this complexity by identifying a small set of highly likely suspects. This narrows the search for root cause, while making sure no edge case changes are missed.
Configuration Elements
Root Cause Changes (RCC) runs calculations on key connections between incidents and changes. Each incident-change match is given a causation score based on these calculations, with a higher score indicating a more likely suspect. The calculations consider:
- General settings, such as:
  - Time Frame - how close in time the change and the incident were
  - Alerts Coverage - how many of the alerts match properties in the change
- Categories - rules for weighting and parsing tag matches
Adjustments to general settings like minimum score and time frame may have unexpected, sweeping effects on RCC results, while changes to a category will affect a narrower portion of results.
General Configuration Settings
Change with caution
Adjustments to general settings may have unexpected, sweeping effects on RCC results.
Change Limit
Once a change has been identified as a potential match, RCC will calculate the match score to determine how likely that match is. To prevent operator noise, only changes that have a high match score will be surfaced in the UI.
By default, 40 matched changes will receive a match score.
Suspect Duplicate Limit
Multiple changes may be directly related, creating noise for operators trying to identify the relationship between incidents and changes. To limit a single noisy change project from hiding other potential matches, RCC limits the number of suspected duplicate changes that are considered.
Changes that match several categories are considered suspected duplicates.
By default, 15 suspected duplicate changes will be calculated as potential root cause changes for an incident.
Exclude Change Statuses
The Excluded Change Statuses setting allows you to exclude all changes with a specific status. For example, you can choose to exclude all Planned changes, or other non-actionable statuses. Status options are Planned, In Progress, Done, and Canceled.
The status Canceled is excluded by default.
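As an illustration only, the effect of this setting can be sketched as a simple filter. The record layout and the `filter_by_status` helper below are hypothetical and are not part of BigPanda's configuration or API; only the status names and the default exclusion of Canceled come from the description above.

```python
# Hypothetical change records; the field names are illustrative only.
EXCLUDED_STATUSES = {"Canceled"}  # documented default exclusion


def filter_by_status(changes, excluded=EXCLUDED_STATUSES):
    """Return only the changes whose status is not in the excluded set."""
    return [change for change in changes if change["status"] not in excluded]


changes = [
    {"id": "CHG-1", "status": "Planned"},
    {"id": "CHG-2", "status": "In Progress"},
    {"id": "CHG-3", "status": "Done"},
    {"id": "CHG-4", "status": "Canceled"},
]

# Default behavior: only Canceled changes are dropped.
default_ids = [c["id"] for c in filter_by_status(changes)]

# Also excluding Planned changes, as in the example above.
strict_ids = [c["id"] for c in filter_by_status(changes, {"Canceled", "Planned"})]
```

With the sample data, the default keeps CHG-1 through CHG-3, while also excluding Planned leaves only CHG-2 and CHG-3.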
Minimum Score Threshold
The Minimum Score Threshold setting allows you to adjust the lowest score a change can have and still appear as a root cause suspect.
You can use this score and custom weights to lock categories together. For example, if a business unit category and the minimum score threshold are both 30, only changes that match the business unit will be suggested.
The default Minimum Score Threshold is 3.1.
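The locking example above can be sketched under a deliberately simplified assumption: that a match score is the sum of the weights of the categories that matched, and that a score equal to the threshold is accepted. The actual RCC calculation also takes other factors into account and is not described here, so the model, the weights, and the function names below are all hypothetical.

```python
# Assumed, simplified model: score = sum of the weights of matched categories.
# This is NOT the real RCC formula; it only illustrates the locking idea.
MIN_SCORE_THRESHOLD = 30

CATEGORY_WEIGHTS = {  # hypothetical custom weights
    "business_unit": 30,
    "host": 9,
    "application": 8,
    "environment": 2,
}


def match_score(matched_categories):
    """Sum the weights of the categories that matched."""
    return sum(CATEGORY_WEIGHTS[name] for name in matched_categories)


def is_suspect(matched_categories, threshold=MIN_SCORE_THRESHOLD):
    """A change is suggested only if its score reaches the threshold."""
    return match_score(matched_categories) >= threshold


# All other categories together score 19, so they cannot reach 30 alone.
without_business_unit = is_suspect(["host", "application", "environment"])
with_business_unit = is_suspect(["business_unit"])
```

In this sketch the lock only holds because the remaining weights sum to less than the threshold. If the other categories could reach 30 together, matches without the business unit would pass as well.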
Time Frame
RCC is focused on finding causation, not correlation. Only changes that have been implemented long enough to create a system event are considered as potential causes.
Changes that happened too long before the incident are also excluded, because incidents usually occur shortly after system changes.
The Time Frame setting allows you to configure the maximum possible difference between the alert time and the change end time. For example, if your organization follows typical business hours, you can increase your time frame to 3 days to capture weekends.
The default Time Frame is 36 hours.
We do not recommend setting a Time Frame longer than 7 days. Even with long-running changes, the probability of accurate correlation is reduced dramatically with longer time frames.
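The window check described in this section can be sketched as follows. This is a minimal illustration, assuming the change must end at or before the alert time and that the window boundary is inclusive. How RCC treats changes that are still in progress, or changes that ended only moments before the alert, is not covered here. The function name is hypothetical.

```python
from datetime import datetime, timedelta

DEFAULT_TIME_FRAME = timedelta(hours=36)  # documented default


def within_time_frame(change_end, alert_time, time_frame=DEFAULT_TIME_FRAME):
    """True if the change ended before the alert and within the time frame."""
    gap = alert_time - change_end
    return timedelta(0) <= gap <= time_frame


# A change finished Friday evening; the alert fires Monday morning (63 hours later).
friday_change_end = datetime(2024, 1, 5, 18, 0)
monday_alert = datetime(2024, 1, 8, 9, 0)

in_default_window = within_time_frame(friday_change_end, monday_alert)
in_three_day_window = within_time_frame(
    friday_change_end, monday_alert, time_frame=timedelta(days=3)
)
```

The Friday change falls outside the 36-hour default but inside a 3-day window, which is the weekend scenario mentioned above.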
Stop Words
You can use stop words to exclude certain words or phrases from matching.
Excluding useless words and phrases improves RCC accuracy. We recommend adding words or phrases that are used commonly across your organization to avoid cluttering your results.
For example, the phrase “business risk” is often used in change descriptions, but may not add useful value to the RCC calculation. This phrase can be included in the list of stop words.
You can add individual words to the stop words list, but be cautious. Single words can also prevent correlation on longer phrases containing the words. The word “deployment” may often be included in change descriptions, but adding this to the stop word list may also stop matches to “Banking Deployment System.”
As a general rule, stop phrases are more accurate than single words.
If you are unsure if a stop word or phrase will cause issues matching other tags, it is better to leave it out.
The default stop words list has over 1500 common words configured. You can add, remove, or change words to better fit your system.
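The trade-off between stop phrases and single stop words can be seen in a small sketch. It assumes stop terms are removed from the text as whole words, case-insensitively, before matching. This is an illustrative model rather than a description of how RCC applies stop words internally, and the helper name is hypothetical.

```python
import re


def remove_stop_terms(text, stop_terms):
    """Remove each stop word or phrase (whole words, case-insensitive)."""
    for term in stop_terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        text = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by removed terms.
    return " ".join(text.split())


description = "Low business risk deployment to Banking Deployment System"

# A stop phrase removes only the phrase itself.
phrase_only = remove_stop_terms(description, ["business risk"])

# Adding the single word "deployment" also breaks up the system name.
with_single_word = remove_stop_terms(description, ["business risk", "deployment"])
```

With only the phrase configured, "Banking Deployment System" survives intact. Once "deployment" is added, the system name is reduced to "Banking System" and can no longer match as a whole.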
Splitters
Splitters are delimiters that split text, allowing you to control how a phrase, sentence, or single tag is broken out into individual tokens. Adding or removing splitters allows you to break up or group text from single alert values.
Splitter lists are defined for the entire algorithm, but each category has a specific splitter list applied. You are able to define new splitter lists to better fit the formatting of the tags in categories.
Type | Example | Default Setting |
---|---|---|
Standard Splitter List | prod correlation 12 will be treated as 3 separate items | Default Splitters: [ " ", "\n", "[", "]", "(", ")", "\"", "'", "*", ",", "::", "|" ] |
Standard No-spaces Splitter List | prod correlation 12 will be treated as 1 single item | Default Splitters: [ ",", "|" ] |
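To make the difference between the two lists concrete, the sketch below tokenizes the same value with each of them. The lists mirror the defaults in the table above, with the double-quote splitter written as a Python string. The `tokenize` helper is hypothetical and only approximates the behavior described here.

```python
import re

STANDARD_SPLITTERS = [" ", "\n", "[", "]", "(", ")", '"', "'", "*", ",", "::", "|"]
NO_SPACES_SPLITTERS = [",", "|"]


def tokenize(value, splitters):
    """Split a value on any of the splitters and drop empty tokens."""
    # Try longer splitters first so a multi-character splitter such as "::"
    # wins over any shorter splitter that shares its first character.
    ordered = sorted(splitters, key=len, reverse=True)
    pattern = "|".join(re.escape(splitter) for splitter in ordered)
    return [token for token in re.split(pattern, value) if token]


standard_tokens = tokenize("prod correlation 12", STANDARD_SPLITTERS)
no_spaces_tokens = tokenize("prod correlation 12", NO_SPACES_SPLITTERS)
```

The standard list yields three tokens (`prod`, `correlation`, `12`), while the no-spaces list keeps `prod correlation 12` as a single token.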
Filters
The filters below are available to narrow RCC matches.
Type | Description | Adjustment Guidelines | Default Setting |
---|---|---|---|
Min Size | Minimum number of characters in a token | | 3 |
Max Size | Maximum number of characters in a token | | 120 |
Allowed Categories | Which categories to refer to during matching | Remove a category to maintain the settings, but stop using it to match changes | All categories |
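The size filters can be pictured as a length check applied to each token. The sketch below uses the default values of 3 and 120 and assumes both bounds are inclusive, which is not stated above. The helper name is hypothetical.

```python
MIN_SIZE = 3    # documented default minimum token length
MAX_SIZE = 120  # documented default maximum token length


def apply_size_filters(tokens, min_size=MIN_SIZE, max_size=MAX_SIZE):
    """Keep tokens whose length is within the bounds (assumed inclusive)."""
    return [token for token in tokens if min_size <= len(token) <= max_size]


tokens = ["db", "prod", "12", "payments-api", "x" * 121]
kept = apply_size_filters(tokens)
```

Short fragments such as `db` and `12` are dropped, as is the 121-character token, leaving `prod` and `payments-api`.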
Category Configuration Changes
Root Cause Changes (RCC) uses change details, alert tags, and incident metadata to find common values between incidents and changes.
To define the context and relationships between data, RCC breaks incident data into weighted categories. Different weight and parsing rules apply to tag matches in each category, making sure that matches reflect the relationship of shared system attributes and resources.
Categorization defines which tags should be grouped together for weighting and calculation. As every organization organizes and leverages their tags differently, RCC categories may have been configured during onboarding.
These categories can be further adjusted to improve RCC results. You are able to:
- Enable and disable splitters
- Adjust category weights
- Add or remove tags
- Add or remove categories
Review tags
RCC leverages the tags configured in your system, which means tag changes can have major downstream effects on RCC efficacy. Before adjusting any RCC categories, we recommend reviewing your tags to ensure the quality and consistency of the data being considered.
Splitters
Splitters are delimiters that split text, allowing you to control how a phrase, sentence, or single tag is broken out into individual tokens. Adding or removing splitters allows you to break up or group text from single alert values.
The list of splitters is defined for the entire algorithm, but splitters are enabled per category. You are able to enable or disable splitters for individual categories to better fit the formatting of the tags in the category.
Adjust Weight
Each category is assigned a weight. This weight enables BigPanda to mark the importance of tag connections between incidents and changes.
Weighting may not mirror the importance of tags in your incident triage and resolution process. Instead, RCC category weights should reflect the system relationships that point to causality for the specific scenarios when incidents are caused by changes.
Increasing a category weight will increase the chance that tag matches in that category will result in a suggested cause. Decreasing it will deprioritize those matches.
We recommend keeping all weights above 0. If you wish to prevent tags from matching, it is better to remove them from the configuration.
Weighting categories
We do not recommend overweighting certain categories to force them to always show up in the results. This reduces the overall effectiveness of the matching system and adds risk of both false matches and missed suspects. Weights should reflect the significance of matches, but all included categories are important to identifying the actual root cause of incidents.
If you find poor matches muddling your results, it is better to adjust the tags in those categories to improve results.
Add or Remove Tags
The tags configured in each category are based on default tags, but may have been adjusted during onboarding to better reflect your system configuration.
Each category implies relationships between the tags listed within it. Add or remove tags in individual categories to make sure that all related tags are grouped together.
Categories may leverage tag names or tag values. When deeper granularity is needed, tag values allow you to limit matching to only the values that represent potential change causality.
Add or Remove Categories
The default RCC categories are based on best practices and system commonalities across all of BigPanda’s customers. As a result, these categories may be too broad to capture the nuances of an enterprise organization’s system.
You are able to split or add categories to add granularity and identify relationships within your system. Splitting or adding categories improves accuracy and adds control for other category settings such as splitters or weights.
As you restructure your categories, you may find that a category is not helpful for RCC matching within your unique system. These categories can be removed completely. Use caution when removing categories, because doing so may create visibility gaps for change-related incidents.
Adding categories for improved accuracy
A healthy RCC Configuration in BigPanda includes many categories to reflect the complexity of change-related incidents.
Default Root Cause Changes Categories
Category Name | Description | Default Weight |
---|---|---|
Configuration Item | Alert tags related to configuration item entities stored in an external CMDB. | 10 |
Host | Alert tags related to host names, fully qualified domain names, system names, device names, appliance names, etc. | 9 |
IP | Leverages a specific regex formula to create tokens from IP-related alert tags: (?:[0-9]{1,3}.){3}[0-9]{1,3} This category cannot be modified. However, if you need to include IPv6 addresses, you can create a custom category or use the Host category as needed. If you use the Host category, be aware that the splitters defined for Host may not apply well to IP addresses. | 9 |
Application | Alert tags related to application names and/or functional designation names that aren’t service names. | 8 |
Services | Alert tags related to services, business services, service providers, etc. that aren’t application names. | 8 |
MAC Address | Leverages a specific regex formula to create tokens from MAC Address alert tags: (([0-9A-F]{2}[:-]){5}([0-9A-F]{2})) | 8 |
Cluster | Alert tags related to cluster names, cluster VIP and cluster hostnames, cluster IDs, resource pools, etc. | 6 |
Queue Detail | Alert tags related to message queues, queue managers, etc. | 5 |
Job Detail | Alert tags related to job names, job scheduler information, job identifiers, etc. | 5 |
Data Component | Alert tags related to databases, database instances, database component services, listener names, data warehouse identifiers, etc. | 5 |
Data Center | Alert tags related to data center logical identifiers. It is best practice to use this category for different identification data than physical location data. | 3 |
Location | Alert tags related to physical locations, mailing addresses, building numbers, city, rack or row information, etc. | 3 |
Instance Detail | Alert tags that reference instance level information. Note: Some systems use instance in place of host, application, service, system property, etc. Use this category with awareness of all alert sources. It’s important to identify the type of information that you can expect to be received via this category to determine whether or not you should include it in your change correlation strategy. | 3 |
Object Detail | Alert tags related to objects, host and/or application level components, etc. This may be vaguely defined across different alert sources. It’s important to identify the type of information that you can expect to be received via this category to determine whether or not you should include it in your change correlation strategy. | 3 |
URL | Leverages a specific regex formula to create tokens from URL alert tags: ^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_] It may also leverage tag names that match this formula. Note: You may need to adjust the maximum token length to extract values from a URL. | 3 |
Process Detail | Alert tags related to process names, process details, process identifiers, etc. | 2 |
Categorization | Alert tags related to categorization information that can be used for correlating changes to incidents. Be careful when using this category and ensure your targeting makes sense across all of your inbound alert sources. | 2 |
Group Detail | Alert tags that constitute logical groups of information, such as host groups, service groups, application groups, etc. Note: This is not the same as the Support Group/Team category. | 2 |
Environment | Alert tags related to environment, stage, and/or support tier. | 2 |
Support Team | Alert tags that identify the support and/or escalation teams related to an incident. You may have various teams that handle implementation, support, and escalation of an incident. Use this category with caution and disable or reduce weighting if necessary. | 2 |
Email | Leverages a specific regex formula to create tokens from Email alert tags: (\S+@\S+) Due to the volume of changes created in ITSM systems, correlating email addresses may cause inaccurate results. We recommend keeping the weight below the rcc-minimum-score-threshold so that this category isn’t the only correlator. | 2 |
Line of Business | Alert tags that identify the line of business, department, organizational designator, business segment, impacted business identifier, etc. | 1 |
Domain Detail | Alert tags that identify domain names, domain identifiers, and/or domain properties. | 1 |
Owner | Alert tags related to ownership details. It is best practice to use this category for different ownership types than support and escalation tags. | 1 |
AWS Region | All common AWS region tags. | 1 |
Device Detail | Alert tags that provide increased accuracy from defined device details, such as description fields passed from alert sources or enriched details from external systems. We recommend keeping the weight low for this category to provide a mechanism for increasing accuracy. The default tags are provided as a reference for potential usage, but you can modify this category to improve change correlation accuracy. | 1 |
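The regex-based categories in the table above can be tried out locally to see which tokens they would produce from a tag value. The patterns below are copied as printed in the table. The `extract_tokens` helper and the sample strings are hypothetical, and this sketch does not reflect how RCC applies the patterns internally.

```python
import re

# Patterns copied as printed in the table above.
# The dot in the IP pattern is unescaped, so it matches any single character
# between the digit groups, not only a literal ".".
IP_PATTERN = re.compile(r"(?:[0-9]{1,3}.){3}[0-9]{1,3}")
# The MAC pattern only matches uppercase hexadecimal digits.
MAC_PATTERN = re.compile(r"(([0-9A-F]{2}[:-]){5}([0-9A-F]{2}))")
EMAIL_PATTERN = re.compile(r"(\S+@\S+)")


def extract_tokens(pattern, text):
    """Return the full text of every match of the pattern."""
    return [match.group(0) for match in pattern.finditer(text)]


ip_tokens = extract_tokens(IP_PATTERN, "ping failed for 10.20.30.40")
mac_tokens = extract_tokens(MAC_PATTERN, "port flap on 00:1A:2B:3C:4D:5E")
email_tokens = extract_tokens(EMAIL_PATTERN, "escalate to noc@example.com")
```

Each sample string produces a single token: the IP address, the MAC address, and the email address respectively. Because the email pattern matches any run of non-whitespace characters around `@`, punctuation attached to an address would become part of the token.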
Next Steps
Learn more about BigPanda's Incident Intelligence
Dig deeper into Correlating Changes with Incidents
Modify your Change Integrations