BigPanda’s Root Cause Changes feature collects change information through inbound change integrations and correlates changes to potentially related incidents.
Changes that are correlated strongly enough are surfaced in the Incident Overview section of the incident details as suggested related changes.
The feature uses a text-similarity algorithm.
The algorithm runs calculations on key connections between incidents and changes, including:
- Categories - tag key/value matches between the incident and the change
- Time Factor - how close in time the change and the incident were
- Alerts Coverage - how many of the incident's alerts match properties in the change
BigPanda is configured to suggest up to 2 related changes, but only changes that are highly correlated will be suggested.
Administrators can manage the algorithm to improve the quality and frequency of suggested root cause changes.
The Root Cause Analysis Algorithm is focused on finding causation, not correlation. The causation time factor is configured to consider only changes that could have affected a system long enough to create a system event.
When calculating time ranges, the algorithm rounds start times up and end times down to the nearest hour. As a result, when you search for changes manually based on expected matches, you may see different results than the algorithm does.
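The rounding described above can be illustrated with a short Python sketch. It assumes that "rounds start times up" means moving the start to the next whole hour and "rounds end times down" means truncating the end to the current hour; the function names are hypothetical and are not part of any BigPanda API.

```python
from datetime import datetime, timedelta


def round_down_to_hour(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def round_up_to_hour(ts: datetime) -> datetime:
    """Move a timestamp to the next whole hour (unchanged if already on the hour)."""
    floored = round_down_to_hour(ts)
    return floored if floored == ts else floored + timedelta(hours=1)


def rounded_window(start: datetime, end: datetime) -> tuple:
    """Assumed interpretation: start rounded up, end rounded down."""
    return round_up_to_hour(start), round_down_to_hour(end)


if __name__ == "__main__":
    start = datetime(2024, 1, 5, 9, 20)
    end = datetime(2024, 1, 5, 11, 45)
    # Under this interpretation the window 09:20-11:45 becomes 10:00-11:00.
    print(rounded_window(start, end))
```

Under this interpretation the rounded window is never wider than the original one, which is one way a manual search over the exact times could return changes that the algorithm does not consider.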
The text-similarity algorithm runs a calculation between incidents and changes to identify changes that may be related. It uses alert tags, details, and incident metadata to find common values between incidents and changes, automating a comparison that would otherwise take many hours of manual work.
Once matches are found, the algorithm weighs the match to see if it is potentially a sign of root cause. To do so, the algorithm breaks incident data out into categories, or types of connection. Each category has a particular weight assigned to it, reflecting the type of relationship between changes and incidents, as not all text-matches point to shared system attributes or resources.
For example, the “IP” category of tags, which holds data on IP addresses, is generally assigned a high weight, as changes and incidents occurring at the same IP address are likely to be connected.
The algorithm then calculates a causation score based on the time frames of the incident and change, and the strength and number of matches between the two. Changes with a high causation score are surfaced as suggested related changes in the Incident Details pane.
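The exact scoring formula is not published in this article, so the following Python sketch only illustrates the general idea of weighting category matches and ranking changes. It assumes the score is the sum of the weights of the matched categories multiplied by a time factor between 0 and 1. The formula, the `Candidate` type, the `time_factor` field, and the `min_score` value are all hypothetical. The weights, the limit of 2 suggestions, and the excluded Canceled status are the defaults listed in the tables in this article.

```python
from dataclasses import dataclass

# Default category weights as listed in the category tables in this article.
DEFAULT_WEIGHTS = {
    "Configuration Item": 5,
    "Application": 4,
    "Host": 4,
    "Port": 4,
    "Team": 3,
    "AWS Region": 1,
}


@dataclass
class Candidate:
    """Hypothetical record describing one change compared against an incident."""
    change_id: str
    status: str
    matched_categories: list  # categories in which a text match was found
    time_factor: float        # hypothetical closeness value between 0 and 1


def causation_score(candidate, weights=DEFAULT_WEIGHTS):
    """Assumed formula: sum of matched category weights scaled by the time factor."""
    strength = sum(weights.get(cat, 0) for cat in candidate.matched_categories)
    return strength * candidate.time_factor


def suggest(candidates, min_score, limit=2, excluded_statuses=("Canceled",)):
    """Return the IDs of the highest-scoring eligible changes, best first."""
    eligible = [c for c in candidates if c.status not in excluded_statuses]
    scored = [(causation_score(c), c) for c in eligible]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c.change_id for _, c in kept[:limit]]


if __name__ == "__main__":
    candidates = [
        Candidate("CHG1", "Implemented", ["Host", "Application"], 1.0),
        Candidate("CHG2", "Implemented", ["AWS Region"], 1.0),
        Candidate("CHG3", "Canceled", ["Configuration Item", "Host"], 1.0),
        Candidate("CHG4", "Implemented", ["Team", "Host"], 0.9),
    ]
    print(suggest(candidates, min_score=2))
```

In this sketch CHG3 is dropped because of its status, CHG2 falls below the hypothetical minimum score, and the remaining two changes are returned in descending score order.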
Adjusting the text-similarity algorithm is a complex back-end process requiring close coordination with BigPanda support. Reach out to us at [email protected] if you are interested in configuring your text-similarity algorithm.
The text-similarity algorithm can be adjusted to better reflect the relationship between changes and incidents. You can change categories, adjust the weight given to specific types of matches, or remove specific categories entirely.
You can test algorithm changes before activating them by using score reports.
Rich alert and change data is vital to the success of the text-similarity algorithm. The more information available in incoming tags and description fields, the easier it is for the algorithm to spot matches.
Main Default Categories
|Category Name||tag_keys||tag_values||Default Weight|
|Configuration Item||"cmdb_ci", "yp_service_id", "ci", "configuration_item"|| ||5|
|AWS Region|| ||["ue1", "uw1", "uw2", "ew1", "ec1", "an1", "an2", "as1", "as2", "se1", "us-east-2", "us-east-1", "us-west-1", "us-west-2", "ap-east-1", ...]||1|
|Team||"responsible_group", "teams", "owners", "assignment_group", "groups", "team", "owner", "group"|| ||3|
|Application||"application", "services", "applications", "service", "business_service", "app", "apps"|| ||4|
|Host||"hostnames", "server", "hostname", "host", "instances", "host_name", "instance", "servers", "object", "hosts", "nodes", "host_names", "node", "objects", "device", "devices"|| ||4|
Additional Default Categories
|Port||number between 1 and 65535||must contain ‘port’||4|
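As an illustration of how the default categories above could be applied to a single tag, here is a minimal Python sketch. It makes several assumptions that the tables do not state: key comparison is case-insensitive, the AWS Region entries are matched against tag values, and the Port rule requires both a key containing "port" and a numeric value from 1 to 65535. The `categorize` function is hypothetical, and the region set is only the portion visible in the truncated list.

```python
from typing import Optional

# Tag keys per category, copied from the Main Default Categories table.
CATEGORY_KEYS = {
    "Configuration Item": {"cmdb_ci", "yp_service_id", "ci", "configuration_item"},
    "Team": {"responsible_group", "teams", "owners", "assignment_group",
             "groups", "team", "owner", "group"},
    "Application": {"application", "services", "applications", "service",
                    "business_service", "app", "apps"},
    "Host": {"hostnames", "server", "hostname", "host", "instances", "host_name",
             "instance", "servers", "object", "hosts", "nodes", "host_names",
             "node", "objects", "device", "devices"},
}

# Only the region values visible in the table; the source list is truncated.
AWS_REGION_VALUES = {"ue1", "uw1", "uw2", "ew1", "ec1", "an1", "an2", "as1",
                     "as2", "se1", "us-east-2", "us-east-1", "us-west-1",
                     "us-west-2", "ap-east-1"}


def categorize(key: str, value: str) -> Optional[str]:
    """Return the category a tag falls into, or None if no rule applies."""
    k = key.strip().lower()   # assumption: keys are compared case-insensitively
    v = value.strip()
    for category, keys in CATEGORY_KEYS.items():
        if k in keys:
            return category
    # Port rule: key contains "port" and value is a number between 1 and 65535.
    if "port" in k and v.isdecimal() and len(v) <= 5 and 1 <= int(v) <= 65535:
        return "Port"
    # Assumption: AWS Region is matched on the tag value, not the key.
    if v.lower() in AWS_REGION_VALUES:
        return "AWS Region"
    return None


if __name__ == "__main__":
    print(categorize("hostname", "web-01"))
    print(categorize("dest_port", "443"))
    print(categorize("region", "us-east-1"))
```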
Other RCC Input
|Type||Description||Adjustment Examples||Default Setting|
|Change Timeframe||Maximum possible difference between alert time and change end time||For an organization that follows business hours, increase to 3 days to capture weekends||1 Day|
|Change Fields||Which change fields are included in the calculation||Deselect build-notes that might muddle the actual change impact||All Tags|
|Change Limit||Maximum number of changes that can be Suspect for one incident||Decrease the limit if too many faulty suggestions are slipping through||2|
|Excluded Change Statuses||Exclude all changes with a specific status||Excluding In Development, Unscheduled, or other non-actionable statuses||Canceled|
|Minimum Score Threshold||Lowest possible score that can appear as a root cause subject||Use this score and custom weights to lock categories together (e.g. a business unit category and the minimum score are both 30, so only changes that match the business unit will be suggested)|
|Splitters||Delimiters used to split text (for example, whether prod-correlation-12 is treated as one whole or three separate parts)||Remove the hyphen to treat “prod-correlation-12” as a single option and narrow results||[ " ", "\n", "[", "]", "(", ")", "\"", "'", "*", ",", "::" ]|
|Stop Words||Words to exclude from correlation||Add words that are used commonly across an org to keep them from muddying results||A list of 900 words|
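To make the effect of Splitters and Stop Words concrete, the sketch below tokenizes a piece of text in Python. It is only an approximation of the idea: the `tokenize` function is hypothetical, lowercasing tokens is an assumption, the third quoted entry in the default splitter list is read as a double-quote character, and the stop words shown are placeholders because the actual list of 900 words is not reproduced in this article.

```python
import re

# Default splitters as listed in the table above.
DEFAULT_SPLITTERS = [" ", "\n", "[", "]", "(", ")", '"', "'", "*", ",", "::"]


def tokenize(text, splitters=DEFAULT_SPLITTERS, stop_words=frozenset()):
    """Split text on the given delimiters and drop stop words."""
    if not splitters:
        tokens = [text]
    else:
        # Try longer delimiters first so that "::" is matched as one splitter.
        ordered = sorted(splitters, key=len, reverse=True)
        pattern = "|".join(re.escape(s) for s in ordered)
        tokens = re.split(pattern, text)
    # Assumption: tokens are lowercased before comparison; empty pieces are dropped.
    lowered = [t.lower() for t in tokens if t]
    return [t for t in lowered if t not in stop_words]


if __name__ == "__main__":
    text = "Deploy to prod-correlation-12 (db::primary)"
    stop = {"to"}  # placeholder stop word, not taken from the real list

    # Without a hyphen splitter the hostname stays one token:
    # ['deploy', 'prod-correlation-12', 'db', 'primary']
    print(tokenize(text, stop_words=stop))

    # With a hyphen splitter it becomes three tokens:
    # ['deploy', 'prod', 'correlation', '12', 'db', 'primary']
    print(tokenize(text, splitters=DEFAULT_SPLITTERS + ["-"], stop_words=stop))
```

Treating the name as a single token narrows results because only an exact match on the whole string counts, whereas three separate tokens can each match unrelated changes.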
If your organization has already enabled root cause changes, the BigPanda team is able to pull a change correlation report on past algorithm suggestions.
The report includes information on:
- Incident time
- Incident ID
- Change ID
- Entity ID
- What value triggered the match
- Match type (text-similarity category)
- Incident tag key
- Change tag key
- Time and total correlation score
- Whether or not the suggested match was manually marked as Suspect, Match, or None.
Contact us at [email protected] if you would like to access the change correlation report.