Remediate Incidents
Remediation steps help you communicate with your team and quickly resolve incidents in BigPanda once they have been investigated and triaged.
During the remediation process, you can:
- Add comments, allowing other team members to view information about steps taken on the incident.
- Add incident tag values to enrich the incident with additional context.
- Correlate changes with incidents to help determine the root cause of an issue.
- View Similar Incidents to identify past incidents with similar characteristics to help you enhance context and find a resolution.
- Search for incidents within BigPanda.
- Resolve the incident within BigPanda.
For more information about the incident management process, see the Incidents in BigPanda and Triage Incidents documentation.
Comment on Incidents
Collaborate with team members by viewing and contributing to comments on an incident.
- Within an incident or in the incident details pane, click the Comments icon.
- Add a comment or view previous comments from your colleagues.
- (Optional) Search for a specific comment using the Search bar at the top of the incident feed or in BigPanda's Search tab.
Character limit
The total length of all comments on a single incident cannot exceed 100,000 characters. Characters beyond this limit may be trimmed or may not function reliably.
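If you post long comments programmatically, it can help to check the remaining budget first. The following is a minimal sketch, assuming comments are plain strings counted by character; the function names are hypothetical and are not part of any BigPanda API.

```python
# Hypothetical helpers illustrating the per-incident comment budget.
MAX_TOTAL_COMMENT_CHARS = 100_000  # total across all comments on one incident


def remaining_comment_chars(existing_comments: list[str]) -> int:
    """Return how many characters can still be added before reaching the limit."""
    used = sum(len(comment) for comment in existing_comments)
    return max(0, MAX_TOTAL_COMMENT_CHARS - used)


def fits_within_limit(existing_comments: list[str], new_comment: str) -> bool:
    """True if adding new_comment keeps the incident at or under the limit."""
    return len(new_comment) <= remaining_comment_chars(existing_comments)


if __name__ == "__main__":
    history = ["Restarted the web tier.", "x" * 99_970]
    print(remaining_comment_chars(history))           # 7
    print(fits_within_limit(history, "Rolled back"))  # False
```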
Incident Tags
Incident tags add context and details to your BigPanda incidents, enabling better correlation and faster troubleshooting.
BigPanda takes the raw data from your systems and normalizes it into key-value pairs, called tags. Each tag has two parts: the tag name and the tag value. Tags are the fundamental data model for your alerts and incidents and provide vital incident enrichment.
In BigPanda, tags enable alert correlation, provide incident information in the UI, and help you configure environments, perform searches, collect analytics, and configure AutoShare for certain integrations.
Incident tags summarize information for a particular incident so you don't have to review all of the related alerts.
Relevant Permissions
Roles with the following permissions can access Incident Tags:
Permission | Description |
---|---|
Environments | View, create, edit, and delete Environments in the UI and view the incidents within them. |
Incident Enrichment | View, create, and edit Incident Tags in BigPanda Settings. |
Permission access levels can be adjusted by selecting either View or Full Access. To learn more about how BigPanda's permissions work, see the Roles Management guide.
Use Incident Tags
Incident tags appear both in the Incident Feed and in the Overview tab of the Incident Details pane.
To see who edited a tag and the date and time of the change, hover over the name of the incident tag. Tags that have not been manually edited show the last date and time that automatic incident enrichment occurred.
You can manually assign or remove incident tags. To learn more about using incident tags with incidents, please see the Add or Edit Tag Values and Prioritize Incidents documentation.
You are able to create, edit, or inactivate incident tags to fit the needs of your organization. To learn more about configuring incident tags, please see the Manage Incident Enrichment documentation.
Incident tags may also be configured to automatically add values to specific incidents based on incident or alert criteria. To learn more about configuring automatic tags, please see the Automatic Incident Tags documentation.
Manually splitting or merging incidents
If an incident has been manually split, the new incident will be created without any incident tag values. If incidents are manually merged, only the incident tags from the destination incident will appear on the merged incident. Source incident tag values will not be added to the destination incident.
Incident Tag Types
Incident tags may take the form of Priority, Free Text, Multi-value, or Closed-List tags.
Priority Tags
By default, your environment will have Priority tags enabled, with pre-configured settings. These settings can be customized to better fit the needs of your organization. To learn more about customizing tags, please see the Manage Incident Enrichment documentation.
Priority tags are visible at the top left of incidents in both the feed and the details Overview tab, next to the incident severity. Incidents that have not been prioritized will not show the priority icon.
Priority can be assigned from the incident feed or from the Overview tab of the incident details pane. To learn more about using priority tags, please see the Prioritize Incidents documentation.
Free Text and Multi-value Tags
Free text and multi-value text tags add context, details, or resources to your incidents.
Free text and multi-value text tags appear at the top of the Overview tab of the incident details pane.
Each tag is a name-value pair, similar to BigPanda alert tags (for example: Source System: Nagios). For free text tag types, the value is a single text string. For multi-value text tag types, the value is one or more individual text strings, which appear as individual items beside the tag name.
The incident tags available in the Incident Details pane are configured by your BigPanda admin. To learn about creating incident tags, see the Manage Incident Enrichment documentation.
Closed-List Tags (Single-Value and Multi-Value List)
With closed-list tags, you can add or update values for an incident tag by selecting one or more values from a list defined by your BigPanda administrator.
There are two closed-list tag types: single-select and multi-select. Single-select tags allow you to select one value from a closed list, while multi-select tags allow you to select multiple values.
To learn about creating closed-list incident tags, see the Manage Incident Enrichment documentation.
Add or Edit Tag Values
Incident tags may already be populated with automatic tag values, or they may be empty. You’re able to add or edit tag values to update the incident with the latest information about the ongoing issue.
Editing tags
Not all tags can be edited. If a tag cannot be manually changed, a lock icon will appear to the right of the incident tag.
To add or edit tag values:
- In the Incidents tab, select an incident you wish to review.
- In the Incident Details pane, navigate to the Overview tab to view incident tags.
- Select a tag you wish to edit and click the pencil icon.
- In the Edit Incident Tag pop-up, enter the appropriate values in the editable field.
- Click Update to save.
Automatic enrichment
Manually changing tag data will stop automatic enrichment for the tag.
Tags may be single- or multi-value, and may have a typing field or preset dropdown.
Text tags allow you to include a single text string up to 256 characters. In typing field tags, type the full string of the value. For preset dropdowns, type to filter, and select the item from the list.
Multi-value text tags allow you to add several individual text strings. In typing field tags, type each value, then press Enter or Tab on your keyboard to add another value. For preset dropdowns, type to filter, and select the desired items from the list.
Closed-list tags can be single-select or multi-select and will have a preset dropdown. For single-select tags, select one item from the dropdown. For multi-select tags, select any number of items from the dropdown. For either tag type, you can type to filter the dropdown list.
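The editing rules above can be summarized as a small validation sketch. The `IncidentTag` class, its field names, and the tag kind strings are hypothetical illustrations, not BigPanda's data model; only the 256-character text limit, the closed-list and single-select behavior, the lock icon, and the rule that a manual change stops automatic enrichment come from this page.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_TEXT_LENGTH = 256  # free text tags hold a single string of up to 256 characters


@dataclass
class IncidentTag:
    name: str
    kind: str  # hypothetical kinds: "text", "multi_text", "single_select", "multi_select"
    allowed: Optional[set[str]] = None  # closed-list values defined by an admin
    editable: bool = True               # False corresponds to the lock icon
    auto_enrich: bool = True            # automatic enrichment still active
    values: list[str] = field(default_factory=list)

    def set_manually(self, new_values: list[str]) -> None:
        """Validate a manual edit, store it, and stop automatic enrichment."""
        if not self.editable:
            raise PermissionError(f"{self.name} is locked")
        if self.kind == "text":
            if len(new_values) != 1 or len(new_values[0]) > MAX_TEXT_LENGTH:
                raise ValueError("text tags take a single string of up to 256 characters")
        if self.kind in ("single_select", "multi_select"):
            if self.allowed is None or not set(new_values) <= self.allowed:
                raise ValueError("closed-list values must come from the admin-defined list")
            if self.kind == "single_select" and len(new_values) != 1:
                raise ValueError("single-select tags take exactly one value")
        self.values = list(new_values)
        self.auto_enrich = False  # a manual change stops automatic enrichment


if __name__ == "__main__":
    source_tag = IncidentTag(name="Source System", kind="text")
    source_tag.set_manually(["Nagios"])
    print(source_tag.values, source_tag.auto_enrich)  # ['Nagios'] False
```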
Configuration changes when editing or deleting tag values
Incident tag values are leveraged in many key features of BigPanda. Alert correlation, incident information in the UI, environments, analytics, saved searches, and AutoShare for certain integrations rely on incident tag values for configuration.
Changing the incident tag values applied to incidents may have negative effects on dependent configurations. Be sure to understand the downstream implications before making changes to incident tag values.
Manage Incident Tags
Incident tags can be configured to fit the specific needs of your organization. To learn more about configuring incident tags, please see the Manage Incident Enrichment documentation.
Correlate Changes with Incidents
Identifying the root cause of an outage or a poorly performing application is one of the biggest challenges IT organizations face today, and the fast-changing nature of modern development only makes it more difficult.
BigPanda’s Root Cause Changes (RCC) feature simplifies this process by collecting change data right into the incident console and then leveraging BigPanda’s algorithms to identify changes that may have led to incidents.
Advanced Insight Module
This feature is part of the Advanced Insight Module. If your organization has not purchased this module, you may not have access to the feature.
If you are interested in upgrading to the Advanced Insight Module, contact your BigPanda account team.
Use the Changes section and suggested matches to search and mark suspect changes, and collaborate with other users to investigate which change caused the incident.
Marking changes as the suspected or matched root cause change of an incident is a vital tool in identifying historically problematic changes.
See Incidents in BigPanda for more information about the Changes tab.
Search For Specific Changes
The change table displays changes that are active during a specified time frame. Changes are considered active if they:
- Start within the specified time frame
- End within the specified time frame
- Start before and remain active after the specified time frame
You can search the change table for changes that meet specific criteria and fall within a selected time frame. Use the search bar or the time frame selection tool at the top of the change table to find specific types of changes.
Select a Time Frame
By default, the change table displays changes that were active during the hour before the incident started. You can select a different time frame from a set of options, or select a custom date range.
- To change the change table’s time frame, click the current time frame. From the dropdown, select the desired time window.
- To filter by specific dates, select Custom Dates Range and enter the relevant dates and times in the dialog box.
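The three "active" rules and the default one-hour window described above can be expressed as an interval check. This is a minimal sketch, assuming naive datetimes and treating a change with no end time as still open; the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_active(change_start: datetime, change_end: Optional[datetime],
              frame_start: datetime, frame_end: datetime) -> bool:
    """Apply the three 'active' rules (change_end=None means still open, an assumption)."""
    starts_within = frame_start <= change_start <= frame_end
    ends_within = change_end is not None and frame_start <= change_end <= frame_end
    spans_frame = change_start < frame_start and (change_end is None or change_end > frame_end)
    return starts_within or ends_within or spans_frame


def default_time_frame(incident_start: datetime) -> tuple[datetime, datetime]:
    """Default change table window: the hour before the incident started."""
    return incident_start - timedelta(hours=1), incident_start


if __name__ == "__main__":
    incident_start = datetime(2024, 5, 1, 12, 0)
    frame_start, frame_end = default_time_frame(incident_start)
    # A deployment that ran 10:00-11:30 ends inside the 11:00-12:00 window.
    print(is_active(datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 11, 30),
                    frame_start, frame_end))  # True
```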
Change Time Frame Rules
Calculating time ranges
When calculating time ranges, the root cause change algorithm rounds start times up and end times down to the nearest hour. When searching changes based on expected matches, you may see different results than the algorithm.
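The rounding rule can be illustrated with two helpers. This is a sketch under assumptions: it applies the rule to an arbitrary start and end pair using naive datetimes, and the function names are hypothetical; the page does not say exactly which timestamps the algorithm rounds.

```python
from datetime import datetime, timedelta


def round_down_to_hour(ts: datetime) -> datetime:
    """Truncate to the start of the hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def round_up_to_hour(ts: datetime) -> datetime:
    """Move to the next full hour unless already on one."""
    floored = round_down_to_hour(ts)
    return floored if floored == ts else floored + timedelta(hours=1)


def algorithm_range(start: datetime, end: datetime) -> tuple[datetime, datetime]:
    """Round the start up and the end down, as described for the RCC algorithm."""
    return round_up_to_hour(start), round_down_to_hour(end)


if __name__ == "__main__":
    print(algorithm_range(datetime(2024, 5, 1, 10, 20), datetime(2024, 5, 1, 13, 40)))
```

Because the rounded range is narrower than the raw one, a manual search over the exact times can surface changes the algorithm did not consider.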
Search Changes
Changes can be searched using BigPanda Query Language (BPQL) to find specific tag values. Use free text, tag values, boolean, or regex queries to narrow the list to only the changes that meet your criteria.
When searching with BPQL, the Query Assist feature is available to help you build a query. See Query Assist for more information.
View RCC Details
The change table displays details about each change associated with the incident, including status, summary, start and end time, change suspect score, and more.
The Root Cause column within the change table lists whether a change has been identified as a suspected or matched cause of the incident.
If BigPanda automatically set the RCC status, the purple star icon will appear in the user field.
Click the arrow icon to open the Change Details panel where you can see additional details about why BigPanda suspects the change is the potential root cause of the incident.
Mark Changes
Mark changes as suspected or matched root causes to record the connection between the change and incident for your team, analytics, and BigPanda’s algorithm.
Suspect or match
Marking changes as Suspect or Match is a vital tool in recognizing patterns between incidents and changes in your system.
Each change can be marked with one of three statuses related to an incident. Select the status that best describes the change’s relationship to the cause of the incident:
- None - The change is likely not the cause of the incident. This is the default RCC status.
- Suspect - The change may have been related to the cause of the incident. If BigPanda’s RCC algorithms believe there is a strong connection between a change and incident, it will automatically mark the change as Suspect.
- Match - The change is likely the cause of the incident or related to it. BigPanda never marks matches automatically; only a human teammate can mark a change as a match.
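The status rules can be sketched as a small state holder that refuses automatic Match marks and records who changed the status, when, and with what comment. The `ChangeLink` class, its fields, and the change ID are hypothetical illustrations, not BigPanda's internal model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class RCCStatus(Enum):
    NONE = "None"        # default: the change is likely not the cause
    SUSPECT = "Suspect"  # may be related; the algorithm can set this
    MATCH = "Match"      # likely the cause; only a human can set this


@dataclass
class ChangeLink:
    change_id: str
    status: RCCStatus = RCCStatus.NONE
    history: list[dict] = field(default_factory=list)

    def set_status(self, status: RCCStatus, actor: str, comment: str = "",
                   automatic: bool = False) -> None:
        # Automatic marking is limited to Suspect; Match requires a person.
        if automatic and status is not RCCStatus.SUSPECT:
            raise ValueError("automatic marking is limited to Suspect")
        self.status = status
        # Record who set the status, when, and any comment.
        self.history.append({
            "status": status.value,
            "actor": actor,
            "automatic": automatic,
            "at": datetime.now(timezone.utc),
            "comment": comment,
        })


if __name__ == "__main__":
    link = ChangeLink("CHG-1042")  # hypothetical change ID
    link.set_status(RCCStatus.SUSPECT, actor="BigPanda", automatic=True)
    link.set_status(RCCStatus.MATCH, actor="jane@example.com",
                    comment="Rolling back this deploy cleared the alerts")
    print(link.status.value, len(link.history))  # Match 2
```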
To set a root cause status:
- In the Change table, click the Root Cause dropdown for the change.
- Select the desired status.
- Click the status to add a comment explaining the details or reasoning behind the RCC status.
Comments
Comments can be a great foundation for collaboration and post-outage review.
When a different status is selected for a change, a record of the activity is created with information about who set the change status, when, and any comment associated with it.
In addition, activities related to RCC (for example, the type and time of correlation, comments, and the latest interaction) are listed in the activity feed.
Suspected Changes
BigPanda's algorithms automatically detect connections between changes made to the system and incidents.
As new incidents are created or new alerts join an existing incident, BigPanda calculates their match potential with each past change.
If a change with high match potential is found, BigPanda marks the change as Suspect and adds a comment to the Change Details panel explaining why the change was marked. Suspected changes will appear on the Overview tab as well as at the top of the Changes table. Filter the table to show only suspected and matched changes by clicking the Show potential RCC only toggle.
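The flow of flagging high-potential changes, listing them first, and filtering with the toggle can be sketched as follows. This is only an illustration: the match potential is assumed to be a precomputed number between 0 and 1, the 0.8 threshold is a hypothetical value, and BigPanda's actual scoring is not described here.

```python
from dataclasses import dataclass

SUSPECT_THRESHOLD = 0.8  # hypothetical cut-off for "high match potential"


@dataclass
class Change:
    change_id: str
    match_potential: float    # assumed precomputed, 0.0 to 1.0
    rcc_status: str = "None"  # "None", "Suspect", or "Match"


def flag_suspects(changes: list[Change]) -> None:
    """Mark high-potential changes as Suspect; only changes still at None are touched."""
    for change in changes:
        if change.rcc_status == "None" and change.match_potential >= SUSPECT_THRESHOLD:
            change.rcc_status = "Suspect"


def change_table(changes: list[Change], potential_rcc_only: bool = False) -> list[Change]:
    """List suspected and matched changes first; optionally hide the rest, like the toggle."""
    rows = [c for c in changes if not potential_rcc_only or c.rcc_status != "None"]
    # False sorts before True, so Suspect/Match rows come first; the sort is stable.
    return sorted(rows, key=lambda c: c.rcc_status == "None")


if __name__ == "__main__":
    changes = [Change("CHG-1", 0.35), Change("CHG-2", 0.92), Change("CHG-3", 0.10, "Match")]
    flag_suspects(changes)
    print([c.change_id for c in change_table(changes)])                           # ['CHG-2', 'CHG-3', 'CHG-1']
    print([c.change_id for c in change_table(changes, potential_rcc_only=True)])  # ['CHG-2', 'CHG-3']
```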
Suspected changes can significantly speed up root cause investigation by identifying potential problems as soon as the incident is detected.
Suspect changes
BigPanda will only mark changes as Suspect (not Match) to give users the final say on whether the change is the root cause of the incident.
Similar Incidents
Advanced Insight Module
This feature is part of the Advanced Insight Module. If your organization has not purchased this module, you may not have access to the feature.
If you are interested in upgrading to the Advanced Insight Module, contact your BigPanda account team.
Past incidents are a valuable source of information during the incident management process. Reviewing incidents with similar characteristics can help you understand recurring issues and accelerate the resolution process.
You can use the Similar tab within the incident details pane to identify incidents with matching characteristics. Each similar incident includes details such as impact, assignment, steps to resolve, and a summary describing why the incident was considered similar.
Similar incidents are chosen based on a similarity score. The score is calculated using similarity categories based on entity, problem, impact, and topology. A list of the incidents most similar to the selected incident is generated each time you access the Similar tab.
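To make the idea of a combined score concrete, the sketch below averages the four category scores with weights and keeps the top results. The weighted average, the weights, and the per-category scores are all assumptions for illustration; this page does not document how BigPanda combines the categories.

```python
from dataclasses import dataclass

CATEGORIES = ("entity", "problem", "impact", "topology")


@dataclass
class ScoredIncident:
    incident_id: str
    category_scores: dict[str, float]  # assumed per-category similarity, 0.0 to 1.0


def similarity_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of the category scores; a missing category counts as 0."""
    total_weight = sum(weights[c] for c in CATEGORIES)
    return sum(weights[c] * scores.get(c, 0.0) for c in CATEGORIES) / total_weight


def most_similar(candidates: list[ScoredIncident], weights: dict[str, float],
                 top_n: int = 5) -> list[str]:
    """Return the IDs of the top_n candidates, highest score first."""
    ranked = sorted(candidates,
                    key=lambda inc: similarity_score(inc.category_scores, weights),
                    reverse=True)
    return [inc.incident_id for inc in ranked[:top_n]]


if __name__ == "__main__":
    weights = dict.fromkeys(CATEGORIES, 1.0)  # equal weights, purely illustrative
    past = [
        ScoredIncident("INC-9", {"entity": 0.1, "problem": 0.3}),
        ScoredIncident("INC-7", {"entity": 0.9, "problem": 0.8, "impact": 0.2, "topology": 0.5}),
    ]
    print(most_similar(past, weights))  # ['INC-7', 'INC-9']
```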
See the Similar Incidents documentation for more information.
Search for Incidents
In BigPanda, you can use the Unified Search tab to investigate current and historical incidents across all of your integrated monitoring systems, which can help you find, solve, and prevent problems. Enter a keyword search or query, and apply filter criteria to narrow results. BigPanda finds all incidents with alerts that match your search criteria.
Use Unified Search
To search for incidents:
- At the top of the screen, click the Search tab.
- Enter a keyword search (term or exact phrase in quotes) or a query in BigPanda Query Language (BPQL). Both keyword search and BPQL support regular expressions.
- Select the filter criteria or leave the default settings.
- Click the search icon or press Enter. The results page shows the total number of matching incidents and lists up to the first ten matches.
- (Optional) Scroll down to view more results.
Tag names in different monitoring systems
If you don't see the results you were expecting, try adding OR conditions to your BPQL query to include similar values that may have different tag names in different monitoring systems (for example, `host` in Nagios and `object` in SolarWinds).
Search Options
A keyword search looks for a value in descriptions, source system names, and any tag, whereas a BPQL query looks for a value in a specific tag. For tags that contain multiple values, a keyword search returns a match if any of the values match the search term.
Examples of Keyword Search queries:
- Use an asterisk as a wildcard to match multiple values with a common element, for example: `phx*db`
- Add quotes (" or ') around an exact phrase that contains spaces (spaces are allowed only between quotes), for example: `"CPU over 90*"`
- Add a slash (/) as the first and last character to search for values that match a regular expression (case sensitive and limited to 32,000 characters), for example: `/...Phx[0-9]{3}/`
Keyword searches
Keyword searches can find exact search terms between special characters without using wildcards. For example, a search for `api` matches all tags, source system names, and descriptions where `api` appears between special characters, such as `prod-api-1` and `web-api`. You do not need to use wildcards such as `*api*`.
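The keyword rules above (wildcards, quoted phrases, slash-delimited regular expressions, and terms bounded by special characters) can be approximated with Python's `re` module. This is a sketch, not BigPanda's search engine: treating any non-alphanumeric character as a "special character" and matching plain keywords case-insensitively are both assumptions, and the function names are hypothetical.

```python
import re


def keyword_pattern(term: str) -> re.Pattern:
    """Translate a keyword search term into a compiled regular expression."""
    # /.../ : a regular expression, matched case sensitively
    if len(term) >= 2 and term.startswith("/") and term.endswith("/"):
        return re.compile(term[1:-1])
    # "..." or '...' : an exact phrase; the quotes themselves are not matched
    if len(term) >= 2 and term[0] == term[-1] and term[0] in "\"'":
        term = term[1:-1]
    # * : a wildcard matching any run of characters
    body = ".*".join(re.escape(part) for part in term.split("*"))
    # Require the term to sit between string edges or non-alphanumeric characters.
    return re.compile(rf"(?<![A-Za-z0-9]){body}(?![A-Za-z0-9])", re.IGNORECASE)


def keyword_matches(term: str, values: list[str]) -> bool:
    """True if any value (description, source system name, or tag value) matches."""
    pattern = keyword_pattern(term)
    return any(pattern.search(value) for value in values)


if __name__ == "__main__":
    print(keyword_matches("api", ["prod-api-1", "web-api"]))  # True
    print(keyword_matches("api", ["rapid-deploy"]))           # False
    print(keyword_matches("phx*db", ["phx-prod-db"]))         # True
```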
BPQL queries use the format `<tag> <operator> <value>`, with `AND`, `OR`, and parentheses to separate and/or prioritize multiple conditions.
For example: host=srv-1 AND (check=chk-1 OR (check=chk-2 AND status=critical))
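To show how parentheses group conditions, here is a toy parser and evaluator for queries like the example above. It is not BigPanda's BPQL grammar: it supports only `=` and `!=`, assumes AND binds more tightly than OR when parentheses are omitted, treats `*` in values as a wildcard via `fnmatch`, and treats a missing tag as "not equal". All names are hypothetical.

```python
import fnmatch
import re

TOKEN = re.compile(r"\(|\)|!=|=|[^\s()!=]+")


def parse(query: str):
    """Parse a query into nested tuples: ("AND"|"OR", left, right) or (op, tag, value)."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of query")
        pos += 1
        return tok

    def parse_or():
        node = parse_and()
        while peek() == "OR":
            take()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_term()
        while peek() == "AND":
            take()
            node = ("AND", node, parse_term())
        return node

    def parse_term():
        if peek() == "(":
            take()
            node = parse_or()
            if take() != ")":
                raise SyntaxError("expected )")
            return node
        tag, op, value = take(), take(), take()
        if op not in ("=", "!="):
            raise SyntaxError(f"unknown operator {op!r}")
        return (op, tag, value)

    tree = parse_or()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def evaluate(node, tags: dict[str, str]) -> bool:
    """Evaluate a parsed query against a flat tag dictionary."""
    kind = node[0]
    if kind == "AND":
        return evaluate(node[1], tags) and evaluate(node[2], tags)
    if kind == "OR":
        return evaluate(node[1], tags) or evaluate(node[2], tags)
    _, tag, pattern = node
    matched = tag in tags and fnmatch.fnmatchcase(tags[tag], pattern)
    return matched if kind == "=" else not matched


if __name__ == "__main__":
    tree = parse("host=srv-1 AND (check=chk-1 OR (check=chk-2 AND status=critical))")
    print(evaluate(tree, {"host": "srv-1", "check": "chk-2", "status": "critical"}))  # True
    print(evaluate(tree, {"host": "srv-1", "check": "chk-2", "status": "warning"}))   # False
```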
Search Incident Assignments
You can use Incident Assignments in Unified Search to search current and historical incidents. Search for either the assignee or the user who last changed the assignment (the assigner).
On the Search tab, enter a keyword search for the user's email address or type a more complex query using BigPanda Query Language (BPQL).
- For incidents assigned to a specific person, use the format: assignee = followed by the person's email address
- For unassigned incidents, use: assignee != *
- For incidents where a specific person changed the assignee, use the format: assigner = followed by the person's email address
Search Specific Tags
BigPanda normalizes alert data into attributes called tags. Use BPQL to search for values in any standard or custom tag and to create advanced queries. As you type, the search bar displays suggested tags and monitoring system names that are relevant to your search.
Search with incident tags
Incident tags and some incident metadata can be used to search and filter incidents. Standard incident tags that can be searched include `source_system`, `status`, `assignee`, `assigner`, `severity`, `zero_impact`, `is_active`, and `comment`. See the Tag Naming documentation for a list of system limitations tied to specific tags.
When searching or defining BPQL conditions using incident tags, you must use the Incident Tag ID, not the incident tag name.
To see the incident tag ID, click an incident tag name in Query Assist. The tag ID is then populated in the search bar or input field. You can also find the Incident Tag ID on the Settings > Incident Enrichment screen in the incident tag details pane.
To search using an incident tag, precede the tag ID with `incident.` (for example, `incident.runbook`).
Filter Search Results
You can apply any of these filter criteria to narrow the results of your searches:
- Select the Environment.
- Select the source. You can include all results from a source type (such as Nagios or New Relic). Or, you can include results only from a specific instance of the source type (for example, Nagios-US-EAST1).
- Select a time frame, or select Pick Date Range to enter specific dates and times. The results will include all incidents that were created, updated, or ended in the time frame. An event sent to BigPanda that is deduplicated and correlated into an alert will not be included unless it also includes a state change.
Default Filter Criteria
By default, search results display incidents in All Environments, from All Sources, that were active during the Last 7 Days. If you selected custom criteria, you can click Reset Filters to return to the default filter criteria.
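The time frame rule and the seven-day default can be sketched as a filter. The `IncidentHistory` record and the function names are hypothetical; the sketch follows the created, updated, or ended rule literally and records only state changes as updates, mirroring the note about deduplicated events.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class IncidentHistory:
    created: datetime
    # State changes only; deduplicated events without a state change are not recorded.
    updates: list[datetime] = field(default_factory=list)
    ended: Optional[datetime] = None  # None while the incident is still open


def in_time_frame(inc: IncidentHistory, start: datetime, end: datetime) -> bool:
    """True if the incident was created, updated, or ended within [start, end]."""
    stamps = [inc.created, *inc.updates] + ([inc.ended] if inc.ended is not None else [])
    return any(start <= ts <= end for ts in stamps)


def default_window(now: datetime) -> tuple[datetime, datetime]:
    """Default search filter: the last 7 days."""
    return now - timedelta(days=7), now


if __name__ == "__main__":
    start, end = default_window(datetime(2024, 5, 8, 12, 0))
    recent = IncidentHistory(created=datetime(2024, 4, 20), updates=[datetime(2024, 5, 3, 9, 0)])
    old = IncidentHistory(created=datetime(2024, 4, 1), ended=datetime(2024, 4, 2))
    print(in_time_frame(recent, start, end), in_time_frame(old, start, end))  # True False
```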
Sort Search Results
You can change the sort order so that the results you want to see most are listed first. By default, incidents are listed in order by when they were last changed, with the most recently changed incident on top.
- On the top right of the results, click the Sort menu.
- Select the desired sort option:
- Last Changed - Time of the last change to the incident
- Status - Current status of the incident (Critical, Warning, Resolved or Acknowledged)
- Created - The time the first alert in the incident was received
- No. of Alerts - Number of active alerts in the incident(s)
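The sort options can be modeled as key functions. In this sketch, the `Result` record is hypothetical, newest-first for Last Changed follows the stated default, and the descending order for Created and No. of Alerts plus the ranking used for Status are assumptions, since the UI's exact ordering is not documented here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Result:
    incident_id: str
    last_changed: datetime
    created: datetime
    status: str        # "Critical", "Warning", "Acknowledged", or "Resolved"
    active_alerts: int


# Hypothetical ranking for the Status sort.
STATUS_RANK = {"Critical": 0, "Warning": 1, "Acknowledged": 2, "Resolved": 3}

# option -> (key function, descending?)
SORT_KEYS = {
    "Last Changed": (lambda r: r.last_changed, True),  # default: most recent first
    "Created": (lambda r: r.created, True),
    "No. of Alerts": (lambda r: r.active_alerts, True),
    "Status": (lambda r: STATUS_RANK.get(r.status, len(STATUS_RANK)), False),
}


def sort_results(results: list[Result], option: str = "Last Changed") -> list[Result]:
    key, descending = SORT_KEYS[option]
    return sorted(results, key=key, reverse=descending)


if __name__ == "__main__":
    r1 = Result("INC-1", datetime(2024, 5, 1, 9), datetime(2024, 4, 30), "Warning", 3)
    r2 = Result("INC-2", datetime(2024, 5, 1, 11), datetime(2024, 5, 1, 8), "Critical", 1)
    print([r.incident_id for r in sort_results([r1, r2])])                   # ['INC-2', 'INC-1']
    print([r.incident_id for r in sort_results([r1, r2], "No. of Alerts")])  # ['INC-1', 'INC-2']
```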
Review Search Results
The search results show basic information about incidents with matching alerts, including:
- Incident title and subtitle.
- Number of active alerts.
- Source system.
- Current status.
- List of the alerts that the incident contains, along with a timeline of status changes for each alert.
- Number of shares per incident.
- Number of comments per incident.
Searching for Comments
When searching for specific comments, the search results show all the information for each associated incident, not just the relevant comment. Click on the incident’s Comments icon to view the comments containing your search term(s).
Recent Comments Only
You can search the most recent 2,000 comments on an incident in the UI. Older comments are not returned by search. To retrieve the full comment history for an incident, view the incident activity feed in the BigPanda console or use the Get Activities API.
Use the Timeline
The timeline shows the time frame for the filter criteria, highlighted in blue. It also shows the time when the first alert was received (incident start time) and the time when the incident was resolved (incident end time) or the current time if the incident is still active.
- To see the complete details for an alert at any point in its life cycle, click a dot on the timeline.
- Click the arrows to step through the details of every status change for the alert.
- To collapse the list of alerts and the timeline, click the arrow beside the row.
Link to Unified Searches
When you use Unified Search, the search parameters are appended to the URL of the results page. This feature allows you to share a link to your search with your teammates or save a commonly run search as a bookmark.
To link to a Unified Search:
- At the top of the screen, click the Search tab.
- Run a search and apply filter criteria, as desired.
- To share the link, copy the URL in the browser address bar and send it to the desired recipients.
- To save the link, add a bookmark in your browser to the search results page.
View search results
You must be logged in to BigPanda to view search results.
Search URL Parameters
BigPanda search URLs use standard formatting for each query parameter. This allows you to build a custom search query right in the URL.
Syntax
The overall query syntax is built from parameter-value pairs. For example, a search for all Nagios events containing the value phx*db that were active at some time within the last hour would have this URL:
https://a.bigpanda.io/#/app/investigator?query=phx*db&source=nagios.*&timeframe=-1h
Syntax rules
- Use URL encoding to escape spaces and other unsafe characters.
- Parameters and values are case sensitive.
Parameters
Parameter | Description | Example |
---|---|---|
query | Keyword or query in BPQL | query=host%3Dphx_db%20AND%20check!%3D_cpu* |
environment | BigPanda Environment filter | environment=production |
source | Source type or integration filter, in one of the following formats: `<source_type>.*` for all integrations of the same source type, or `<source_type>.<integration>` for a specific integration | source=nagios.* or source=nagios.nagios_us_east1 |
timeframe | Time frame filter: -1h (last hour), -2h (last 2 hours), -6h (last 6 hours), -24h (last 24 hours), or -7d%2Fd (last 7 days). For a custom time frame, use the from and to parameters with specific start and end times as Unix epoch timestamps in milliseconds. | timeframe=-7d%2Fd or from=1456610400000&to=1458684000000 |
Environment and source values
The values for the `environment` and `source` parameters are unique, internal names that may differ from the descriptive names shown in the UI. If the URL parameters are not returning the expected results, try adjusting the filters on the Search tab and note the parameter values in the URL.
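The URL rules above can be applied with the standard `urllib.parse` module. In this sketch, the base URL and parameter names come from this page, while `build_search_url` and `to_epoch_ms` are hypothetical helpers; leaving `*` and `!` unencoded mirrors the examples in the table and is an assumption about what BigPanda accepts.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://a.bigpanda.io/#/app/investigator"


def to_epoch_ms(ts: datetime) -> int:
    """Convert a timezone-aware datetime to a Unix epoch timestamp in milliseconds."""
    return int(ts.timestamp() * 1000)


def build_search_url(query: Optional[str] = None, environment: Optional[str] = None,
                     source: Optional[str] = None, timeframe: Optional[str] = None,
                     start: Optional[datetime] = None, end: Optional[datetime] = None) -> str:
    params = {"query": query, "environment": environment, "source": source,
              "timeframe": timeframe}
    if start is not None and end is not None:
        params["from"], params["to"] = to_epoch_ms(start), to_epoch_ms(end)
    # Drop unset parameters and percent-encode the rest (spaces become %20,
    # "=" becomes %3D, "/" becomes %2F), keeping "*" and "!" as in the table.
    encoded = urlencode({k: v for k, v in params.items() if v is not None},
                        quote_via=quote, safe="*!")
    return f"{BASE_URL}?{encoded}"


if __name__ == "__main__":
    print(build_search_url(query="phx*db", source="nagios.*", timeframe="-1h"))
    print(build_search_url(query="host=phx_db AND check!=_cpu*"))
```

The first call reproduces the example URL shown earlier in this section; the second produces the encoded query value from the Parameters table.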
Resolve Incidents
Most BigPanda incidents will resolve automatically when all alerts within the incident are marked OK by the monitoring system. If an alert never receives an OK status from the monitoring system, the incident will remain open within BigPanda.
If an incident is tied to a resolved issue but has not been resolved in BigPanda, you can resolve it manually. Resolving incidents keeps your BigPanda dashboard clean and keeps your team focused on active issues.
Incidents can be resolved from the Incident Feed or the Incident Details pane.
To resolve an incident:
- Select the incident.
- Click the Resolve incident checkmark icon in the top right of the incident feed or incident details pane.
- (Optional) Add a note to let your team know why you are resolving this incident.
- Click Resolve.
The incident will be resolved in BigPanda, any share recipients will be notified of the new status, and a Resolved Manually note will be added to the activity log.
If any of the alerts within the incident are reopened, the incident will also reopen as normal. Learn more about what triggers incidents and alerts to reopen in the Incident Lifecycle documentation.
Resolve Alerts
Most BigPanda alerts are resolved automatically through an OK update from the monitoring system that created the alert.
If an alert isn’t resolved automatically, you can manually resolve alerts. Resolving alerts enables you to remove outliers, clean up outstanding events, and ensure that BigPanda incidents best match the real state of ongoing issues.
Alerts can be resolved individually or in bulk from the incident details pane on the Incidents tab.
To resolve an alert:
- Select the incident that has the alerts to be resolved.
- In the incident details pane, locate the alert(s) to be resolved in the Active Alerts section of the Overview tab, or on the Alerts tab.
- Use the selection boxes to select the alert(s) to be resolved.
- Click the Resolve Alert icon on the top right of the alerts table.
- (Optional) Add a note to let your team know why you are resolving this alert.
- Click Resolve.
The alert(s) will be resolved in BigPanda, and the activity log will show the alert(s) as Resolved Manually. If the alert resolution changes the status of the incident, share recipients will be notified of the new status. If the alert was the only open alert for the incident, the incident will resolve as normal.
If the monitoring system sends an update that would reopen the alert, the alert and any related incidents will reopen as normal. Learn more about what triggers incidents and alerts to reopen in the Incident Lifecycle documentation.
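The resolution behavior described in this section can be summarized as a small state model: an incident is resolved when every alert is OK or after a manual resolution, and any non-OK update from the monitoring system reopens it. The classes, status strings, and activity text are hypothetical simplifications; see the Incident Lifecycle documentation for the full rules.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    alert_id: str
    status: str  # hypothetical values: "critical", "warning", "ok"


@dataclass
class Incident:
    alerts: dict[str, Alert]
    manually_resolved: bool = False
    activity: list[str] = field(default_factory=list)

    @property
    def resolved(self) -> bool:
        # Resolved when every alert is OK, or after a manual incident resolution.
        return self.manually_resolved or all(a.status == "ok" for a in self.alerts.values())

    def resolve_incident(self, note: str = "") -> None:
        self.manually_resolved = True
        self.activity.append("Resolved Manually" + (f": {note}" if note else ""))

    def resolve_alerts(self, alert_ids: list[str], note: str = "") -> None:
        for alert_id in alert_ids:
            self.alerts[alert_id].status = "ok"
            self.activity.append(f"{alert_id} Resolved Manually" + (f": {note}" if note else ""))

    def monitoring_update(self, alert_id: str, status: str) -> None:
        # An update from the monitoring system can reopen an alert and its incident.
        self.alerts[alert_id].status = status
        if status != "ok":
            self.manually_resolved = False


if __name__ == "__main__":
    inc = Incident({"a1": Alert("a1", "critical"), "a2": Alert("a2", "warning")})
    inc.resolve_alerts(["a1"], note="false positive")
    print(inc.resolved)  # False: a2 is still open
    inc.resolve_alerts(["a2"])
    print(inc.resolved)  # True: the last open alert was resolved
    inc.monitoring_update("a2", "critical")
    print(inc.resolved)  # False: the alert reopened
```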
Batch Alert Resolution API
Alerts can also be resolved through the Batch Alert Resolution API.
Next Steps
Start Triaging Incidents in BigPanda
Learn more about Navigating the Incidents Tab
Dig into The Incident Life Cycle