Remediate Incidents

Remediation steps enable you to communicate with your team and quickly resolve incidents within BigPanda.

After an incident has been investigated and triaged, remediation steps allow you to quickly resolve the incident. The remediation process can help your team see what happened within the incident, allowing you to take the steps needed to prevent further events from the same tools.

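During the remediation process, you can:

  • Comment on incidents
  • Add tags to incidents
  • Correlate changes with incidents
  • Search for incidents
  • Resolve incidents and alerts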

For more information about the incident management process, see the Incidents in BigPanda and Triage Incidents documentation.

Comment on Incidents

Comments Popup

Collaborate with team members by viewing and contributing to comments on an incident.

  1. Click the Comments icon within an incident, or click the Comments icon at the top right of the Incident Details pane.
  2. Add a comment or view previous comments from your colleagues.
    Search for a specific comment using the Search bar at the top of the incident feed or in BigPanda's Search tab.

Incident Tags

Incident tags add context and details to your BigPanda incidents, enabling better correlation and faster troubleshooting.

BigPanda takes the raw data from your systems and normalizes it into key-value pairs, called tags. Each tag has two parts: the tag name and the tag value. Tags are the fundamental data model for your alerts and incidents and provide vital incident enrichment.

In BigPanda, tags enable alert correlation, provide incident information in the UI, and help you configure environments, perform searches, and collect analytics.

Incident tags are key-value pairs that allow you to quickly see summary information for a particular incident rather than needing to review all of the related alerts.
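As a rough illustration of the key-value model described above, the sketch below flattens a raw monitoring payload into name/value tags. The payload fields and the `normalize` helper are hypothetical and exist only to show the shape of the data; this is not BigPanda's actual normalization logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """One key-value pair: a tag name and a tag value."""
    name: str
    value: str


def normalize(raw_event: dict) -> list[Tag]:
    """Flatten a raw payload into tags (hypothetical helper, illustration only)."""
    return [Tag(name=str(k).lower(), value=str(v)) for k, v in raw_event.items()]


# Hypothetical raw event from a monitoring tool.
raw = {"Host": "srv-1", "Check": "chk-1", "Status": "critical"}
tags = normalize(raw)
```

Each resulting `Tag` carries exactly the two parts the text describes: a name and a value.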

Relevant Permissions

Roles with the following permissions can access Incident Tags:

| Role Name | Description |
| --- | --- |
| Environments_Full_Access or SomeEnv_Actions | Full access - View, add, edit or remove the incident tag from incidents. |
| Environments_Read or SomeEnv_Read | Read-only - View incident tags on incidents in some or all environments. |
| Incident-Tags-Definitions_Read | Read-only - View the Incident Tags section of the BigPanda Settings. |
| Incident-Tags-Definitions_Full_Access | Full access - View, create and edit incident tags in the Incident Tags section of the BigPanda Settings. |

To learn more about how BigPanda's permissions work, see the RBAC - Role Based Access Control documentation.

Use Incident Tags

Incident tags will appear on both the Incident Feed and in the Overview tab of the Incident Details pane.

Incident tags enrich your BigPanda incidents and surface key information about the event.

Hover over the name of an incident tag to see who last edited it and the date and time of the change. Tags that have not been manually edited show the date and time of the last automatic incident enrichment.

You can manually assign or remove incident tags. To learn more about using incident tags with incidents, please see the Add Tags to Incidents and Prioritize Incidents documentation.

You can also create, edit, or inactivate incident tags to fit the needs of your organization. To learn more about configuring incident tags, please see the Manage Incident Enrichment documentation.

Incident tags may also be configured to automatically add to specific incidents based on incident or alert criteria. To learn more about configuring automatic tags, please see the Automatic Incident Tags documentation.

🚧

If an incident has been manually split, the new incident will be created without any incident tag values. If incidents are manually merged, only the incident tags from the destination incident will appear on the merged incident. Source incident tags will not be added to the destination incident.

Incident Tag Types

Incident tags may take the form of Priority, Text, or Multi-value tags.

Priority Tags

Priority Tag

By default, your environment will have Priority tags enabled, with pre-configured settings. These settings can be customized to better fit the needs of your organization. To learn more about customizing tags, please see the Manage Incident Enrichment documentation.

Priority tags are visible at the top left of incidents in both the feed and the details Overview tab, next to the incident severity. Incidents that have not been prioritized will not show the priority icon.

Priority can be assigned from the incident feed or from the Overview tab of the incident details pane. To learn more about using priority tags, please see the Prioritize Incidents documentation.

Text and Multi-value Tags

Text and Multi-value tags add further information, details, or other enrichment to your incidents. Each tag is a custom name-value pair, similar to BigPanda alert tags.

Text and Multi-value tags appear at the top of the Overview tab of the incident details pane.

Text and Multi-value Tags

Each tag is made up of the tag name and the tag value (e.g. Source_system: Nagios). For text tag types, the value is a single text string that appears in an editable text box. For multi-value tag types, the value is one or more individual text items, which appear as separate entries in the editable value field.

Configure text and multi-value tags such as “affected environment” or “region” to add context and enable better collaboration between your organization's teams.

Once configured, text and multi-value tags can be assigned to incidents from the Overview tab of the incident details pane. To learn more about using text and multi-value tags, please see the Add Tags to Incidents documentation.

Add Tags to Incidents

You can add previously created incident tags to your incidents. To learn about creating incident tags, see the Manage Incident Enrichment documentation.

To add tags to incidents:

  1. In the Incidents tab, select an incident you wish to review.
  2. In the Incident Details pane, navigate to the Overview tab to view incident tags.

Viewing Incident Tags

  3. Select a tag you wish to edit and click the pencil icon. The Edit Incident Tag pop-up opens.

📘

If a tag cannot be manually edited, a lock icon will appear to the right of the incident tag.

  4. Enter the appropriate values in the editable field.
  5. Click Update to save.

🚧

Manually changing tag data will stop automated enrichment for this tag.

Tags may be a text or multi-value type.

Text tags allow you to include a single text string up to 256 characters. This free text tag allows you to add customized information such as a ticket number or note. When you are happy with your text tag, click Save.

Multi-value tags allow you to add several individual text tags. Multi-value tags allow you to list key information such as the organization’s services or regions affected by the incident. When you are happy with each individual tag, click Create or hit Enter or Tab on your keyboard.
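The difference between the two tag types can be sketched in a few lines. This is only an illustration of the rules stated above (a single string of up to 256 characters versus one or more individual items); the function names, tag names, and values are hypothetical and do not correspond to a BigPanda API.

```python
MAX_TEXT_TAG_LENGTH = 256  # limit stated in the text above


def validate_text_tag(value: str) -> str:
    """A text tag holds a single string of up to 256 characters."""
    if len(value) > MAX_TEXT_TAG_LENGTH:
        raise ValueError(f"text tag exceeds {MAX_TEXT_TAG_LENGTH} characters")
    return value


def validate_multi_value_tag(values: list[str]) -> list[str]:
    """A multi-value tag holds one or more individual text items."""
    cleaned = [v.strip() for v in values if v.strip()]
    if not cleaned:
        raise ValueError("multi-value tag needs at least one item")
    return cleaned


# Hypothetical tag names and values.
incident_tags = {
    "ticket": validate_text_tag("INC-1234"),
    "region": validate_multi_value_tag(["us-east", "eu-west"]),
}
```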

📘

All tag changes will appear in the Incident Details pane within the Activity tab.

Manage Incident Tags

Incident tags can be configured to fit the specific needs of your organization. To learn more about configuring incident tags, please see the Manage Incident Enrichment documentation.

Correlate Changes with Incidents

Identifying the root cause of an outage or a poorly performing application is one of the biggest challenges that IT organizations face today, and the fast-changing nature of modern development only makes that more difficult.

BigPanda’s Root Cause Changes (RCC) feature simplifies this process by collecting change data right into the incidents dashboard, and then leveraging BigPanda’s algorithms to identify changes that may have led to incidents.

Use the Changes section and suggested matches to search for and mark suspect changes, and collaborate with other users to investigate which change caused the incident.

Marking changes as the suspected or matched root cause change of an incident is a vital tool in identifying historically problematic changes.

See Incidents in BigPanda for more information about the Changes tab.

Changes Table

📘

Root Cause Changes is an optional feature and may not be turned on for all organizations. If you would like to begin using the root cause changes feature, contact us at [email protected]

Search For Specific Changes

You can search the change table for changes that meet specific criteria and fall within a selected time frame. Use the time frame selection tool or the search bar at the top of the change table to find specific types of changes.

Select a Time Frame

By default, the Change Table displays changes that were active during the hour before the incident started. You can select a different time frame from a set of options, or select a custom date range.

  • To change the change table’s time frame, click the current time frame. From the dropdown, select the desired time window.
  • To filter by specific dates, select Custom Dates Range and enter the relevant dates and times in the dialogue box.

Time Frame Dropdown

The table displays changes that are active during a specified time frame. Changes are considered active if they:

  • Start within the specified time frame
  • End within the specified time frame
  • Start before and remain active after the specified time frame

Change Time Frame Rules
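The three rules above amount to an interval-overlap check. The following is a minimal sketch of that check, assuming inclusive boundaries and ignoring any hour rounding applied by the root cause change algorithm; the function name and sample dates are hypothetical.

```python
from datetime import datetime


def is_active(change_start: datetime, change_end: datetime,
              frame_start: datetime, frame_end: datetime) -> bool:
    """True if the change overlaps the time frame.

    Covers all three cases: starts within, ends within, or spans the frame.
    """
    return change_start <= frame_end and change_end >= frame_start


# Hypothetical one-hour time frame and three sample changes.
frame = (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 10))
starts_within = is_active(datetime(2024, 1, 1, 9, 30), datetime(2024, 1, 1, 11), *frame)
spans_frame = is_active(datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 12), *frame)
ended_before = is_active(datetime(2024, 1, 1, 7), datetime(2024, 1, 1, 8), *frame)
```

Here `starts_within` and `spans_frame` are `True`, while `ended_before` is `False` because the change finished before the frame began.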

🚧

When calculating time ranges, the root cause change algorithm rounds start times up and end times down to the nearest hour. When searching changes based on expected matches, you may see different results than the algorithm.

Search Changes

Changes can be searched using BigPanda Query Language (BPQL) to find specific tag values. Use free text, tag values, boolean, or regex queries to narrow the list of changes to only those that meet that requirement.

Searching the Changes Table

When searching for changes using BPQL, the Query Assist feature is available to help you build a query. See Query Assist for more information.

View RCC Status Details

The Root Cause Change (RCC) status column lists whether a change has been identified as a suspected or matched cause of the incident.

Click the RCC status for a change to open a popup showing the user who set the status, the most recent activity, and its date and time. Any comment the user added when setting the status is included in the popup.

If the RCC status was set by BigPanda’s algorithms, the blue BigPanda icon will appear in the user field. Hover over the information icon to get more details about why BigPanda suspects the change is the potential root cause of the incident.

Details on Suggested Change

Mark Changes

Mark changes as suspected or matched root causes to record the connection between the change and incident for your team, analytics, and BigPanda’s algorithm.

👍

Marking changes as Suspect or Match is a vital tool in training BigPanda to recognize patterns between incidents and changes in your system.

Each change can be marked with one of three statuses related to an incident. Select the status that best describes the change’s relationship to the cause of the incident:

  • None - The change is likely not the cause of the incident. This is the default RCC status.
  • Suspect - The change may have been related to the cause of the incident. If BigPanda’s RCC algorithms find a strong connection between a change and an incident, BigPanda automatically marks the change as Suspect.
  • Match - The change is likely the cause or related to the cause of the incident. BigPanda will never automatically mark matches - changes can only be marked as a match by a human teammate.
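The three statuses and the rule that only a person can set Match can be modeled as a small enum with a guard. This is a sketch of the rule as described above, not BigPanda's implementation; the `RccStatus` and `set_status` names are hypothetical.

```python
from enum import Enum


class RccStatus(Enum):
    NONE = "None"        # default status
    SUSPECT = "Suspect"  # may be set by a user or by the algorithm
    MATCH = "Match"      # may only be set by a user


def set_status(new_status: RccStatus, set_by_algorithm: bool) -> RccStatus:
    """Return the status to apply, enforcing that Match is human-only."""
    if set_by_algorithm and new_status is RccStatus.MATCH:
        raise PermissionError("only a human can mark a change as Match")
    return new_status


suggested = set_status(RccStatus.SUSPECT, set_by_algorithm=True)
confirmed = set_status(RccStatus.MATCH, set_by_algorithm=False)
```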

To set a change RCC status:

  1. In the Change table, click the status dropdown for the change.
  2. Select the desired status.
  3. Click the status to enter a comment to add details or reasoning to the RCC status.

Marking a Change as Match

👍

Comments can be a great foundation for collaboration and post-outage review.

When a different status is selected for a change, a record of the activity is created with information about who set the change status, when, and any comment associated with it.

In addition, activities related to RCC (e.g. type and time of correlation, comments, and latest interaction) are listed chronologically in the activity feed.

Marked as Suspect Note

Suggested Changes

BigPanda's algorithms automatically detect connections between changes made to the system and incidents.

As new incidents are created or new alerts join an existing incident, BigPanda calculates their match potential with each past change.

If a change with high match potential is found, BigPanda marks the change as Suspect and adds a comment to the info popup explaining why the change was marked. Suspected changes will appear on the Overview tab as well as at the top of the Changes table. Filter the table to show only suspected and matched changes by clicking the Show potential RCC only toggle.

Suggested Suspect Change Reasoning

Suggested changes can rapidly speed up the root cause investigation process by identifying potential problems right at incident detection.

🚧

BigPanda will only mark changes as Suspect (not Match) to give users the final say on whether the change is the root cause of the incident.

Search for Incidents

In BigPanda, you can use the Unified Search tab to investigate current and historical incidents across all of your integrated monitoring systems, which can help you find, solve, and prevent problems. Enter a keyword search or query, and apply filter criteria to narrow results. BigPanda finds all incidents with alerts that match your search criteria.

Use Unified Search

To search for incidents:

  1. At the top of the screen, click the Search tab.
  2. Enter a keyword search (term or exact phrase in quotes) or a query in BigPanda Query Language (BPQL).
    Both keyword search and BPQL support regular expressions.
  3. Select the filter criteria or leave the default settings.
  4. Click the search icon or press Enter.
    The results page shows the total number of matching incidents and lists up to the first ten matches.
  5. (Optional) Scroll down to view more results.

👍

Tag Names In Different Monitoring Systems

If you don't see the results you were expecting, try adding OR conditions in your BPQL query to include similar values that may have different tag names in different monitoring systems (e.g. host in Nagios and object in SolarWinds).
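One way to apply this tip is to generate the OR conditions from a list of equivalent tag names. The helper below is a hypothetical convenience for composing the query string and is not part of BigPanda; it assumes simple values without spaces, so quoting is not handled.

```python
def any_tag_equals(tag_names: list[str], value: str) -> str:
    """Build a BPQL clause matching `value` under any of the given tag names."""
    clauses = [f"{name}={value}" for name in tag_names]
    return "(" + " OR ".join(clauses) + ")"


# "host" in Nagios and "object" in SolarWinds may describe the same thing.
query = any_tag_equals(["host", "object"], "srv-1") + " AND status=critical"
```

The resulting string is `(host=srv-1 OR object=srv-1) AND status=critical`, which can be pasted into the search bar.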

Filter Search Results

You can apply any of these filter criteria to narrow the results of your searches:

  • Select the Environment.
  • Select the source.
    You can include all results from a source type (such as Nagios or New Relic). Or, you can include results only from a specific instance of the source type (for example, Nagios-US-EAST1).
  • Select a timeframe, or select Pick Date Range to enter specific dates and times.
    The results will include all incidents that were active at any point during the specified time frame (that is, started before the end time and ended after the start time).

Search Results Filter

📘

Default Filter Criteria

By default, search results display incidents in All Environments, from All Sources, that were active during the Last 7 Days. If you selected custom criteria, you can click Reset Filters to return to the default filter criteria.

Sort Search Results

You can change the sort order so that the results you want to see most are listed first. By default, incidents are listed in order by when they were last changed, with the most recently changed incident on top.

  1. On the top right of the results, click the Sort menu.
  2. Select the desired sort option:
  • Last Changed - Time of the last change to the incident
  • Status - Current status of the incident (Critical, Warning, Resolved or Acknowledged)
  • Created - The time the first alert in the incident was received
  • No. of Alerts - Number of active alerts in the incident(s)

Sort By Options

Review Search Results

The search results show basic information about incidents with matching alerts, including:

  • Incident title and subtitle.
  • Number of active alerts.
  • Source system.
  • Current status.
  • List of the alerts that the incident contains, along with a timeline of status changes for each alert.
  • Number of shares per incident.
  • Number of comments per incident.

👍

When searching for specific comments, the search results show all the information for each associated incident, not just the relevant comment. Click on the incident’s Comments icon to view the comments containing your search term(s).

Use the Timeline

The timeline shows the time frame for the filter criteria, highlighted in blue. It also shows the time when the first alert was received (incident start time) and the time when the incident was resolved (incident end time) or the current time if the incident is still active.

  • To see the complete details for an alert at any point in its life cycle, click a dot on the timeline. Then, click the arrows to step through the details of every status change for the alert.
  • To collapse the list of alerts and the timeline, click the arrow beside the row.

Link to Unified Searches

When you use Unified Search, the search parameters are appended to the URL of the results page. You can use this feature to share a link to your search with your teammates or to save a commonly run search as a bookmark.

To link to a Unified Search:

  1. At the top of the screen, click the Search tab.
  2. Run a search and apply filter criteria, as desired.
  3. To share the link, copy the URL in the browser address bar and send it to the desired recipients.
  4. To save the link, add a bookmark in your browser to the search results page.

❗️

Viewing Search Results

You must be logged in to BigPanda to view search results.

Search URL Parameters

BigPanda search URLs use standard formatting for each query parameter.

Syntax

The overall query syntax is built using parameter value pairs:

https://a.bigpanda.io/#/app/investigator?<parameter 1>=<value 1>&<parameter 2>=<value 2>

For example, a search for all Nagios events containing the value phx*db that were active at some time within the last hour would have this URL:

https://a.bigpanda.io/#/app/investigator?query=phx*db&source=nagios.*&timeframe=-1h

📘

Syntax Rules

Parameters

| Parameter | Description | Example |
| --- | --- | --- |
| query | Keyword or query in BPQL | query=host%3Dphx_db%20AND%20check!%3D_cpu* |
| environment | BigPanda Environment filter | environment=production |
| source | Source type or integration filter, in one of the following formats: <source type>.* (all integrations of the same source type) or <source type>.<integration name> (a specific integration) | source=nagios.* or source=nagios.nagios_us_east1 |
| timeframe | Time frame filters: -1h (last hour), -2h (last 2 hours), -6h (last 6 hours), -24h (last 24 hours), -7d%2Fd (last 7 days), or from, to (custom time frame with specific start and end times, in Unix Epoch timestamp (milliseconds) format) | timeframe=-7d%2Fd or from=1456610400000&to=1458684000000 |

📘

Environment and Source Values

The values for the environment and source parameters are unique, internal names that may be different from the descriptive names shown in the UI. If the URL parameters are not returning the expected results, try adjusting the filters on the Search tab and note the parameter values in the URL.
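If you generate search links from a script, the parameter encoding shown above can be reproduced with Python's standard library. This is a sketch under the assumption that the URL format is exactly as documented here; `search_url` and `epoch_ms` are hypothetical helper names, and the choice to leave "!" and "*" unescaped simply mirrors the examples above.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode

BASE_URL = "https://a.bigpanda.io/#/app/investigator"


def search_url(**params: str) -> str:
    """Build a Unified Search link from parameter/value pairs."""
    # Keep "!" and "*" unescaped, as in the documented examples.
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe="!*")


def epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to Unix epoch milliseconds."""
    return int(moment.timestamp() * 1000)


last_hour = search_url(query="phx*db", source="nagios.*", timeframe="-1h")
bpql = search_url(query="host=phx_db AND check!=_cpu*")
# "from" is a Python keyword, so it is passed through a dict.
start = datetime(2016, 2, 27, 22, tzinfo=timezone.utc)
custom = search_url(**{"from": str(epoch_ms(start))})
```

`last_hour` reproduces the example URL from the Syntax section, and `bpql` yields the encoded `query` value shown in the parameter list.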

Search Incident Assignments

You can use Incident Assignments in Unified Search to search through current and historical incidents. Search for either the assignee or for the user who last changed the assignment (the assigner).

On the Search tab, enter a keyword search for the user's email address or type a more complex query using BigPanda Query Language (BPQL).

  • For incidents assigned to a specific person, use the format: assignee = <email address>
  • For unassigned incidents, use: assignee != *
  • For incidents where a specific person changed the assignee, use the format: assigner = <email address>

For more information, see The Unified Search Tab documentation.

Examples of Keyword Search Queries

  • Use an asterisk to match multiple values with a common element, for example:
    phx*db
  • Add quotes (" or ') around an exact phrase that contains spaces (spaces are allowed only between quotes), for example:
    "CPU over 90*"
  • Add a slash (/) as the first and last character to search for values that match a regular expression (case sensitive and limited to 32,000 characters), for example:
    /...Phx[0-9]{3}/

BPQL queries are in <tag> <operator> <value> format, using AND, OR, and parentheses to combine and group multiple conditions. For example:
host=srv-1 AND (check=chk-1 OR (check=chk-2 AND status=critical))

Search Specific Tags

BigPanda normalizes alert data into attributes called tags. Use BPQL to search for values in any standard or custom tag and to create advanced queries. As you type, the search bar displays suggested tags and monitoring system names that are relevant to your search.

Resolve Incidents

Most BigPanda incidents will resolve automatically when all alerts within the incident are marked OK by the monitoring system. If an alert never receives an OK status from the monitoring system, the incident will remain open within BigPanda.
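The automatic resolution rule can be expressed as a one-line check over the alerts in an incident. This is a simplified sketch of the rule as described above; the status strings and the function name are hypothetical.

```python
def incident_is_resolved(alert_statuses: list[str]) -> bool:
    """An incident resolves once every alert in it has an OK status."""
    return bool(alert_statuses) and all(s == "ok" for s in alert_statuses)


still_open = incident_is_resolved(["ok", "critical", "ok"])  # one alert never got an OK
resolved = incident_is_resolved(["ok", "ok"])
```

A single alert that never receives an OK update keeps the whole incident open, which is the case manual resolution addresses.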

If an incident is tied to a resolved issue, but has not been resolved in BigPanda, you can manually resolve incidents within BigPanda. Resolving incidents keeps your BigPanda dashboard clean and keeps your team focused on active issues.

Incidents can be resolved from the Incident Details pane on the Incidents tab.

Resolving Incidents

To resolve an incident:

  1. Select the incident.
  2. Click the Resolve incident checkmark icon in the top right of the incident feed or incident details pane.
  3. (Optional) Add a note to let your team know why you are resolving this incident.
  4. Click Resolve.

The incident will be resolved in BigPanda, any share recipients will be notified of the new status, and a Resolved Manually note will be added to the activity log.

If any of the alerts within the incident are reopened, the incident will also reopen as normal. Learn more about what triggers incidents and alerts to reopen in the Incident Lifecycle documentation.

Resolve Alerts

Most BigPanda alerts are resolved automatically through an OK update from the monitoring system that created the alert.

If an alert isn’t resolved automatically, you can manually resolve alerts. Resolving alerts enables you to remove outliers, clean up outstanding events, and ensure that BigPanda incidents best match the real state of ongoing issues.

Alerts can be resolved individually or in bulk from the incident details pane on the Incidents tab.

Resolving Alerts

To resolve an alert:

  1. Select the incident that has the alerts to be resolved.
  2. In the incident details pane, locate the alert(s) to be resolved in the Active Alerts section of the Overview tab, or on the Alerts tab.
  3. Use the selection boxes to select the alert(s) to be resolved.
  4. Click the Resolve Alerts icon to the top right of the alerts table.
  5. (Optional) Add a note to let your team know why you are resolving this alert.
  6. Click Resolve.

The alert(s) will be resolved in BigPanda, and the activity log will show the alert(s) as Resolved Manually. If the alert resolution changes the status of the incident, share recipients will be notified of the new status. If the alert was the only open alert for the incident, the incident will resolve as normal.

If the monitoring system sends an update that would reopen the alert, the alert and any related incidents will reopen as normal. Learn more about what triggers incidents and alerts to reopen in the Incident Lifecycle documentation.

👍

Alerts can also be resolved through the Batch Alert Resolution API.

Next Steps

Start Triaging Incidents in BigPanda

Learn more about Navigating the Incidents Tab

Dig into The Incident Life Cycle