Incidents in BigPanda

Use the Incidents tab to manage active incidents from a centralized place.

As raw data is ingested into BigPanda from integrated tools, the system correlates related events into high-level incidents. Incidents in BigPanda provide context to issues, and allow you to quickly identify, triage, and remediate problems before they become severe.

The Incidents Tab in BigPanda allows you to see all active incidents within your system, organized by statuses and environments. You can use the Incidents Feed to view, filter, and take action on incidents. Further details about specific incidents can be found within the Incident Details pane.

For a high-level look at navigation in the Incidents Tab, see the Incidents Tab navigation documentation.

Key Features

  • Triage Incidents - Rapidly prioritize, assign, and share incidents with other users across platforms, and determine whether an incident should be merged or split based on the criteria of its alerts.
  • Remediate Incidents - Use comments to communicate with other team members, add tags to supplement the incident with additional information, and resolve the incident once the issue has been solved.

Incident Feed

The incident feed provides a consolidated view of all active incidents from your integrated monitoring systems. You can use the incident feed to manage your incidents.

View the Incident Feed

BigPanda digests all the events from your integrated monitoring systems and intelligently correlates related events into incidents. Events are correlated and updated in real time, so your incident feed is always up to date with the latest system and application statuses.

  1. At the top of the screen, click the Incidents tab.
    By default, the incident feed displays all active incidents.
  2. (Optional) In the left pane, select an Environment.
  3. (Optional) In the left pane, select a folder.
  4. Review basic information about each incident:
    • Status indicator - A colored ribbon on the left that indicates the incident status, which is determined by the most severe status of the related alerts.
    • Number of active alerts - The number of related alerts in the Critical or Warning state.
    • Priority - The assigned level of importance. When the feed is sorted by priority, the most important incidents appear on top, and incidents without an assigned priority are listed at the bottom by Last Changed.
    • Primary property - Shows why the events are correlated into an incident. By default, the primary property is one of the following: host, service, application, or device.
    • Secondary property - Summarizes the subjects (such as hosts or applications) that are part of the incident. By default, the secondary property is one of the following: check or sensor.
    • System - The type of monitoring tool (such as Nagios or Zabbix) and the integration name (such as Production) that the events came from.
    • Last change, Created, or Duration - Information relevant to the current sort order. Hover over it to see more specific information. See Sorting Incidents.
  5. (Optional) Hover over a row to perform any of the available actions, such as Resolve, Snooze, Share, or Comment. See Triage Incidents and Remediate Incidents for more information.
    The number of existing shares and comments is displayed for each incident. Click either number to view the relevant details.
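As a rough illustration of how the status indicator and the number of active alerts are derived, the following Python sketch computes both from an incident's alerts. It is not BigPanda's implementation: the `Alert` class, the function names, and the severity ranking (taken from the Status sort order described under Sort, with `ok` standing in for resolved) are assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical severity ranking, most severe first. It mirrors the Status sort
# order described in this article, with "ok" standing in for "resolved".
SEVERITY = ["critical", "warning", "unknown", "acknowledged", "ok"]


@dataclass
class Alert:
    status: str  # one of SEVERITY


def status_indicator(alerts: list[Alert]) -> str:
    """Status shown by the ribbon: the most severe status among related alerts."""
    return min((a.status for a in alerts), key=SEVERITY.index)


def active_alert_count(alerts: list[Alert]) -> int:
    """Number of related alerts in the Critical or Warning state."""
    return sum(1 for a in alerts if a.status in ("critical", "warning"))


alerts = [Alert("warning"), Alert("ok"), Alert("critical"), Alert("warning")]
print(status_indicator(alerts), active_alert_count(alerts))  # critical 3
```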

📘

Viewing Incident Details

To view more information, click the incident in the feed. The incident details appear in the right pane. You can view related alert details, a timeline of the incident life cycle, sharing history, comments, and more. For more information, see Incident Details.

Incident Tags

Incident tags appear both in the Incident Feed and in the Overview tab of the Incident Details pane.

Incident tags enrich your BigPanda incidents with key information, helping you see important details about the issue at a glance.

To see who last edited a tag and when, hover over the tag name. Tags that have not been manually edited show the date and time of the most recent automatic incident enrichment.

Users can manually assign or remove incident tags. To learn more about using incident tags with incidents, please see the Adding Incident Tags and Prioritizing Incidents documentation.

You can create, edit, or inactivate incident tags to fit the needs of your organization. To learn more about configuring incident tags, please see the Manage Incident Enrichment documentation.

Incident tags can also be configured to be added automatically to specific incidents based on incident or alert criteria. To learn more about configuring automatic tags, please see the Automatic Incident Tags documentation.

🚧

If an incident has been manually split, the new incident will be created without any incident tag values. If incidents are manually merged, only the incident tags from the destination incident will appear on the merged incident. Source incident tags will not be added to the destination incident.

When an incident is resolved, the incident tags will remain tied to the incident for 18 months. If the incident is reopened, it will have all of the existing incident tags, with new ones added as the reopened incident develops.
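The split, merge, and retention rules above can be summarized in a small Python sketch. The dictionary-based tag model and the function names are hypothetical, and the 18-month check assumes the period is counted in calendar months; BigPanda's actual retention logic may differ.

```python
from datetime import datetime

# Hypothetical model: incident tags as a mapping of tag name -> value.


def merge_tags(destination: dict[str, str], source: dict[str, str]) -> dict[str, str]:
    """Manual merge: only the destination incident's tags are kept.

    The source incident's tags are intentionally ignored.
    """
    return dict(destination)


def split_tags(original: dict[str, str]) -> dict[str, str]:
    """Manual split: the newly created incident starts with no tag values."""
    return {}


def tags_still_retained(resolved_at: datetime, now: datetime) -> bool:
    """True while fewer than 18 calendar months have passed since resolution.

    Assumption: the 18-month period is counted in calendar months.
    """
    months = (now.year - resolved_at.year) * 12 + (now.month - resolved_at.month)
    # The last month is not complete until the same day and time are reached.
    if (now.day, now.time()) < (resolved_at.day, resolved_at.time()):
        months -= 1
    return months < 18
```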

Incident Tag Types

Incident tags may take the form of Priority, Text, or Multi-value tags.

Priority Tags

Priority tags create a sortable hierarchy to mark in which order incidents should be addressed. Priority tags make it easier to view the importance and urgency of your incidents at a glance.

[Image: Priority Tag]

By default, your environment will have Priority tags enabled, with pre-configured settings. These settings can be customized to better fit the needs of your organization. To learn more about customizing tags, please see the Manage Incident Enrichment documentation.

Priority tags are visible at the top left of incidents in both the feed and the details Overview tab, next to the incident severity. Incidents that have not been prioritized will not show the priority icon.

Priority can be assigned from the incident feed or from the Overview tab of the incident details pane. To learn more about using priority tags, please see the Prioritizing Incidents documentation.

Text and Multi-value Tags

Text and Multi-value tags add information, details, or other enrichment to your incidents. Each tag is a custom name-value pair, similar to BigPanda alert tags.

Text and Multi-value tags appear at the top of the Overview tab of the incident details pane.

[Image: Text and Multi-value Tags]

Each tag consists of the tag name and the tag value (for example, Source_system: Nagios). For text tags, the value is a single text string that appears in an editable text box. For multi-value tags, the value is one or more individual text values, which appear as separate items in the editable value field.
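A minimal Python sketch of the two tag shapes described above, assuming a hypothetical `TextTag`/`MultiValueTag` model. The comma-separated rendering and the rule that duplicate values are ignored are illustrative assumptions, not documented BigPanda behavior.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class TextTag:
    name: str
    value: str  # a single text string


@dataclass
class MultiValueTag:
    name: str
    values: list[str] = field(default_factory=list)  # one or more text values

    def add(self, value: str) -> None:
        # Assumption: adding a value that is already present has no effect.
        if value not in self.values:
            self.values.append(value)


def render(tag: Union[TextTag, MultiValueTag]) -> str:
    """Format a tag as 'name: value' (multi-value items joined with commas)."""
    if isinstance(tag, TextTag):
        return f"{tag.name}: {tag.value}"
    return f"{tag.name}: {', '.join(tag.values)}"


print(render(TextTag("Source_system", "Nagios")))  # Source_system: Nagios
```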

Configure text and multi-value tags such as “affected environment” or “region” to add context and enable better collaboration between your organization's teams.

Once configured, text and multi-value tags can be assigned to incidents from the Overview tab of the incident details pane. To learn more about using text and multi-value tags, please see the Add Incident Tags documentation.

Select a Folder

A folder filters the incident feed by predefined criteria. You can select a folder to see all the incidents within an Environment that meet the folder criteria.

  1. From the Incidents tab, select an Environment in the left pane.
    The incident feed shows the active incidents in the Environment, and the list of available folders expands.
  2. In the left pane, select the folder with the desired criteria.

  • Active - Incident has active alerts and is not snoozed.
  • Unhandled - Incident has active alerts and has not been shared or snoozed.
  • Shared - Incident is active and has been shared with users, either manually or by AutoShare.
  • Snoozed - Incident is active, was snoozed, and is within the snooze period. When the snooze period elapses, the incident again appears in the Active folder and no longer appears in the Snoozed folder.
  • Maintenance - Incident includes one or more alerts that have been muted because of a Maintenance plan. Maintenance plans appear in your UI only if an admin has configured them. See the Maintenance plans documentation for more information.
  • Resolved (24h) - Incident was marked as resolved within the past 24 hours. When an incident is reopened, it again appears in the Active folder and no longer appears in the Resolved folder.
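The folder criteria can be read as predicates over an incident. The following Python sketch is a simplified, hypothetical model: the `Incident` fields are assumptions, and "not snoozed" is interpreted as "not currently within a snooze period". Note that one incident can belong to several folders at once.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class Incident:
    has_active_alerts: bool
    shared: bool = False
    snoozed_until: Optional[datetime] = None
    has_maintenance_alert: bool = False  # an alert muted by a Maintenance plan
    resolved_at: Optional[datetime] = None


def is_snoozed(i: Incident, now: datetime) -> bool:
    return i.snoozed_until is not None and now < i.snoozed_until


# Folder name -> predicate, following the folder criteria in this article.
FOLDERS: dict[str, Callable[[Incident, datetime], bool]] = {
    "Active": lambda i, now: i.has_active_alerts and not is_snoozed(i, now),
    "Unhandled": lambda i, now: i.has_active_alerts and not i.shared and not is_snoozed(i, now),
    "Shared": lambda i, now: i.has_active_alerts and i.shared,
    "Snoozed": lambda i, now: i.has_active_alerts and is_snoozed(i, now),
    "Maintenance": lambda i, now: i.has_maintenance_alert,
    "Resolved (24h)": lambda i, now: (
        not i.has_active_alerts
        and i.resolved_at is not None
        and now - i.resolved_at <= timedelta(hours=24)
    ),
}


def folders_for(i: Incident, now: datetime) -> list[str]:
    """Names of all folders whose criteria the incident currently meets."""
    return [name for name, rule in FOLDERS.items() if rule(i, now)]


now = datetime(2024, 1, 1, 12, 0)
incident = Incident(has_active_alerts=True, snoozed_until=now + timedelta(hours=1))
print(folders_for(incident, now))                       # ['Snoozed']
print(folders_for(incident, now + timedelta(hours=2)))  # ['Active', 'Unhandled']
```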

Search for Incidents

You can search for incidents that meet specific criteria within the selected Environment and folder.

  1. (Optional) Select an Environment and a folder.
  2. At the top of the incident feed, enter a keyword search (a term, or an exact phrase in quotes) or a query in BigPanda Query Language (BPQL).

📘

Regular Expression Support

Both keyword search and BPQL support regular expressions. Use a regular expression by entering a slash (/) as the first and last character of your search term. For example, /prod-.*-[0-9]+/. Regex queries are limited to 32,000 characters and are case sensitive. See Elasticsearch Regular Expression Syntax and BPQL for more regex support.

  3. Click the search icon or press Enter.

📘

Search Logic and Results

Enter a term or an exact phrase in quotes to perform a keyword search of the incidents in the selected Environment and folder. The search finds alerts with matching values in descriptions, source systems, and in any standard or custom tag (such as host, check, or status).

Use BPQL to search for values in a specific alert tag or to create an advanced query. You can search any standard or custom tags, define precise conditions with operators, and include multiple conditions.

  4. (Optional) Scroll down to view more results.
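To make the keyword and regular-expression rules concrete, here is a Python sketch of how a single alert might be tested against a query. It is an approximation only: Python's `re` module stands in for Elasticsearch regular expression syntax (the two differ in several features), BPQL is not modeled, and keyword matching is assumed to be case-insensitive with every term required. The `matches` function and the alert dictionary are hypothetical.

```python
import re
import shlex

MAX_REGEX_LENGTH = 32_000  # documented limit for regex queries


def matches(alert: dict[str, str], query: str) -> bool:
    """Return True if the alert matches a keyword or /regex/ query.

    alert maps field names (description, source system, tags) to values.
    """
    q = query.strip()
    if len(q) >= 2 and q.startswith("/") and q.endswith("/"):
        pattern = q[1:-1]
        if len(pattern) > MAX_REGEX_LENGTH:
            raise ValueError("regex query too long")
        regex = re.compile(pattern)  # case sensitive, as documented
        # Assumption: like Elasticsearch regexp queries, the pattern must match a whole value.
        return any(regex.fullmatch(v) for v in alert.values())
    # Keyword search: shlex keeps "exact phrases" in quotes together.
    # Assumption: matching is case-insensitive and every term must appear in some value.
    values = [v.lower() for v in alert.values()]
    return all(any(term.lower() in v for v in values) for term in shlex.split(q))


alert = {
    "host": "prod-web-12",
    "check": "disk usage",
    "description": "Disk full on /var",
    "source_system": "nagios",
}
print(matches(alert, "/prod-.*-[0-9]+/"))  # True
print(matches(alert, "/Prod-.*-[0-9]+/"))  # False (regex is case sensitive)
print(matches(alert, '"disk full"'))        # True
```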

Filter by Assignee and Sort Incidents

The feed lists incidents that meet the current Environment, folder, and search criteria. By default, the incidents are listed in order by when they were last changed, with the most recently changed incident on top. You can filter by assignee or change the sort order of the incidents in your feed.

Filter

Use the Filter by Assignee option to filter the incident feed by incident assignee. Filter by your own name to get a clear picture of the incidents you are responsible for, or by a team member's name to see their workload.

To filter the feed:

  1. From the incident feed, click the Filter by Assignee icon to the right of the search field.
  2. Select an assignee from the list.
  3. To remove the filter, click the Filter by Assignee icon again and select Clear filter.

Sort

To change the sort order of the incident feed:

  1. From the incident feed, click the Sort icon, the second icon to the right of the search field.
  2. Select the desired sort order:
    • Last Changed - Time that the incident was last changed (most recently changed on top). A change includes status changes on related alerts and the addition of new alerts to the incident.
    • Priority - Assigned level of importance (most important on top). Incidents that do not have a priority assigned are listed at the bottom by Last Changed.
    • Status - Current status of the incident (most severe status on top, in the order: critical > warning > unknown > acknowledged > resolved). Secondary sorting is based on Last Changed.
    • Created - Time the first alert on the incident was received (newest on top). The order is preserved even if the status of an incident changes.
    • No. of Alerts - Number of active alerts (highest number on top). Secondary sorting is based on Last Changed. In the Resolved folder only, this is the total number of alerts, because no alerts are active on a resolved incident.
    • Duration - Amount of time that the incident has been open (longest on top). Secondary sorting is based on Last Changed.
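The sort orders, each falling back to Last Changed as a secondary key, can be sketched in Python by relying on the stability of `sorted`: the feed is first ordered by Last Changed, then re-sorted by the primary key. The `Incident` fields, the numeric priority scale (1 = most important), and the optional assignee filter are hypothetical; Duration is approximated by creation time, which holds for incidents that are still open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

STATUS_ORDER = ["critical", "warning", "unknown", "acknowledged", "resolved"]


@dataclass
class Incident:
    id: str
    status: str
    last_changed: datetime
    created: datetime
    active_alerts: int
    priority: Optional[int] = None  # hypothetical scale: 1 = most important
    assignee: Optional[str] = None


def sort_feed(incidents, order="Last Changed", assignee=None):
    """Return incidents in feed order; optionally filter by assignee first."""
    if assignee is not None:
        incidents = [i for i in incidents if i.assignee == assignee]
    # Secondary key first: sorted() is stable, so ties keep this order.
    by_recent = sorted(incidents, key=lambda i: i.last_changed, reverse=True)
    if order == "Last Changed":
        return by_recent
    if order == "Priority":  # unprioritized incidents go to the bottom
        return sorted(by_recent, key=lambda i: (i.priority is None, i.priority or 0))
    if order == "Status":
        return sorted(by_recent, key=lambda i: STATUS_ORDER.index(i.status))
    if order == "Created":  # newest first, no secondary key
        return sorted(incidents, key=lambda i: i.created, reverse=True)
    if order == "No. of Alerts":
        return sorted(by_recent, key=lambda i: i.active_alerts, reverse=True)
    if order == "Duration":  # longest open = earliest created (open incidents)
        return sorted(by_recent, key=lambda i: i.created)
    raise ValueError(f"unknown sort order: {order}")


t = datetime(2024, 1, 1)
feed = [
    Incident("a", "warning", t + timedelta(minutes=30), t, 2, priority=2, assignee="kim"),
    Incident("b", "critical", t + timedelta(minutes=10), t + timedelta(minutes=5), 5),
    Incident("c", "warning", t + timedelta(minutes=50), t + timedelta(minutes=1), 1, priority=1, assignee="kim"),
]
print([i.id for i in sort_feed(feed, "Status")])  # ['b', 'c', 'a']
```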

Respond to Incidents

You can respond to incidents within the Incident feed using the incident action icons.

[Image: Incident Actions]

Take action using the prioritize, assign, resolve, snooze, comment, or share icons on each incident, or use the selection boxes to take action on multiple incidents at once. Click any incident in the incident feed to open it in the Incident Details pane.

To learn more about taking action on incidents, see the Triage Incidents and Remediate Incidents documentation.

Incident Details

The Incident Details pane provides in-depth information about an incident. It contains tabs that let you view incident information, related alerts, and incident history, and take action on the incident.

To access the Incident Details pane, click an incident from within the Incident Feed. The Incident Details pane opens on the right side of the screen.

👍

Single Pane Incident View

In the top right of the incident details pane, click the expand icon to change to single pane view.

📘

Only the 1000 most recent incident activities appear in the Incident Details pane in the BigPanda UI. If an incident has more than 1000 activities, all of them can be retrieved using the Get Activities API.

For information about fields within the Incident Details pane, see the Incidents Tab documentation.

Incident Life Cycle Logic

The life cycle of an incident is defined by the life cycle of the alerts it contains. An incident remains active if at least one of the alerts is active, is automatically resolved when all the alerts are resolved, and is reopened when a resolved alert becomes active again.
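The life cycle rule can be expressed as a function of the related alerts' statuses. In this Python sketch, the state names and the assumption that any status other than `ok` (including acknowledged) keeps an incident active are illustrative; the reopen time limits described under Reopen Incidents are not modeled.

```python
from typing import Optional


def incident_state(alert_statuses: list[str]) -> str:
    """An incident is active while at least one related alert is not resolved."""
    # Assumption: any status other than "ok" (including acknowledged) counts as active.
    return "active" if any(s != "ok" for s in alert_statuses) else "resolved"


def apply_update(previous: str, alert_statuses: list[str]) -> tuple[str, Optional[str]]:
    """Return the new incident state and the lifecycle event, if any."""
    current = incident_state(alert_statuses)
    if previous == "active" and current == "resolved":
        return current, "auto-resolved"  # all alerts are resolved
    if previous == "resolved" and current == "active":
        return current, "reopened"  # a resolved alert became active again
    return current, None


print(apply_update("resolved", ["warning", "ok"]))  # ('active', 'reopened')
```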

Alert/Incident Statuses

[Image: Alert Statuses]

Possible status levels are critical, warning, unknown, ok, or acknowledged. BigPanda uses colors to indicate the current status in the incident feed, the timeline, and other UI elements.

  • critical (Red) - The monitoring system has detected a serious problem. For example, a service is unavailable or a maximum usage threshold has been exceeded. This is the most severe status in BigPanda.
  • warning (Orange) - The monitoring system has detected a potential problem. For example, the disk space is low.
  • ok (Green) - The alert is resolved and/or no problems are detected.
  • acknowledged (Grey) - The alert has been acknowledged in the source system or the monitored object is under scheduled maintenance. Some systems offer these options and can send this information to BigPanda.
  • unknown (Yellow) - Something is broken with the monitoring system itself (instead of the monitored object or metric). Some systems, such as Nagios, can generate alerts with this status.

Alert Resolution and Closing Statuses

An incident remains open as long as at least one of the alerts associated with it is open. When BigPanda receives an event with a status of ok, the related alert is automatically resolved.

Alerts that have not been resolved remain open in BigPanda. The corresponding incident also remains open and continues to appear in the Incident Feed.

📘

Resolving Alerts with the Alerts REST API

To maintain only the most relevant information in the incident feed, send a resolving event to BigPanda using the Alerts REST API when an alert is no longer active.
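As a sketch of what a resolving event might look like, the following Python code builds (but does not send) a request with a status of `ok`, using only the standard library. The endpoint URL, the `app_key` field, the bearer-token header, and the `host`/`check` tags are assumptions based on BigPanda's v2 Alerts API; confirm them against the Alerts REST API documentation, and make sure the identifying tags match those of the alert you want to resolve. The `build_resolve_request` function is hypothetical.

```python
import json
import urllib.request

# Assumption: v2 Alerts API endpoint; verify against the Alerts REST API docs.
ALERTS_URL = "https://api.bigpanda.io/data/v2/alerts"


def build_resolve_request(token: str, app_key: str, host: str, check: str) -> urllib.request.Request:
    """Build a POST request whose event resolves the matching alert."""
    payload = {
        "app_key": app_key,  # assumed integration identifier field
        "status": "ok",      # an "ok" event resolves the related alert
        "host": host,        # must match the tags of the original alert
        "check": check,
    }
    return urllib.request.Request(
        ALERTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


request = build_resolve_request("YOUR_TOKEN", "YOUR_APP_KEY", "prod-web-12", "disk usage")
```

To send the event, pass the request to `urllib.request.urlopen(request)`.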

Reopen Incidents

Resolved incidents are reopened when any of their associated alerts reopen. This rule applies regardless of how the incident was resolved: manually, due to inactivity, or automatically when all associated alerts were resolved. Alerts are reopened if they reoccur within 60 minutes of when they were resolved. If an alert reoccurs more than 60 minutes later, it is handled as a new alert.

The incident will also reopen if a new event that matches one of the correlation patterns of the incident comes in. Incidents that are more than 30 days old are never reopened. If the associated alerts reoccur, a new incident is created.
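The reopen rules can be sketched as a decision function. The function name, the return labels, and the assumption that incident age is measured from creation are hypothetical; the 60-minute and 30-day defaults come from the text above, and the reopen window may be customized for your organization.

```python
from datetime import datetime, timedelta

# Defaults described in this article; the reopen window can be changed by support.
REOPEN_WINDOW = timedelta(minutes=60)
MAX_INCIDENT_AGE = timedelta(days=30)


def on_alert_reoccurrence(
    alert_resolved_at: datetime,
    incident_created_at: datetime,
    now: datetime,
    matches_correlation_pattern: bool = False,
) -> str:
    """Decide how a reoccurring alert is handled (hypothetical labels)."""
    # Incidents older than 30 days are never reopened.
    # Assumption: age is measured from incident creation.
    if now - incident_created_at > MAX_INCIDENT_AGE:
        return "new incident"
    # Within the reopen window, the alert (and therefore its incident) reopens.
    if now - alert_resolved_at <= REOPEN_WINDOW:
        return "reopen alert and incident"
    # Later reoccurrences are new alerts; the incident still reopens if the
    # new event matches one of its correlation patterns.
    if matches_correlation_pattern:
        return "new alert, incident reopens"
    return "new alert"
```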

👍

The time frame of the reopen window can be customized to fit your monitoring needs if necessary. Keep in mind that this is a global setting that impacts all incidents. Please contact BigPanda support if you'd like to change the time frame for incident reopening.

Flapping Incidents

Flapping occurs when a monitored object (for example, a service or host) changes state too frequently, making the cause and severity of the incident unclear. Flapping can indicate configuration problems (for example, thresholds set too low), troublesome services, or real network problems.

When an alert changes state frequently, it can generate numerous events that are not immediately actionable. BigPanda groups these events so that hundreds of potential notifications for a flapping application are consolidated into a single incident.

In BigPanda, an incident enters the flapping state when one or more of the related alerts are flapping. By default, an alert is considered to be flapping when it has changed states more than 4 times in one hour. Contact BigPanda support if you need to configure custom logic (number of state changes within a period of time) for your organization or for a specific integration.

When an incident enters the flapping state, all subscribed integrations are notified and no additional state change notifications are sent. Email integrations will send a daily email reminding users about the incident. An incident exits the flapping state when all related alerts stop flapping (no longer meet the criteria for number of state changes in a period of time). BigPanda checks the flapping criteria every 15 minutes.
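A sliding-window check captures the default flapping rule (more than 4 state changes in one hour). This Python sketch is illustrative only: `FlapDetector` is a hypothetical class, and it evaluates on demand rather than on BigPanda's 15-minute schedule. An incident is treated as flapping when any of its alerts is flapping.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Optional


class FlapDetector:
    """Tracks one alert's state changes within a sliding time window."""

    def __init__(self, max_changes: int = 4, window: timedelta = timedelta(hours=1)):
        self.max_changes = max_changes
        self.window = window
        self.changes: deque = deque()  # timestamps of state changes, oldest first
        self.last_status: Optional[str] = None

    def record(self, status: str, at: datetime) -> None:
        """Record an incoming status; only transitions count as state changes."""
        if self.last_status is not None and status != self.last_status:
            self.changes.append(at)
        self.last_status = status

    def is_flapping(self, now: datetime) -> bool:
        """True if more than max_changes state changes occurred within the window."""
        while self.changes and now - self.changes[0] > self.window:
            self.changes.popleft()  # drop changes older than the window
        return len(self.changes) > self.max_changes


def incident_is_flapping(detectors: list[FlapDetector], now: datetime) -> bool:
    """An incident is flapping when one or more of its alerts is flapping."""
    return any(d.is_flapping(now) for d in detectors)
```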

Mobile Support

In the lightning-fast world of ITOps, it’s vital to be able to respond to outages no matter where you are. The BigPanda Incident Feed is mobile-compatible, allowing you to find and view incidents, dig into their details, and take action even on the go.

[Image: Mobile Incident Feed]

BigPanda Mobile works on any device capable of running a supported browser.

Use BigPanda Mobile

To optimize the interface for mobile screens, the BigPanda Mobile Incident Feed is streamlined and simplified.

By default, BigPanda Mobile opens to the Active folder of the All Incidents environment. To change the environment or folder, select the menu (three lines) icon in the top left of the page, and select the environment or folder from the flyout list. To filter the environment list, enter a term or an exact phrase in quotes at the top of the flyout.

To maximize performance, you can toggle the feed between Live Updates and Manual Updates. Live Updates automatically refresh the incident feed with new incidents, comments, and incident status changes. Manual Updates refresh the feed only when you reload the browser page or reopen the page after closing it. To switch modes, select the Settings (gear) icon and click the desired option.

Select an incident to open the Incident Details page. From the Incident Details page, you can take action on the incident, or delve into alert details, the timeline, and potentially related changes. To learn more about the incident details pane, see the Incidents Tab documentation.

To return to the incident feed, click the back button on your mobile browser.

[Image: Mobile Incident Details]

📘

Incident Preview

When you open an incident preview on a mobile device, it opens automatically in the BigPanda mobile incident details view.

See Incident Previews for more information.

Incident search is available in the Mobile Incident Feed using both keyword and formula queries. To search the incident feed, click the magnifying glass icon at the top right, and enter your query in the field that appears.

Incident information is condensed within the mobile view to maximize the visibility of key information such as priority, assignment, severity, and action status. To view a full incident title, description, or tag, click the shortened text and a tooltip will appear with the full text.

📘

Incident Actions

You can take action on incidents using the Mobile Incident Feed. Click the icon for the action you would like to take and the action dialog box will open.

To learn more about incident actions, see the Triage Incidents and Remediate Incidents documentation.

Incident Feed Wallboard

The Incident Feed Wallboard opens the active Incident Feed in a full-screen, higher-level view. Incidents are displayed in a widened single-row format, so more incidents fit on one screen.

The Incident Feed Wallboard is commonly displayed on a dedicated monitor in the NOC. Showing the Incident Feed on a shared screen in a denser format supports better collaboration among team members. This is especially helpful for organizations with high activity levels, where it makes it easier to keep up with new incidents as they stream in.

All details and standard workflow actions available in the conventional Incident Feed, such as Comment, Snooze, Assign, and Share, are retained in the Wallboard. Additionally, a status tag appears at the end of each record, making each incident's status easier to assess at a glance.

Every Incident Feed in the console allows for the expansion into the full-screen Wallboard view using the Open Wallboard button.

[Image: Toggling the full-screen Wallboard]

📘

Session management is disabled while in the Wallboard view and the console will not automatically log out.

Next Steps

Start Triaging Incidents in BigPanda

Learn more about Navigating the Incidents Tab

Dig into the Incident Intelligence Enrichment Process