Incidents in BigPanda
Use the Incidents tab to manage active incidents from a centralized place.
As raw data is ingested into BigPanda from integrated tools, the system correlates related events into high-level incidents. Incidents in BigPanda provide context to issues, and allow you to quickly identify, triage, and remediate problems before they become severe.
The Incidents Tab in BigPanda allows you to see all active incidents within your system, organized by statuses and environments. You can use the Incidents Feed to view, filter, and take action on incidents. Further details about specific incidents can be found within the Incident Details pane.
For a high-level look at navigation in the Incidents Tab, see the Incidents Tab navigation documentation.
- Triage Incidents - During triage, you can rapidly prioritize, assign, and share incidents with other users across platforms. You can also determine whether an incident should be merged or split based on the criteria of the alerts it contains.
- Remediate Incidents - Use comments to communicate with other team members, add tags to supplement the incident with additional information, and resolve the incident once the issue has been solved.
The incident feed provides a consolidated view of all active incidents from your integrated monitoring systems. You can use the incident feed to manage your incidents.
View the Incident Feed
BigPanda digests all the events from your integrated monitoring systems and intelligently correlates related events into incidents. Events are correlated and updated in real-time, so your incident feed is always up to date with the latest system and application statuses.
- At the top of the screen, click the Incidents tab.
By default, the incident feed displays all active incidents.
- (Optional) In the left pane, select an Environment.
- (Optional) In the left pane, select a folder.
- Review basic information about each incident.
|Status Indicator||Displays a colored ribbon on the left to indicate the incident status, which is determined by the most severe status of the related alerts.|
|Number of Active Alerts||Counts the number of related alerts that are in the Critical or Warning state.|
|Priority||Assigned level of importance (most important on top). Incidents that do not have a priority assigned will be listed at the bottom by Last Changed.|
|Primary property||Shows why the events are correlated into an incident. By default, the primary property is defined as one of the following: host, service, application, or device.|
|Secondary property||Summarizes the subjects (such as hosts or applications) that are part of the incident. By default, the secondary property is defined as one of the following: check or sensor.|
|System||Displays the type of monitoring tool (such as Nagios or Zabbix) and the integration name (such as Production) that the events came from.|
|Last change, Created, or Duration||Shows information relevant to the current sort order. You can point to it to see more specific information. See Sorting Incidents.|
- (Optional) Hover over a row to perform any of the available actions, such as Resolve, Snooze, Share, or Comment. See Triage Incidents and Remediate Incidents for more information.
The numbers of existing shares and comments are displayed for each incident. You can click either number to view relevant details.
Viewing Incident Details
To view more information, click the incident in the feed. The incident details appear in the right pane. You can view related alert details, a timeline of the incident life cycle, sharing history, comments, and more. For more information, see Incident Details.
Incident tags will appear on both the Incident Feed and in the Overview tab of the Incident Details pane.
Incident tags add key enrichment to your BigPanda incidents, helping you see key information about the event.
Information about the user who edited the tag, and the time and date of the change can be accessed by hovering over the name of the incident tag. Tags that have not been manually edited will show the last date and time that automatic incident enrichment occurred.
Users are able to manually assign or remove incident tags. To learn more about using incident tags with incidents, please see the Adding Incident Tags and Prioritizing Incidents documentation.
You are able to create, edit, or inactivate incident tags to fit the needs of your organization. To learn more about configuring incident tags, please see the Manage Incident Enrichment documentation.
Incident tags may also be configured to automatically add to specific incidents based on incident or alert criteria. To learn more about configuring automatic tags, please see the Automatic Incident Tags documentation.
If an incident has been manually split, the new incident will be created without any incident tag values. If incidents are manually merged, only the incident tags from the destination incident will appear on the merged incident. Source incident tags will not be added to the destination incident.
When an incident is resolved, the incident tags will remain tied to the incident for 18 months. If the incident is reopened, it will have all of the existing incident tags, with new ones added as the reopened incident develops.
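The split and merge rules for incident tags can be pictured with a small model. The following Python sketch is illustrative only and is not BigPanda code: the `Incident` class, its fields, and the `merge` and `split` helpers are hypothetical names, and it assumes that a merge combines the alerts of both incidents and that the original incident keeps its own tags after a split.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Incident:
    """Hypothetical, simplified incident: an id, alert ids, and incident tags."""
    id: str
    alerts: List[str]
    tags: Dict[str, str] = field(default_factory=dict)


def merge(source: Incident, destination: Incident) -> Incident:
    # Only the destination incident's tags appear on the merged incident;
    # the source incident's tags are not carried over.
    return Incident(
        id=destination.id,
        alerts=destination.alerts + source.alerts,  # assumption: alerts are combined
        tags=dict(destination.tags),
    )


def split(incident: Incident, alerts_to_move: List[str], new_id: str) -> Tuple[Incident, Incident]:
    # The new incident is created without any incident tag values.
    remaining = [a for a in incident.alerts if a not in alerts_to_move]
    original = Incident(incident.id, remaining, dict(incident.tags))  # assumption: tags stay
    new_incident = Incident(new_id, list(alerts_to_move))
    return original, new_incident
```

For example, merging a source incident tagged `team: db` into a destination tagged `region: us` yields an incident that carries only `region: us`.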
Incident Tag Types
Incident tags may take the form of Priority, Text, or Multi-value tags.
Priority tags create a sortable hierarchy to mark in which order incidents should be addressed. Priority tags make it easier to view the importance and urgency of your incidents at a glance.
By default, your environment will have Priority tags enabled, with pre-configured settings. These settings can be customized to better fit the needs of your organization. To learn more about customizing tags, please see the Manage Incident Enrichment documentation.
Priority tags are visible at the top left of incidents in both the feed and the details Overview tab, next to the incident severity. Incidents that have not been prioritized will not show the priority icon.
Priority can be assigned from the incident feed or from the Overview tab of the incident details pane. To learn more about using priority tags, please see the Prioritizing Incidents documentation.
Text and Multi-value Tags
Text and Multi-value tags add data sets with additional information, details, or other enrichment to your incidents. Each tag is made up of a customized value pair similar to BigPanda alert tags.
Text and Multi-value tags appear at the top of the Overview tab of the incident details pane.
Each tag is made up of the name of the tag, and the tag value (e.g. Source_system: Nagios). For text tag types, the value is a single text string that appears in an editable text box. For multi-value tag types, the value is one or more individual text tags. These appear as individual items in the editable value field.
Configure text and multi-value tags such as affected environment or region to add context and enable better collaboration between your organization's teams.
Once configured, text and multi-value tags can be assigned to incidents from the Overview tab of the incident details pane. To learn more about using text and multi-value tags, please see the Add Incident Tags documentation.
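The difference between the two tag types can be sketched as a pair of small data structures. This is an illustration only, not BigPanda's data model: the class names and the `render` helper are hypothetical, and the `Region` tag values are made-up sample data.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class TextTag:
    """A tag name paired with a single text string."""
    name: str
    value: str


@dataclass(frozen=True)
class MultiValueTag:
    """A tag name paired with one or more individual text values."""
    name: str
    values: Tuple[str, ...]

    def __post_init__(self) -> None:
        if not self.values:
            raise ValueError("a multi-value tag needs at least one value")


def render(tag: Union[TextTag, MultiValueTag]) -> str:
    # Format a tag as "name: value", joining multiple values with commas.
    if isinstance(tag, TextTag):
        return f"{tag.name}: {tag.value}"
    return f"{tag.name}: {', '.join(tag.values)}"


print(render(TextTag("Source_system", "Nagios")))
print(render(MultiValueTag("Region", ("us-east", "eu-west"))))
```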
Select a Folder
A folder filters the incident feed by predefined criteria. You can select a folder to see all the incidents within an Environment that meet the folder criteria.
- From the Incident tab, select an Environment in the left pane.
The incident feed shows the active incidents in the Environment, and the list of available folders expands.
- In the left pane, select the folder with the desired criteria.
|Active||Incident has active alerts and is not snoozed.|
|Unhandled||Incident has active alerts and has not been shared or snoozed.|
|Shared||Incident is active, and has been shared with users manually or by AutoShare.|
|Snoozed||Incident is active, was snoozed, and is within the snooze period. When the snooze period elapses, the incident again appears in the Active folder and no longer appears in the Snoozed folder.|
|Maintenance||Incident includes one or more alerts that have been muted because of a Maintenance plan.|
Maintenance plans will only appear in your UI if they have been configured by an admin. See the Maintenance plans documentation for more information.
|Resolved (24h)||Incident was marked as resolved within the past 24 hours. When an incident is reopened, it again appears in the Active folder and no longer appears in the Resolved folder.|
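The folder criteria in the table above can be read as a set of independent checks, which is why one incident may appear in more than one folder. The following Python sketch is a rough model under stated assumptions, not BigPanda's implementation: the `IncidentState` fields and the `folders_for` function are hypothetical, and it assumes a resolved incident appears only in the Resolved folder.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class IncidentState:
    """Hypothetical snapshot of the facts the folder criteria depend on."""
    has_active_alerts: bool
    shared: bool = False
    snoozed_until: Optional[datetime] = None
    has_maintenance_alert: bool = False
    resolved_at: Optional[datetime] = None


def folders_for(incident: IncidentState, now: datetime) -> List[str]:
    folders: List[str] = []
    if incident.resolved_at is not None:
        # Resolved incidents are listed for 24 hours after resolution.
        if now - incident.resolved_at <= timedelta(hours=24):
            folders.append("Resolved (24h)")
        return folders  # assumption: resolved incidents appear nowhere else

    snoozed = incident.snoozed_until is not None and now < incident.snoozed_until
    if incident.has_active_alerts:
        if not snoozed:
            folders.append("Active")
        if not snoozed and not incident.shared:
            folders.append("Unhandled")
        if incident.shared:
            folders.append("Shared")
        if snoozed:
            folders.append("Snoozed")
    if incident.has_maintenance_alert:
        folders.append("Maintenance")
    return folders
```

Under this model, an incident that is both shared and still within its snooze period is listed under Shared and Snoozed but not under Active.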
Search for Incidents
You can search for incidents that meet specific criteria within the selected Environment and folder.
- (Optional) Select an Environment and a folder.
- At the top of the incident feed, enter a keyword search term, an exact phrase in quotes, or a query in BigPanda Query Language (BPQL).
Regular Expression Support
Both keyword search and BPQL support regular expressions. Use a regular expression by entering a slash (/) as the first and last character of your search term. For example, /prod-.*-[0-9]+/. Regex queries are limited to 32,000 characters and are case sensitive. See Elasticsearch Regular Expression Syntax and BPQL for more regex support.
- Click the search icon or press Enter.
Search Logic and Results
Enter a term or an exact phrase in quotes to perform a keyword search of the incidents in the selected Environment and folder. The search finds alerts with matching values in descriptions, source systems, and in any standard or custom tag (such as host, check, or status).
Use BPQL to search for values in a specific alert tag or to create an advanced query. You can search any standard or custom tags, define precise conditions with operators, and include multiple conditions.
- (Optional) Scroll down to view more results.
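Before running a regular expression search, it can help to try the pattern on a few sample values. The sketch below uses Python's `re` module as a stand-in and is only an approximation: BigPanda follows Elasticsearch regular expression syntax, which differs from Python's in places. It assumes the pattern must match the whole value (Elasticsearch patterns are anchored), which `fullmatch` imitates. The host names are made-up sample data and `strip_slashes` is a hypothetical helper.

```python
import re


def strip_slashes(term: str) -> str:
    """Return the pattern inside a /.../ search term."""
    if len(term) >= 2 and term.startswith("/") and term.endswith("/"):
        return term[1:-1]
    raise ValueError("a regex search term must start and end with a slash")


pattern = re.compile(strip_slashes("/prod-.*-[0-9]+/"))

# Made-up sample values; matching is case sensitive, as in BigPanda regex search.
hosts = ["prod-web-01", "prod-db-7", "PROD-web-01", "staging-web-01", "prod-web"]
matches = [h for h in hosts if pattern.fullmatch(h)]
print(matches)  # ['prod-web-01', 'prod-db-7']
```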
Filter by Assignee and Sort Incidents
The feed lists incidents that meet the current environment, folder, and search criteria. By default, the incidents are listed in order by when they were last changed, with the most recently changed incident on top. You can filter by assignee or change the sort order of the incidents in your feed.
The filter by Assignee option can be used to filter the incident feed by the incident assignee. Filter by your own name to get a clear picture of incidents you are responsible for, or by another team member's name to see their workload.
To filter the feed:
- From the incident feed, click the Filter by Assignee icon to the right of the search field.
- Select an assignee from the list.
- To remove the filter, click the Filter by Assignee icon again and select Clear filter.
To change the sort order of the incident feed:
- From the incident feed, click the Sort icon, second to the right of the search field.
- Select the desired sort order.
|Last Changed||Time that the incident was last changed (most recently changed on top). A change includes status changes on related alerts and the addition of new alerts to the incident.|
|Priority||Assigned level of importance (most important on top). Incidents that do not have a priority assigned will be listed at the bottom by Last Changed.|
|Status||Current status of the incident (most severe status on top, in the order: |
|Created||Time the first alert on the incident was received (newest on top). The order is preserved even if the status of an incident changes.|
|No. of Alerts||Number of active alerts (highest number on top). Secondary sorting is based on Last Changed. In the Resolved folder only, the number of alerts is the total number of alerts, as no alerts are active on a resolved incident.|
|Duration||Amount of time that the incident has been open (longest on top). Secondary sorting is based on Last Changed.|
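The sort orders in the table above combine a primary key with Last Changed as the tie-breaker. The following Python sketch illustrates two of them and is not BigPanda code: `FeedRow` and its fields are hypothetical, priority is assumed to be stored as a rank where 1 is the most important, and ties within the same priority are assumed to fall back to Last Changed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class FeedRow:
    """Hypothetical, simplified row of the incident feed."""
    title: str
    last_changed: datetime
    active_alerts: int
    priority_rank: Optional[int] = None  # assumption: 1 = most important, None = unprioritized


def _most_recent_first(rows: List[FeedRow]) -> List[FeedRow]:
    return sorted(rows, key=lambda r: r.last_changed, reverse=True)


def sort_by_priority(rows: List[FeedRow]) -> List[FeedRow]:
    # Python's sort is stable, so rows with equal keys keep the Last Changed order.
    # Unprioritized incidents sort to the bottom.
    return sorted(_most_recent_first(rows),
                  key=lambda r: (r.priority_rank is None, r.priority_rank or 0))


def sort_by_alert_count(rows: List[FeedRow]) -> List[FeedRow]:
    # Highest number of active alerts on top; ties fall back to Last Changed.
    return sorted(_most_recent_first(rows), key=lambda r: r.active_alerts, reverse=True)
```

Sorting twice, first by the tie-breaker and then by the primary key, relies on the stability of `sorted` and keeps each key function simple.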
Filter the Incident Feed Using Environments
Environments function as global filters for the incidents in the Incidents tab. By default, when navigating to the Incidents tab, the All Incidents environment will be selected.
On the left of the Incident tab is the Environments pane. All available environments are listed, with the current Environment highlighted and expanded.
To change which environment you are viewing, select the desired environment’s name from the list.
The Incident Feed will update to show only incidents that are grouped into that environment.
Each environment is pre-sorted into status folders: Active, Unhandled, Shared, Snoozed, and Resolved. Incidents that fit the environment rules will be automatically placed in their respective status folder(s). When selecting an environment, the Active folder will open first. To move to a different folder, select the folder name from the Environments pane.
Incidents will appear in all relevant folders. An incident that has been shared and snoozed will appear in both folders. Resolving an incident will move it to the Resolved folder and remove it from other folders.
At the top of the Environments pane, a search bar allows you to filter environments. For organizations that have numerous environments, use the filter feature to quickly isolate a particular environment.
To filter the Environments pane, begin typing the environment name into the Filter search bar. Matching results will appear in real-time.
Clearing the filter search bar will revert the list and show all environments.
Each user is able to Star environments, saving them to a Starred group at the top of the Environments pane. Click the Star beside an environment name, or the Star option in the three dots dropdown to save it for easy access.
Respond to Incidents
You can respond to incidents within the Incident feed using the incident action icons.
Take action using the prioritize, assign, resolve, snooze, comment, or share icons on each incident, or use the selection boxes to take action on multiple incidents at once. Click any incident in the incident feed to open the incident details in the incident pane.
To learn more about taking action on incidents, see the Triage Incidents and Remediate Incidents documentation.
The Incident Details pane provides in-depth information about an incident. The Incident Details pane contains tabs that allow you to view incident information, related alerts, incident history, and take action on incidents.
To access the Incident Details pane, click an incident from within the Incident Feed. The Incident Details pane opens on the right side of the screen.
Single Pane Incident View
In the top right of the incident details pane, click the expand icon to change to single pane view.
Incident Details Tabs
The Incident Details tabs help you drill down into the different elements of a selected incident.
Incident details are broken out by type of incident data:
The information on each tab depends on the data being sent to BigPanda. Not all incidents will have information on each tab.
For information about fields within the Incident Details screen, see The Incidents Tab documentation.
The Overview Tab provides a consolidated view of the contents of the other tabs within the incident details pane. Click a link within a section of the Overview Tab to be directed to the corresponding tab in the incident details.
See The Incidents Tab documentation for more information.
The Alerts tab displays information about alerts associated with an Incident. Within this tab, you can view changes and alert links, and split incidents.
Click any alert within the Alerts tab table to view additional details. Which details appear in the UI can be configured by your BigPanda administrator. See Manage Alert Views for more information.
See The Alerts Tab documentation for more details.
The Topology tab within the Incident Details pane provides access to the Topology graph for the incident. The Topology graph is a customizable visual display of the links between the incident's alert tags, or Nodes.
See the Topology Tab documentation for more information.
Change data related to an incident is displayed in the Changes tab. BigPanda uses algorithms to correlate and suggest changes that may have caused an incident. If BigPanda has found a change to be highly correlated with an incident, it will appear at the top of the change table and in the Overview tab as a Potential Root Cause Change.
See the Changes Tab documentation for more information.
The Activity tab provides information about activities that occurred within an incident. Within this tab, you can view and add comments, see previous incident actions, and view status changes.
Only the 1000 most recent incident activities appear in the Incident Details pane in the BigPanda UI. If an incident has more than 1000 activities, all of them can be retrieved using the Get Activities API.
See the Activity Tab documentation for more information.
Incident Life Cycle Logic
The life cycle of an incident is defined by the life cycle of the alerts it contains. An incident remains active if at least one of the alerts is active, is automatically resolved when all the alerts are resolved, and is reopened when a resolved alert becomes active again.
Possible status levels are critical, warning, unknown, ok, and acknowledged. BigPanda uses colors to indicate the current status in the incident feed, the timeline, and other UI elements.
|critical||Red||The monitoring system has detected a serious problem. For example, a service is unavailable or a maximum usage threshold has been exceeded. This is the most severe status in BigPanda.|
|warning||Orange||The monitoring system has detected a potential problem. For example, the disk space is low.|
|ok||Green||The alert is resolved and/or no problems are detected.|
|acknowledged||Grey||The alert has been acknowledged in the source system or the monitored object is under scheduled maintenance. Some systems offer these options and can send this information to BigPanda.|
|unknown||Yellow||Something is broken with the monitoring system itself (instead of the monitored object or metric). Some systems, such as Nagios, can generate alerts with this status.|
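The status table above can be mirrored in a small lookup. The following Python sketch is illustrative and not BigPanda code. The colors come from the table, and the rule that an incident takes the most severe status of its alerts comes from the feed description. The relative ranking of the statuses below critical is an assumption, since this page only states that critical is the most severe. The names `STATUS_COLOR`, `SEVERITY_RANK`, `incident_status`, and `active_alert_count` are hypothetical.

```python
from typing import List

# Colors as listed in the status table.
STATUS_COLOR = {
    "critical": "Red",
    "warning": "Orange",
    "ok": "Green",
    "acknowledged": "Grey",
    "unknown": "Yellow",
}

# Assumption: only "critical is the most severe" is documented; the rest of
# this ranking is a guess made for the sake of the example.
SEVERITY_RANK = {"critical": 4, "warning": 3, "unknown": 2, "acknowledged": 1, "ok": 0}


def incident_status(alert_statuses: List[str]) -> str:
    """Return the most severe status among the related alerts."""
    if not alert_statuses:
        raise ValueError("an incident needs at least one alert")
    return max(alert_statuses, key=lambda s: SEVERITY_RANK[s])


def active_alert_count(alert_statuses: List[str]) -> int:
    """Count related alerts that are in the critical or warning state."""
    return sum(1 for s in alert_statuses if s in ("critical", "warning"))


statuses = ["ok", "warning", "critical"]
print(incident_status(statuses), STATUS_COLOR[incident_status(statuses)])
print(active_alert_count(statuses))
```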
Alert Resolution and Closing Statuses
An incident remains open as long as at least one of the alerts associated with it is open. When BigPanda receives an event with a status of ok, the related alert is automatically resolved.
Alerts that have not been resolved remain open in BigPanda. The corresponding incident also remains open and continues to appear in the Incident Feed.
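The relationship between alert status and incident status can be modeled in a few lines. This is a simplified sketch, not BigPanda's implementation: alerts are represented as a plain dictionary keyed by a hypothetical alert identifier, and any status other than ok is assumed to keep the alert open.

```python
from typing import Dict


def apply_event(alerts: Dict[str, str], alert_key: str, status: str) -> Dict[str, str]:
    """Return a copy of the alert map with the event's status applied."""
    updated = dict(alerts)
    updated[alert_key] = status  # an event with status "ok" resolves the alert
    return updated


def incident_is_open(alerts: Dict[str, str]) -> bool:
    # The incident stays open as long as at least one alert is not resolved.
    return any(status != "ok" for status in alerts.values())


alerts = {"db-1/cpu": "critical", "db-1/disk": "warning"}
alerts = apply_event(alerts, "db-1/cpu", "ok")
print(incident_is_open(alerts))  # True: db-1/disk is still open
alerts = apply_event(alerts, "db-1/disk", "ok")
print(incident_is_open(alerts))  # False: all alerts are resolved
```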
Resolving Alerts with the Alerts REST API
To maintain only the most relevant information in the incident feed, send a resolving event to BigPanda using the Alerts REST API when an alert is no longer active.
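A resolving event is an ordinary alert payload whose status is ok. The sketch below builds such a request with Python's standard library without sending it. Treat the details as assumptions to verify against the Alerts REST API documentation: the endpoint URL, the bearer-token header, and the `app_key`, `host`, and `check` field names are recalled from the public API reference rather than taken from this page, and the token, app key, and host values are placeholders.

```python
import json
import urllib.request

# Assumed endpoint; confirm it in the Alerts REST API documentation.
ALERTS_URL = "https://api.bigpanda.io/data/v2/alerts"


def build_resolve_request(token: str, app_key: str, host: str, check: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that reports an alert as resolved."""
    payload = {
        "app_key": app_key,   # assumed field: identifies the integration
        "status": "ok",       # a status of ok resolves the related alert
        "host": host,         # must match the values of the original alert
        "check": check,
    }
    return urllib.request.Request(
        ALERTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_resolve_request("YOUR_TOKEN", "YOUR_APP_KEY", "production-database-1", "CPU overloaded")
print(request.get_method(), request.full_url)
```

To actually send the event, pass the request to `urllib.request.urlopen(request)` with a valid token and app key.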
Resolved incidents are reopened when any of the alerts associated with them reopen. This rule applies regardless of how the incident was resolved: manually, due to inactivity, or automatically when all associated alerts were resolved. Alerts are reopened if they reoccur within 60 minutes of when they were resolved. If an alert reoccurs more than 60 minutes later, it is handled as a new alert.
The incident will also reopen if a new event that matches one of the correlation patterns of the incident comes in. Incidents that are more than 30 days old are never reopened. If the associated alerts reoccur, a new incident is created.
The time frame of the reopen window can be customized to fit your monitoring needs if necessary. Keep in mind this is a global setting that impacts all incidents. Please contact your BigPanda Account Manager if you'd like to change the time frame for incident reopening.
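The reopening rules above reduce to two time comparisons. The following Python sketch uses the default values from the text (60 minutes and 30 days) and is not BigPanda's implementation: the function names are hypothetical, the age of an incident is assumed to be measured from its creation time, and both thresholds are exposed as parameters only because this page does not say which one the customizable reopen window controls.

```python
from datetime import datetime, timedelta

ALERT_REOPEN_WINDOW = timedelta(minutes=60)
INCIDENT_MAX_AGE = timedelta(days=30)


def alert_action(resolved_at: datetime, reoccurred_at: datetime,
                 window: timedelta = ALERT_REOPEN_WINDOW) -> str:
    """Reopen the alert if it reoccurs within the window, else treat it as new."""
    if reoccurred_at - resolved_at <= window:
        return "reopen_alert"
    return "new_alert"


def incident_action(created_at: datetime, event_at: datetime,
                    max_age: timedelta = INCIDENT_MAX_AGE) -> str:
    """Incidents older than the maximum age are never reopened."""
    if event_at - created_at > max_age:  # assumption: age counts from creation
        return "new_incident"
    return "reopen_incident"


t0 = datetime(2024, 1, 1, 12, 0)
print(alert_action(t0, t0 + timedelta(minutes=30)))  # reopen_alert
print(alert_action(t0, t0 + timedelta(minutes=61)))  # new_alert
print(incident_action(t0, t0 + timedelta(days=31)))  # new_incident
```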
Flapping occurs when a monitored object (e.g., a service or host) changes state too frequently, making the cause and severity of the incident unclear. Flapping can indicate configuration problems (e.g., thresholds set too low), troublesome services, or real network problems.
When an alert changes states frequently, it may generate numerous events that are not immediately actionable. BigPanda groups these events so that hundreds of potential notifications for a flapping application appear as a single incident.
In BigPanda, an incident enters the flapping state when one or more of the related alerts are flapping. By default, an alert is considered to be flapping when it has changed states more than 4 times in one hour. Contact your BigPanda Account Manager if you need to configure custom logic (number of state changes within a period of time) for your organization or for a specific integration.
When an incident enters the flapping state, all subscribed integrations are notified and no additional state change notifications are sent. Email integrations will send a daily email reminding users about the incident. An incident exits the flapping state when all related alerts stop flapping (no longer meet the criteria for number of state changes in a period of time). BigPanda checks the flapping criteria every 15 minutes.
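The default flapping rule can be expressed as a count over a time window. The sketch below is a minimal illustration, not BigPanda's detection logic: it assumes a trailing one-hour window ending at the time of the check, and the function names and the list-of-timestamps representation are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def is_flapping(state_changes: List[datetime], now: datetime,
                max_changes: int = 4,
                window: timedelta = timedelta(hours=1)) -> bool:
    """True when the alert changed state more than max_changes times in the window."""
    recent = [t for t in state_changes if now - window < t <= now]
    return len(recent) > max_changes


def incident_is_flapping(alerts: List[List[datetime]], now: datetime) -> bool:
    # An incident is flapping when one or more of its related alerts are flapping.
    return any(is_flapping(changes, now) for changes in alerts)


now = datetime(2024, 1, 1, 12, 0)
noisy = [now - timedelta(minutes=m) for m in (50, 40, 30, 20, 10)]  # 5 changes
quiet = [now - timedelta(minutes=m) for m in (50, 40, 30, 20)]      # 4 changes
print(is_flapping(noisy, now), is_flapping(quiet, now))  # True False
```

Because the criteria are checked periodically (every 15 minutes according to the text above), a function like `incident_is_flapping` would be evaluated on a schedule rather than on every event.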
In the lightning-fast world of ITOps, it’s vital to be able to respond to outages no matter where you are. The BigPanda Incident Feed is mobile-compatible, allowing you to find and view incidents, dig into their details, and take action even on the go.
BigPanda mobile works on any device capable of running a Supported browser.
Use BigPanda Mobile
To optimize the interface for mobile screens, the BigPanda Mobile Incident Feed is streamlined and simplified.
By default, the BigPanda mobile screen will open your view on the All Incidents/Active environment folder. To change the environment or folder, select the three lines icon in the top left of the page, and select the environment or folder from the flyout list. Filter environments at the top of the flyout by entering a term or an exact phrase in quotes.
To maximize performance, you are able to toggle the feed between Live and Manual Updates. Live Updates update the incident feed with new incidents, comments, and changed incident statuses automatically. Manual Updates will only update the incident feed when you refresh your browser page, or when reopening the page after closing. To change to a different feed setting, select the Settings wheel and click the desired frequency.
Select an incident to open the Incident Details page. From the Incident Details page, you are able to take action on the incident, or delve into alert details, timeline, and potentially related changes. To learn more about the incident details pane, see the Incidents Tab documentation.
To return to the incident feed, click the back button on your mobile browser.
When opening an incident preview on a mobile device, it will open automatically in the BigPanda mobile incident details view.
See Incident Previews for more information.
Incident search is available in the Mobile Incident Feed using both keyword and formula queries. To search the incident feed, click the magnifying glass icon at the top right, and enter your query in the field that appears.
Incident information is condensed within the mobile view to maximize the visibility of key information such as priority, assignment, severity, and action status. To view a full incident title, description, or tag, click the shortened text and a tooltip will appear with the full text.
You can take action on incidents using the Mobile Incident Feed. Click the icon for the action you would like to take and the action dialog box will open.
To learn more about incident actions, see the Triage Incidents and Remediate Incidents documentation.
Incident Feed Wallboard
The Incident Feed Wallboard opens the active Incident Feed in a full-screen, higher-level view. When open, incidents are displayed in a widened single-row format, allowing a greater number of incidents to be displayed on one screen.
The Incident Feed Wallboard is commonly displayed on a dedicated monitor in the NOC. Displaying the Incident Feed on a centralized screen in a denser format offers better collaboration with team members. This is especially helpful for organizations with high activity levels and is a better way to keep up with new incidents streaming in.
All details and standard workflow actions available in the conventional Incident Feed, such as Comment, Snooze, Assign, and Share, are retained. Additionally, a status tag is included at the end of each record to make the information easier to scan and assess.
Every Incident Feed in the console allows for the expansion into the full-screen Wallboard view using the Open Wallboard button.
Session management is disabled while in the Wallboard view and the console will not automatically log out.
Start Triaging Incidents in BigPanda
Learn more about Navigating the Incidents Tab
Dig into the Incident Intelligence Enrichment Process