Incident Life Cycle Logic

The life cycle of an incident is defined by the life cycle of the alerts it contains. An incident remains active while at least one of its alerts is active, is automatically resolved when all of its alerts are resolved, and is reopened when a resolved alert becomes active again.
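The rule above can be sketched in a few lines of Python. This is an illustration only, not BigPanda's implementation; the `Status` enum and the `incident_status` function are hypothetical names introduced for the example.

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    RESOLVED = "resolved"


def incident_status(alert_statuses):
    """Derive an incident's status from the statuses of its alerts.

    Hypothetical helper: the incident is ACTIVE while any alert is
    active and RESOLVED only when every alert is resolved.
    """
    statuses = list(alert_statuses)
    if not statuses:
        raise ValueError("an incident is expected to contain at least one alert")
    if any(s is Status.ACTIVE for s in statuses):
        return Status.ACTIVE
    return Status.RESOLVED
```

Because the status is recomputed from the alerts each time, a resolved alert that becomes active again flips the incident back to `ACTIVE` without any separate reopen step in this sketch.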

Alert Resolution and Closing Incidents

An incident remains open as long as at least one of the alerts associated with it is open. When BigPanda receives an event with a status of ok, the related alert is automatically resolved.

Alerts that have not been resolved remain open in BigPanda. The corresponding incident also remains open and continues to appear in the Incident Feed.

Resolving Alerts with the Alerts REST API

To maintain only the most relevant information in the incident feed, send a resolving event to BigPanda using the Alerts REST API when an alert is no longer active.
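As a rough sketch, a resolving event could be prepared as shown below. The endpoint URL, the bearer-token header, and the payload fields (`app_key`, `host`, `check`) are assumptions made for illustration; confirm them against the Alerts REST API reference for your integration. The only detail taken from this page is the `ok` status. The function builds the request without sending it.

```python
import json
import urllib.request

# Assumed endpoint; verify it in the Alerts REST API reference.
ALERTS_URL = "https://api.bigpanda.io/data/v2/alerts"


def build_resolve_request(token, app_key, host, check):
    """Build (but do not send) a request that reports an alert as ok.

    The field names below are assumptions for this example. The host and
    check values should match those of the event that opened the alert.
    """
    payload = {
        "app_key": app_key,
        "status": "ok",  # an ok status resolves the related alert
        "host": host,
        "check": check,
    }
    return urllib.request.Request(
        ALERTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the event, pass the returned object to `urllib.request.urlopen` and handle HTTP errors as appropriate for your environment.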

Reopening Incidents

Resolved incidents are reopened when any of the alerts associated with them reopen. This rule applies regardless of how the incident was resolved—manually, due to inactivity, or automatically when all associated alerts were resolved. Alerts are reopened if they reoccur within 60 minutes of when they were resolved. If an alert reoccurs more than 60 minutes later, it is handled as a new alert.

A resolved incident also reopens when a new alert arrives that matches one of the incident's correlation patterns. Incidents that are more than 30 days old are never reopened; if their associated alerts reoccur, a new incident is created instead.
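The two time limits in this section can be expressed as simple checks. The sketch below is illustrative only. The function names are hypothetical, and it assumes that an incident's age is measured from its start time, which this page does not specify.

```python
from datetime import datetime, timedelta

ALERT_REOPEN_WINDOW = timedelta(minutes=60)
INCIDENT_MAX_AGE = timedelta(days=30)


def alert_reopens(resolved_at, reoccurred_at):
    """True if a reoccurrence reopens the alert instead of creating a new one."""
    return reoccurred_at - resolved_at <= ALERT_REOPEN_WINDOW


def incident_can_reopen(incident_started_at, now):
    """True if the incident is not more than 30 days old.

    Assumption: age is counted from the incident's start time.
    """
    return now - incident_started_at <= INCIDENT_MAX_AGE
```

For example, an alert resolved at 12:00 that fires again at 12:45 is reopened, while the same alert firing at 13:30 is treated as a new alert.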

Flapping Incidents

Flapping occurs when a monitored object (for example, a service or host) changes state too frequently, making the cause and severity of the incident unclear. Flapping can indicate configuration problems (for example, thresholds set too low), troublesome services, or real network problems.

When an alert changes state frequently, it may generate numerous events that are not immediately actionable. In the example timeline shown below, you can see how hundreds of potential notifications are grouped into one incident for the application that is flapping.

In BigPanda, an incident enters the flapping state when one or more of the related alerts are flapping. By default, an alert is considered to be flapping when it has changed states more than 4 times in one hour. Contact BigPanda support if you need to configure custom logic (number of state changes within a period of time) for your organization or for a specific integration.
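The default criterion can be sketched as a sliding-window count. This is a simplified illustration, not BigPanda's actual logic; `is_flapping` and `incident_is_flapping` are hypothetical names, and the inputs are assumed to be the timestamps at which each alert changed state.

```python
from datetime import datetime, timedelta


def is_flapping(state_change_times, now, max_changes=4, window=timedelta(hours=1)):
    """True if the alert changed state more than max_changes times in the window."""
    recent = [t for t in state_change_times if now - window <= t <= now]
    return len(recent) > max_changes


def incident_is_flapping(changes_per_alert, now):
    """True if one or more of the incident's alerts is flapping.

    changes_per_alert holds one list of state-change timestamps per alert.
    """
    return any(is_flapping(changes, now) for changes in changes_per_alert)
```

The `max_changes` and `window` parameters stand in for the custom logic mentioned above: the number of state changes allowed within a period of time.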

When an incident enters the flapping state, all subscribed users are notified and no additional state change notifications are sent. Subscribed users still receive a daily email reminding them about the incident. An incident exits the flapping state when all related alerts stop flapping (no longer meet the criteria for number of state changes in a period of time). BigPanda checks the flapping criteria every 15 minutes.

Learn more...

To learn more about how BigPanda uses pattern recognition to cluster alerts into meaningful, actionable incidents, see our Algorithmic Correlation user guide.

To learn more about defining and managing correlation patterns, see our Working with Correlation Patterns guide.

To learn more about how BigPanda merges events into alerts and clusters alerts into incidents, see our Alert Correlation Logic user guide.