Incident Management

An incident represents a high-level issue in your infrastructure. In BigPanda, incidents are created automatically by grouping related alerts from your monitoring tools. Click the Incidents tab at the top of the screen to manage active incidents in one place. You can organize, assign, investigate, and escalate incidents as needed to resolve them quickly.

Key Features

  • Use the Incident Feed to filter, search, and sort all active and recently resolved incidents.
  • Access up-to-date details for related alerts.
  • Visualize the incident life cycle on a timeline.
  • Manage incidents through the operations workflow by sharing incidents with co-workers, adding comments, and snoozing non-urgent incidents.
  • Assign incidents to designate who is responsible for seeing them through to resolution.

How It Works

A single production issue often produces multiple alerts. For example, a disk issue can trigger a disk I/O alert and then, as the problem cascades, a series of CPU, memory, database, and application alerts. Each alert may also change as the issue progresses. For example, an alert may start as a warning and then escalate to critical. In these cases, diagnosing and fixing the issue requires up-to-date information from multiple sources, which is difficult to gather and keep current by hand.

BigPanda digests all of the raw data from your integrated monitoring systems and automatically creates and maintains incidents for related alerts, which gives you the visibility you need to investigate and resolve issues quickly. The alert correlation engine processes data through the following stages:

  • Normalization—converts raw data into standardized key-value pairs, called tags, which you can view and search for in the UI.
  • Enrichment—adds tags for any available information that may aid in resolution, such as the cluster and data center where an object resides or links to relevant time series metrics and runbooks.
  • Correlation—groups related alerts into incidents and keeps them up to date. The alert history is plotted on a timeline so you can visualize how an incident unfolds over time.
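As a rough mental model only, the three stages can be sketched in Python. This is not BigPanda's implementation or API. The field mappings in `normalize`, the `ENRICHMENT` lookup table, the alert identity of host plus check, and the choice to correlate alerts that share a `cluster` tag are all hypothetical assumptions chosen for illustration. The sample events replay the disk scenario described earlier: a disk I/O warning on a database host, a related CPU alert, an application alert on another host in the same cluster, and finally the disk alert escalating to critical.

```python
from dataclasses import dataclass, field

# Hypothetical enrichment source (for example, an inventory or CMDB export).
ENRICHMENT = {
    "db-01": {"cluster": "orders-db", "datacenter": "us-east"},
    "app-07": {"cluster": "orders-db", "datacenter": "us-east"},
}

# Hypothetical mapping from tool-specific field names to common tag names.
FIELD_MAP = {"hostname": "host", "node": "host", "metric": "check",
             "state": "status", "severity": "status"}


def normalize(raw: dict) -> dict:
    """Normalization: convert a raw alert into standardized key-value tags."""
    return {FIELD_MAP.get(key, key): str(value).lower() for key, value in raw.items()}


def enrich(tags: dict) -> dict:
    """Enrichment: add context tags (cluster, data center) for the alert's host."""
    return {**tags, **ENRICHMENT.get(tags.get("host", ""), {})}


@dataclass
class Incident:
    key: str
    alerts: dict = field(default_factory=dict)    # (host, check) -> latest status
    timeline: list = field(default_factory=list)  # (time, host, check, status) events


def correlate(events, incidents=None):
    """Correlation: group alerts that share a cluster tag and keep them up to date."""
    incidents = {} if incidents is None else incidents
    for time, raw in events:
        tags = enrich(normalize(raw))
        key = tags.get("cluster", tags.get("host"))  # assumed correlation key
        incident = incidents.setdefault(key, Incident(key))
        alert_id = (tags["host"], tags["check"])
        incident.alerts[alert_id] = tags["status"]  # latest state wins
        incident.timeline.append((time, *alert_id, tags["status"]))
    return incidents


events = [
    (0, {"hostname": "db-01", "check": "disk_io", "state": "WARNING"}),
    (60, {"node": "db-01", "metric": "cpu", "severity": "critical"}),
    (90, {"hostname": "app-07", "check": "latency", "state": "warning"}),
    (120, {"hostname": "db-01", "check": "disk_io", "state": "CRITICAL"}),
]
incidents = correlate(events)
print(len(incidents))                                      # 1
print(incidents["orders-db"].alerts[("db-01", "disk_io")])  # critical
```

In this sketch, the four raw events become one incident containing three distinct alerts. The disk I/O alert's status is updated in place rather than duplicated, and every change is still recorded on the incident's timeline.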

All active and recently resolved incidents appear on the Incidents tab, where you can manage incidents through the operations workflow with BigPanda as your unified console. You can also escalate incidents to external ticketing or collaboration systems, either manually as needed or automatically by using BigPanda as a smart ticketing solution. BigPanda then keeps those external systems up to date with the latest incident information.

Incidents Workflow

You can use BigPanda as a unified console to manage incidents from detection to resolution. For a quick reference of UI features, see Reference: Incidents Tab.

Organize

On the Incidents tab, use the Incident Feed to get a consolidated view of all active and recently resolved incidents from all of your integrated monitoring systems. Filter, search, and sort the feed to organize your work and focus on your highest-priority issues.

Investigate

Start investigating issues by viewing detailed information about an incident in the feed, including related alert details and a timeline of the incident life cycle. Then, use snoozing, comments, and sharing to stay focused on your highest priorities and collaborate with teammates on solutions.

Escalate

Share an incident to loop in a key team member or escalate it through an external ticketing and/or collaboration system. When an incident is shared, BigPanda includes a link to the incident preview page, which allows recipients to see the latest incident status without logging in.