Data Engineering

Data Engineering takes the mess of noisy, low-quality alert payload data and converts it to enriched, high-quality alerts.

Data Engineering is the act of collecting, cleaning, and preparing data for AIOps processing. BigPanda engineers your raw events across several stages including filtering, normalization, deduplication, aggregation, and enrichment.

BigPanda’s Data Engineering service reduces IT noise by filtering out false positives and benign events, and by deduplicating recurring or cross-platform repetitions of the same event. By aggregating events into high-quality alerts, IT Ops teams are able to see through the noise to focus on events that are actually related to incidents and outages.

BigPanda’s Data Engineering service aggregates high-quality alerts from all of your monitoring tools together into a single pane of glass. This eliminates the need for teams to switch between different tool consoles when working on incidents and outages.

BigPanda’s Data Engineering service also normalizes and enriches these high-quality alerts with context that is normally buried within event payloads, like location, host, or affected service. With this added context, IT Ops teams are able to more easily judge the significance of alerts and determine next steps.

BigPanda Enrichment Process

Key Features:

Monitoring Integrations

Monitoring integrations allow BigPanda to receive alerts from your monitoring systems, such as Nagios, SolarWinds, and AppDynamics. Many organizations start by integrating these systems.

BigPanda offers 50+ standard integrations to popular monitoring and observability tools. BigPanda also allows you to configure, test, and deploy standard inbound integrations on your own. With BigPanda, you can also ingest events sent via API or email within an intuitive, easy-to-use UI. On top of this, BigPanda's Agent lets you collect alerts from tens of thousands of IT systems and devices.

Start integrating your monitoring tools with the Standard Monitoring Integrations, or learn more about setting up the BigPanda Agent.

For custom solutions and homegrown scripts, you can build advanced integrations by using the Open Integration Manager, Email Parser, or BigPanda API Reference.

Event Filtering

BigPanda gives users the ability to filter out or suppress events generated for nodes or CIs that are under maintenance, in non-production environments, or that match other special circumstances where operators don’t need to be notified of potential outages or incidents.

Dynamic tagging allows you to automatically sort events with certain properties into special folders or mute them using snooze.

Use APIs to schedule specific windows when certain event payloads should be discarded. For events that shouldn’t be processed or be visible in the BigPanda UI, the Plans V1 API allows you to filter event payloads before they enter the BigPanda pipeline. For events that should still be processed, and potentially investigated, use the Maintenance Plans V2 API to set suppression rules for these events.

Event Filtering Use Case

Use event filtering to keep your monitoring activity in sync with infrastructure changes.

For example:

The set of servers under the billing host is scheduled to undergo upgrades for a duration of one week. Due to the non-operational nature of the servers during this time, all events generated by them will be unnecessary for monitoring.

Instead of letting these events clutter the operator’s feeds and disrupt workflow, you can create a matching Plan with the query host = "billing*" and a Schedule that covers the same period as the upgrades. This prevents the irrelevant events from ever entering the feed.

If operators should instead be able to view these events for troubleshooting purposes, you can create a Maintenance Plan for the query host = "billing*" during the upgrade period. These events will be automatically sorted into the Maintenance folder in BigPanda. If another host triggers an event that could be related, the maintenance alerts may be correlated with it to speed up maintenance-related troubleshooting.
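The two options in this use case can be sketched in a few lines of Python. This is only an illustration of the decision logic, not BigPanda's implementation or API: the window dates, the `route` function, and the event field names are hypothetical, and the wildcard match uses Python's `fnmatchcase` as a stand-in for the query host = "billing*".

```python
from datetime import datetime, timezone
from fnmatch import fnmatchcase

# Hypothetical one-week upgrade window (dates invented for illustration).
WINDOW_START = datetime(2024, 3, 4, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 3, 11, tzinfo=timezone.utc)
HOST_PATTERN = "billing*"  # stand-in for the query host = "billing*"


def in_maintenance(event: dict) -> bool:
    """True if the host matches the pattern and the event falls inside the window."""
    received = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
    return fnmatchcase(event["host"], HOST_PATTERN) and WINDOW_START <= received < WINDOW_END


def route(event: dict, keep_visible: bool) -> str:
    """Choose between normal processing, the maintenance folder, or discarding.

    keep_visible=False models filtering the event out before the pipeline;
    keep_visible=True models a maintenance plan that keeps it viewable.
    """
    if not in_maintenance(event):
        return "process"
    return "maintenance_folder" if keep_visible else "discard"
```

With `keep_visible=False`, a matching event never reaches the feed. With `keep_visible=True`, it is still processed but sorted out of the operator's main view.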

Event Normalization

Before low-quality alerts can be turned into high-quality alerts, they all must speak the same language.

Unfortunately, each monitoring tool has a unique format and terminology to describe IT elements. This makes it hard for IT Ops teams to consume their data in a consistent manner, and even harder for them to glean valuable insight from this data.

BigPanda normalizes payloads right at ingestion using field configurations for each individual integration.

Primary and Secondary Properties

For each integration, two fields are key for event normalization: the primary and secondary properties. These two properties are used for various purposes:

  • During correlation, BigPanda uses both properties to identify which events are part of the same alert.
    In the default correlation pattern, BigPanda uses the primary property to determine if alerts are related to each other.
  • In the UI, when there is no correlation pattern for the incident, BigPanda uses the primary property to construct the title and the secondary property to construct the subtitle of an incident.
  • As an incident progresses, the primary and secondary properties are key to ensuring that an incident’s severity, scope, and status are updated to match the ongoing outage.

Note: The secondary property is optional. If the event does not contain a value for the secondary property, BigPanda uses only the primary property to process the event.

Event normalization is configured during Integration configuration. Read more about installation and configuration of event payloads in the Integrate with BigPanda documentation.
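As a rough illustration of what per-integration field configuration achieves, the sketch below maps two differently shaped payloads onto one shared set of tags. Everything in it is an assumption made for the example: the raw field names, the `FIELD_CONFIG` and `STATUS_MAP` tables, and the choice of `host` and `check` as primary and secondary properties are hypothetical, not BigPanda's actual configuration format.

```python
# Hypothetical field configuration: normalized tag -> raw field, per integration.
FIELD_CONFIG = {
    "nagios": {"host": "hostname", "check": "service_desc", "status": "state"},
    "appdynamics": {"host": "node", "check": "health_rule", "status": "severity"},
}

# Hypothetical mapping from tool-specific status words to a shared vocabulary.
STATUS_MAP = {
    "OK": "ok",
    "WARNING": "warning",
    "WARN": "warning",
    "CRITICAL": "critical",
    "ERROR": "critical",
}

PRIMARY, SECONDARY = "host", "check"  # assumed primary and secondary properties


def normalize(source: str, raw: dict) -> dict:
    """Translate a raw payload into normalized tags for the given integration."""
    mapping = FIELD_CONFIG[source]
    tags = {tag: raw[field] for tag, field in mapping.items() if field in raw}
    if PRIMARY not in tags:
        raise ValueError(f"event from {source} is missing the primary property")
    # The secondary property is optional, so it is simply absent when not sent.
    if "status" in tags:
        tags["status"] = STATUS_MAP.get(str(tags["status"]).upper(), "unknown")
    return tags
```

After normalization, events from both tools expose the same `host`, `check`, and `status` tags, so later stages do not need to know which tool produced them.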

Event Deduplication

BigPanda’s built-in deduplication process reduces noise by intelligently parsing incoming events. Deduplication, also called deduping or event marshalling, is the process by which BigPanda eliminates redundant data to reduce noise and simplify incident investigation.

Precise duplicates of existing events are immediately discarded. Updates to existing alerts, however, are merged into the existing alert rather than creating a brand new alert.

If BigPanda receives two or more events with similar payloads, three things could occur:

Scenario | Action
The event payload (including the application key, timestamp, and primary and secondary properties) exactly matches an event that was already received. | The event is dropped.
The timestamp (or any other value in the event payload) has changed, but its status (ok/warning/critical) has not changed. | The event is merged with the previous event into a single alert, updating with the tag values from the new event.
The event payload's status has changed from the previous event. | The event is added as a new alert.
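The three scenarios in the table can be read as a small decision function. The sketch below is a minimal illustration of that logic and not BigPanda's implementation. It assumes `previous` is the most recent event already received for the same application key and primary and secondary properties, and the function name and return values are hypothetical.

```python
from typing import Optional


def dedupe_action(new: dict, previous: Optional[dict]) -> str:
    """Return the action for a new event, following the three scenarios in the table."""
    if previous is None:
        return "new_alert"  # nothing to compare against yet
    if new == previous:
        return "drop"  # exact duplicate, timestamp included
    if new["status"] == previous["status"]:
        return "merge"  # same status, other values changed
    return "new_alert"  # status changed
```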

Sending Multiple Alerts with the Alerts REST API

BigPanda uses the timestamp to determine the latest status of an incident. If it is not included, BigPanda uses the time when the alert was first received. To ensure that BigPanda accurately reflects the most current status, we recommend including the timestamp for each alert.
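A minimal sketch of this recommendation: build each alert with an explicit timestamp before sending a batch. The helper names are hypothetical, and the payload field names (`app_key`, `alerts`, `host`, `check`, `status`, `timestamp`) are assumptions to check against the BigPanda API Reference. The sketch only builds the request body and does not send it.

```python
import json
import time
from typing import Optional


def build_alert(host: str, check: str, status: str, timestamp: Optional[int] = None) -> dict:
    """Build one alert, always including a Unix timestamp in seconds."""
    return {
        "host": host,
        "check": check,
        "status": status,
        # Fall back to the current time only when the source gave no timestamp.
        "timestamp": int(time.time()) if timestamp is None else int(timestamp),
    }


def build_batch(app_key: str, alerts: list) -> str:
    """Serialize several alerts into one JSON request body (field names assumed)."""
    return json.dumps({"app_key": app_key, "alerts": alerts})
```

Passing the time at which the monitoring tool observed the state, rather than relying on arrival time, keeps the status order correct even when alerts are delivered late or out of order.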

Event Aggregation

Each event represents a single state for a specific sensor or measurement. As an outage or incident progresses, this status may change, resulting in new event notifications being triggered.

BigPanda combines these similar events into a single alert that can be viewed in an easy-to-understand timeline view.

This enables IT Ops teams to visualize and focus on the full lifecycle of an alert. IT Ops teams can see the current/latest status of the monitored resource in the context of the evolution of that alert over time.

The Alert Life Cycle

Every alert has its own life cycle—it starts at some point, may change severity or state, resolves, and occasionally flaps between states.

BigPanda alerts keep this full context together.

For example:

A CPU load alert may start with a warning event, then increase in severity with a critical event, and finally get resolved with an ok event. All of these events will be combined into a single alert, with each status change visible in the BigPanda UI.
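The CPU load example can be modeled as events folded into one alert per monitored resource. This is a simplified sketch, not BigPanda's data model: the `Alert` class, the `aggregate` function, and the use of `host` and `check` as the alert identity are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Alert:
    host: str
    check: Optional[str]
    timeline: list = field(default_factory=list)  # (timestamp, status), oldest first

    @property
    def status(self) -> str:
        """The current status is the status of the latest event."""
        return self.timeline[-1][1]


def aggregate(events: list) -> dict:
    """Group events by (host, check) and keep every status change on a timeline."""
    alerts = {}
    for event in sorted(events, key=lambda e: e["timestamp"]):
        key = (event["host"], event.get("check"))
        alert = alerts.setdefault(key, Alert(event["host"], event.get("check")))
        alert.timeline.append((event["timestamp"], event["status"]))
    return alerts
```

Feeding in a warning, a critical, and an ok event for the same host and check yields a single alert whose timeline holds all three states and whose current status is ok.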

Event updates are displayed in the Incident Timeline. Each row in the timeline represents an individual alert, with each dot representing an event in the life cycle of that alert.

Incident, Alert, and Event

1 - Incident
2 - Alert
3 - Event

Alert Enrichment

BigPanda event enrichment adds contextual information to your alerts, including business segment, relevant CI/CD elements, or operational data. Event enrichment enables you to accurately and quickly detect, understand, and resolve incidents. The enrichment of events also enables powerful event correlation so that you can detect and effectively respond to issues.

Alert Enrichment Tags

During event normalization, BigPanda ingests raw event data and converts it into key-value pairs called tags. Event enrichment allows you to create new, rule-based tags from these existing tags, adding metadata and context to incoming events in your organization’s system.

Common enrichment tags include:

  • Operational data that enables you and your team to categorize, prioritize, and remediate an incident, such as tags for “owner,” “priority,” or “category.”
  • Topological information that provides context about the physical and logical elements of the alert, such as tags for “cluster,” “data center,” or “city.”

You can create the following alert enrichment tags and rule types:

  • Extraction: extract values from an existing tag to create new custom tags.
  • Composition: combine multiple values of existing tags to create one new custom tag.
  • Multi Type: create a tag composed of several function types.
  • Mapping: added automatically to the list of tag rules when a map includes a result tag value with the same name as a tag.
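To make the rule types concrete, the sketch below applies an extraction, a mapping, and a composition to a dictionary of tags. It is a simplified illustration and not BigPanda's rule engine: the function names, the regular expression, the lookup table, and the tag names are hypothetical. A Multi Type rule would chain several of these functions for a single tag.

```python
import re


def extract(tags: dict, source: str, pattern: str, target: str) -> None:
    """Extraction: copy the first regex capture group of an existing tag into a new tag."""
    match = re.search(pattern, tags.get(source, ""))
    if match:
        tags[target] = match.group(1)


def map_tag(tags: dict, source: str, table: dict, target: str) -> None:
    """Mapping: look up an existing tag value in a table and store the result tag."""
    value = table.get(tags.get(source))
    if value is not None:
        tags[target] = value


def compose(tags: dict, template: str, target: str) -> None:
    """Composition: combine several existing tag values into one new tag."""
    try:
        tags[target] = template.format(**tags)
    except KeyError:
        pass  # a referenced tag is missing, so the rule does not apply


tags = {"host": "nyc-billing-db-01"}
extract(tags, "host", r"^([a-z]+)-", "city_code")
map_tag(tags, "city_code", {"nyc": "New York"}, "city")
compose(tags, "{city}/{host}", "location")
```

Each rule only adds a tag when its inputs are present, so events that lack the source tags pass through unchanged.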

Administrators are able to customize enrichment tags for your organization. Read more about managing event enrichment in the Manage Alert Enrichment documentation.

Next Steps

Through the data engineering process, raw, noisy events are converted into high-quality alerts. These alerts help your teams focus on only relevant issues, and give them additional knowledge and tools to help the triage and troubleshooting process.

However, BigPanda is just getting started. Those high-quality alerts are converted into actionable incidents through the incident intelligence process, improving MTTR even further through machine-learning-assisted correlation and enrichment.

Read more about how BigPanda takes your high-quality alerts to the next step in the Incident Intelligence documentation.

Or, if you’re ready to start the Data Engineering process, check out the Standard BigPanda Monitoring Integrations documentation.