ETL Sync

BigPanda raw data can be synced with external data tools.

BigPanda maintains raw data in standard objects that can be accessed through the BigPanda ETL pipeline or exported in CSV format and processed by your own external tools.

Both BigPanda Analytics and the ETL Pipeline make data from the platform available in near real-time. Data is available within 1-2 minutes on average, though during high data-volume periods it may take as long as 6-8 minutes.

🚧

The exported CSV will not have headers, and the source data may change at any time. Ensure that your ETL process can handle changes in the CSV structure.

To connect to the BigPanda ETL Pipeline, your data tools must first be configured.

Files can be accessed through the ETL in either of two ways:

  • Sync or copy the raw files into a bucket of your own for your own ETL to transform and store the data
  • Use the BigPanda ETL process to write directly to your Redshift instance

👍

Best Practices Guide

For detailed best practices on configuring your system to receive ETL data from BigPanda, see the Receiving BigPanda raw data guide in BigPanda University. (You must log in to BPU to access the content.)

Schema

Each of the objects below is available for ETL sync or can be downloaded in CSV format.

  • Activities - a single action performed by a user
  • Change Relations Activities - changes that have been marked as matched or not matched to an incident
  • Changes - changes that have been sent to and processed by BigPanda
  • Change Tags - the tags associated with a single change in BigPanda
  • Entities - a single aggregated alert in BigPanda (e.g., a row in the BigPanda timeline view)
  • Events - all status changes for a single raw event in BigPanda
  • Incidents - a single BigPanda incident
  • Incident Tags - the incident tags associated with a single BigPanda incident
  • Maintenance Events - events that matched BigPanda maintenance plans
  • Maintenance Plans - a single BigPanda maintenance plan
  • Matchers Log - the correlation patterns that matched and were applied to a single BigPanda incident
  • RCC Related Changes - a single change suspected as a potential root cause by the BigPanda algorithm for a single incident
  • Tags - a single tag of an entity
  • User - a single user
  • VW Tags (View Tags) - all tags of an entity, with normalized fields

In the ETL process, two additional columns are added to each table with ETL information:

  • _sys_import_time - the timestamp of the sync
  • _sys_execution_id - the unique identifier of the sync execution

📘

When using the BigPanda ETL, incident_id and tag_id are the primary identifiers.

For details on the attributes for each available object, see the Raw Data ETL Schema documentation.
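
Together, the _sys_* columns and the primary identifiers make it possible to deduplicate repeated syncs downstream. Below is a minimal sketch, assuming an incidents table as listed in the schema above; that repeated syncs may re-deliver the same incident is an assumption for illustration:

-- Keep only the most recently synced row for each incident.
-- The incidents table name is taken from the schema list above.
SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY incident_id
               ORDER BY _sys_import_time DESC
           ) AS row_num
    FROM incidents
) latest
WHERE row_num = 1;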

ETL Raw CSV Schema

tags or vw_tags table
In the raw CSV, this table is called the tags table; each tag value of an entity appears on its own row.

During ETL, this data is condensed so that each tag has its own value column, creating the view tags (vw_tags) table. The system also looks through a configured list of host and check fields and populates the _normalized_check and _normalized_host columns with the first value found.
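
Once the view is built, the normalized columns can be queried directly. A minimal sketch; the vw_tags table and column names come from this section, while the host value is a placeholder:

-- List tag rows whose normalized host matches a given value.
SELECT _normalized_host, _normalized_check
FROM vw_tags
WHERE _normalized_host = 'prod-web-01';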

AWS S3-Based Sync

An S3 Sync can be configured to mirror BigPanda BI analytics information in your organization’s S3 bucket.

When the process is enabled in BigPanda, it begins generating the raw data files in CSV format in an S3 bucket on the BigPanda side. Replication privileges then allow the data to be replicated into your organization’s S3 bucket.

AWS S3 Configuration

  1. Contact your BigPanda Account Manager, who will set up an S3 bucket for you in BigPanda’s AWS infrastructure and enable the sync
  2. Create an S3 bucket with versioning enabled
  3. Provide BigPanda Support with the ARN and AWS account ID
  4. BigPanda Support will provide a bucket policy
  5. Add the bucket policy to your bucket to begin receiving objects through replication
    a. Click on the bucket
    b. Click on the Permissions tab.
    c. Select Bucket Policy and enter the policy provided by support
    d. Click on the Management tab.
    e. Choose Replication and click More -> Receive objects
    f. Add BigPanda AWS Account ID in the source and click Done
  6. Confirm that the objects are received in your S3 bucket
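
Once objects arrive in your bucket, your own ETL can load them, as described earlier. Below is a minimal sketch of loading one object into a Redshift table with COPY; the bucket path, IAM role, and target table are placeholders, and per the caution above the CSVs have no header row, so the target table's column order must match the file:

-- Load a replicated BigPanda CSV into your own warehouse table.
-- All literal values here are placeholders for illustration.
COPY incidents
FROM 's3://your-replica-bucket/bigpanda/incidents/'
IAM_ROLE 'arn:aws:iam::111122223333:role/your-redshift-copy-role'
FORMAT AS CSV;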

Encryption

There is no encryption on the S3 bucket on the BigPanda side. When the objects are replicated into your bucket, they assume the destination bucket’s permissions and encryption settings.

If you use default encryption on your S3 bucket, you need to add the following statement to the policy of the KMS key used on the bucket:

{
  "Sid": "Allow BigPanda to Generate Data Key",
  "Action": "kms:GenerateDataKey",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::103749124141:root"
  },
  "Resource": "*"
}

Redshift Warehouse

A Redshift warehouse can be provisioned to connect to the BigPanda ETL pipeline to retrieve BI analytics information.

Redshift Warehouse Configuration

The minimum requirements for the ETL pipeline are the following:

  • Node Count: 2
  • Node Type: dc2.large
  • Region: us-west-2

🚧

If the load is too large, we recommend increasing the number of nodes rather than changing the node type; in our experience, performance improves more with additional nodes than with larger ones.

The Redshift cluster must also be configured so that it is publicly accessible over the internet. If this is not an option, reach out to your BigPanda contact, who can help enable VPC peering between your Redshift VPC and BigPanda’s ETL VPC.

NAT Gateway IPs
BigPanda will send data from the following IP addresses:

  • 54.68.166.148
  • 54.214.216.245

User Accounts

The BigPanda ETL requires a user that is used to load data into the Redshift warehouse and perform management operations.

This user needs full access to the ETL database. An administrator can create this user and grant the appropriate permissions with the following SQL statements:

-- Replace your-password-here and database_name_here with your own values
CREATE USER etl_loader WITH PASSWORD 'your-password-here';
GRANT ALL PRIVILEGES ON DATABASE database_name_here TO etl_loader;
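
To confirm that the grant took effect, you can check the user's privileges with a standard Redshift system function; database_name_here is the same placeholder as above:

-- Returns true if etl_loader can create schemas in the database.
SELECT has_database_privilege('etl_loader', 'database_name_here', 'create');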

Connection Limits (Optional)

If your warehouse hosts many other processes, we recommend setting a connection limit on the etl_loader user so that it does not affect those services. By default, the BigPanda ETL opens 25 connections at a time; you can set the limit higher, but it must allow at least 25 connections.

ALTER USER etl_loader CONNECTION LIMIT 25;
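
To see how many connections the loader is actually using, you can count its sessions in the STV_SESSIONS system view; a minimal sketch:

-- Count the open sessions held by the etl_loader user.
SELECT COUNT(*) AS open_connections
FROM stv_sessions
WHERE user_name = 'etl_loader';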

Workload Management (WLM) Setup (Optional)

BigPanda’s recommendation is to use the Automatic Workload Management settings to allow Redshift to allocate concurrency and memory as needed.

If you prefer, you can configure the queues, concurrency, and memory allocation manually:

  1. Create the etl_loader group:

CREATE GROUP etl_loader WITH USER "etl_loader";

  2. Create WLM queues
    Create and associate a queue for the etl_loader group

  3. Create parameter groups
    Create a queue for the etl_loader and etl_vacuumer group/user

  4. Set the Manual WLM

For more information on setting up the WLM, see the official AWS Documentation.
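
After the parameter group is applied, you can verify the resulting queues from SQL; a minimal sketch using the STV_WLM_SERVICE_CLASS_CONFIG system table:

-- Inspect the configured WLM queues (service classes).
SELECT service_class, num_query_tasks, query_working_mem
FROM stv_wlm_service_class_config
ORDER BY service_class;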

Connect to the BigPanda ETL

Share the following information with your BigPanda contact, who will work with the technical team to enable the ETL for your Redshift warehouse.

  1. Redshift endpoint
    a. This can be found in the AWS console under your Redshift cluster
    b. Be sure to include the port number
  2. Database name
  3. etl_loader username and password

Next Steps

Review the Raw Data ETL Schema

Dig into additional reporting options with BigPanda Analytics