
SFTP Data Connector

The BigPanda Secure File Transfer Protocol (SFTP) data connector ingests file-based data from SFTP servers into the Unified Data Connector (UDC) pipeline. It supports configurable incremental sync strategies, streaming CSV reads with retries on transient failures, and chunked writes for large files.

Key features

  • Scheduled file ingestion from SFTP servers via cron expressions.

  • Configurable fallback behavior for files with unparseable filename dates.

  • Streaming CSV reads with automatic retry on transient SFTP failures.

  • Progress logging and chunked writes for large files.

Read resilience and streaming

The SFTP connector streams CSV rows as it reads each file rather than buffering the entire file in memory, and automatically retries transient read failures.

Append mode

Because each retry restarts the file from the beginning, rows emitted before a mid-file failure may be emitted again after the retry succeeds. In append-mode pipelines, this can introduce duplicate rows for the affected file. If duplicates are not acceptable, add a downstream deduplication step or use replace-mode loads.
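If you deduplicate downstream, one simple approach is to drop any row whose full contents match a row already seen. The Python sketch below illustrates that approach under that assumption. It is not part of the connector, and the function names are hypothetical. Hashing every column also removes rows that are legitimately identical in the source file, so key on a natural identifier column instead if your files have one.

```python
import hashlib
import json
from typing import Iterable, Iterator


def row_fingerprint(row: dict[str, str]) -> str:
    # Serialize with sorted keys so the same row always produces the same hash,
    # regardless of column order in the dict.
    canonical = json.dumps(row, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedupe_rows(rows: Iterable[dict[str, str]]) -> Iterator[dict[str, str]]:
    # Yield each distinct row once; later copies (for example, rows re-emitted
    # after a retry) are dropped. Memory grows with the number of distinct rows.
    seen: set[str] = set()
    for row in rows:
        key = row_fingerprint(row)
        if key not in seen:
            seen.add(key)
            yield row


# Rows 1 and 2 appear twice, as they might after a mid-file retry.
loaded = [
    {"id": "1", "value": "a"},
    {"id": "2", "value": "b"},
    {"id": "1", "value": "a"},
    {"id": "2", "value": "b"},
    {"id": "3", "value": "c"},
]
unique = list(dedupe_rows(loaded))  # ids 1, 2, 3
```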

Streaming CSV reads. CSV files are streamed row by row instead of being loaded fully into memory before processing. The connector logs the file size before each read starts, then logs a row-progress counter every 1,000 rows. Use these logs to monitor ingestion progress for large files.
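The Python sketch below models the streaming, progress-logging, and restart-on-retry behavior described in this section, and shows how a restart re-emits rows. It is an illustration under assumptions, not the connector's implementation: TransientReadError, the attempt limit of 3, and the FlakyFile test double are all hypothetical, and a real SFTP client would supply the file handle.

```python
import csv
import io
import logging
from typing import Callable, ContextManager, Iterable, Iterator, Optional

log = logging.getLogger("sftp_stream_sketch")


class TransientReadError(Exception):
    """Hypothetical stand-in for a transient SFTP read failure."""


def stream_csv_with_retry(
    open_file: Callable[[], ContextManager[Iterable[str]]],
    file_size: int,
    max_attempts: int = 3,
    progress_every: int = 1000,
) -> Iterator[dict[str, str]]:
    """Yield CSV rows one at a time, restarting the whole file on a transient error."""
    for attempt in range(1, max_attempts + 1):
        log.info("starting read of %d bytes (attempt %d)", file_size, attempt)
        try:
            with open_file() as lines:
                for count, row in enumerate(csv.DictReader(lines), start=1):
                    yield row  # rows already yielded are not taken back on retry
                    if count % progress_every == 0:
                        log.info("processed %d rows", count)
            return
        except TransientReadError:
            if attempt == max_attempts:
                raise
            log.warning("transient read failure; restarting file from the beginning")


class FlakyFile:
    """Test double: serves lines from text and optionally fails after N lines."""

    def __init__(self, text: str, fail_after_lines: Optional[int] = None) -> None:
        self._lines = iter(io.StringIO(text))
        self._fail_after = fail_after_lines
        self._served = 0

    def __enter__(self) -> "FlakyFile":
        return self

    def __exit__(self, *exc: object) -> None:
        return None

    def __iter__(self) -> "FlakyFile":
        return self

    def __next__(self) -> str:
        if self._fail_after is not None and self._served >= self._fail_after:
            raise TransientReadError("connection reset")
        self._served += 1
        return next(self._lines)


data = "id,value\n1,a\n2,b\n3,c\n"
opened: list[int] = []


def open_remote() -> FlakyFile:
    opened.append(1)
    # The first attempt fails after the header and two data lines; later attempts succeed.
    return FlakyFile(data, fail_after_lines=3 if len(opened) == 1 else None)


rows = list(stream_csv_with_retry(open_remote, file_size=len(data)))
# The first two rows are emitted twice: once before the failure, once after the restart.
```

Because the generator yields each row as soon as it is parsed, memory use stays flat regardless of file size, but any row handed downstream before a failure cannot be recalled. That is why the append-mode note above recommends deduplication.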

Chunked writes. SFTP pipelines apply connector-scoped chunking limits so that load packages flush in smaller batches. For very large input files, this means the downstream loader can start processing data before the entire source file has been read, improving end-to-end throughput.
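The effect of chunked writes can be illustrated with a lazy batching helper: each batch is handed off as soon as enough rows have been read, not after the whole file. The Python sketch below assumes a hypothetical chunk_size of 4; the connector's actual connector-scoped limits are not documented here.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(rows: Iterable[T], chunk_size: int) -> Iterator[list[T]]:
    """Group a row stream into lists of at most chunk_size items, lazily."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(rows)
    # islice pulls only chunk_size items, so each batch is ready as soon as
    # that many rows have been read from the source.
    while batch := list(islice(iterator, chunk_size)):
        yield batch


rows_read = 0


def source_rows() -> Iterator[int]:
    # Simulated row stream that counts how many rows have been read so far.
    global rows_read
    for row in range(10):
        rows_read += 1
        yield row


batches = chunked(source_rows(), chunk_size=4)
first = next(batches)  # [0, 1, 2, 3]
rows_read_at_first_batch = rows_read  # 4: only 4 of the 10 rows have been read
rest = list(batches)  # [[4, 5, 6, 7], [8, 9]]
```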

Authentication

The SFTP connector uses SSH key authentication. Provide these details when configuring the connection:

  • Instance URL: SFTP host (for example, sftp://sftp.example.com or sftp.example.com).

  • Username: SFTP username.

  • Private key: PEM-encoded private key (RSA, Ed25519, or ECDSA).

  • Passphrase: Optional. Required only if the private key is encrypted.
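The Instance URL accepts a host with or without the sftp:// prefix. The Python sketch below shows one way the two forms can be treated as the same host. It assumes both forms resolve to the same hostname and that any scheme other than sftp is rejected; normalize_sftp_host is a hypothetical helper, not the connector's parser.

```python
from urllib.parse import urlsplit


def normalize_sftp_host(instance_url: str) -> str:
    """Return the bare hostname from 'sftp://host' or 'host' (hypothetical helper)."""
    value = instance_url.strip()
    if "://" not in value:
        # Treat a bare host as if it had the sftp:// scheme.
        value = "sftp://" + value
    parts = urlsplit(value)
    if parts.scheme != "sftp":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("missing host")
    return parts.hostname  # urlsplit lowercases the hostname


both = {normalize_sftp_host("sftp://sftp.example.com"), normalize_sftp_host("sftp.example.com")}
```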

Sync preferences

Provide the following information about your sync preferences to BigPanda:

Required Configuration 

Option

Description

cron_schedule 

Cron expression for scheduling the connector (for example, 0 */4 * * * for every 4 hours).
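Assuming standard five-field cron semantics (minute, hour, day of month, month, day of week), 0 */4 * * * fires at fixed clock times rather than four hours after the previous run finished. The Python sketch below expands the minute and hour fields to show those times. expand_cron_field is a hypothetical helper that supports only *, */n, single values, ranges, a-b/n, and comma lists.

```python
def expand_cron_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field into the sorted values it matches (subset of cron syntax)."""
    values: set[int] = set()
    for part in field.split(","):
        base, sep, step_str = part.partition("/")
        step = int(step_str) if sep else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            a, b = base.split("-", 1)
            start, end = int(a), int(b)
        elif sep:
            raise ValueError(f"unsupported step syntax: {part!r}")
        else:
            start = end = int(base)
        if step < 1 or not (low <= start <= end <= high):
            raise ValueError(f"out of range: {part!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)


minute, hour, *_ = "0 */4 * * *".split()
run_times = [
    f"{h:02d}:{m:02d}"
    for h in expand_cron_field(hour, 0, 23)
    for m in expand_cron_field(minute, 0, 59)
]
# ['00:00', '04:00', '08:00', '12:00', '16:00', '20:00']
```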

Optional Pipeline Configuration 

Option

Description

start_date 

Start date for data collection in YYYY-MM-DD format.

incremental_by 

Controls how the SFTP connector tracks progress during incremental syncs. You can choose one of three strategies.

  • file_mtime - Uses the file modification time to determine which files are new or updated since the last sync. This is the default behavior and works well when files are written once and not modified after initial creation.

  • row_column - Uses a date column within each row of the file data to track incremental progress. This is useful when individual rows within a file represent time-series data and you want to track progress at the row level rather than the file level.

  • filename_date - Extracts dates in YYYYMMDD format from filenames to determine which files to include in each sync. Files are filtered and counted based on the parsed filename date rather than the file modification time. Useful when files are named with date stamps (for example, transactions_20250415.csv), when file modification times are unreliable, or when date-based filtering must align with business date boundaries.

Tip

Remapping and incremental sync

When you set row_format, cursor state is tracked against the effective destination resource name (incident, cmdb_ci, or the value of dlt_resource_name) rather than sftp_files. Row-level filters and the row_column incremental strategy still use the source CSV column names from before remapping, not the remapped field names.

filename_date_fallback 

Fallback behavior when filename dates are unparseable. Only applies when incremental_by is filename_date:

  • mtime_fallback - Falls back to the file modification time for filtering and date-range assignment. The file is still included in the sync.

  • skip - Ignores the file entirely. It will not be processed or counted.

  • fail - Stops the pipeline with an error. Use this when all files are expected to have date-stamped names and a missing date indicates a problem.

row_format

Remaps incoming file rows into ServiceNow-style incident or cmdb_ci rows before they are loaded. When unset, rows are loaded into the sftp_files resource in their raw source-file shape, with their original source columns.

row_format=itmapp_servicenow or row_format=itmapp_servicenow_incident emits incident-shaped rows, and row_format=itmapp_servicenow_cmdb_ci emits configuration-item-shaped rows.

dlt_resource_name

Overrides the destination resource name that the connector loads into. When unset, the destination defaults to sftp_files, or to the remapped resource (incident or cmdb_ci) when row_format is set.
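The three incremental_by strategies in the table above differ in what they compare against the saved cursor. The Python sketch below contrasts them under simplifying assumptions: a strict "newer than the cursor" comparison, a filename date taken from the first standalone eight-digit YYYYMMDD run, and a hypothetical event_date column for row_column. These are illustrations, not the connector's exact rules. Files without a parseable filename date are simply excluded here; in the connector, filename_date_fallback governs that case.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# A standalone run of exactly eight digits, such as 20250415 in transactions_20250415.csv.
_YYYYMMDD = re.compile(r"(?<!\d)(\d{8})(?!\d)")


@dataclass
class RemoteFile:
    name: str
    mtime: datetime


def parse_filename_date(name: str) -> Optional[date]:
    match = _YYYYMMDD.search(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:  # eight digits that are not a real calendar date
        return None


def select_by_file_mtime(files: list[RemoteFile], cursor: datetime) -> list[RemoteFile]:
    # file_mtime: a file is new if it was modified after the cursor.
    return [f for f in files if f.mtime > cursor]


def select_by_filename_date(files: list[RemoteFile], cursor: date) -> list[RemoteFile]:
    # filename_date: a file is new if the date in its name is after the cursor.
    selected = []
    for f in files:
        file_date = parse_filename_date(f.name)
        if file_date is not None and file_date > cursor:
            selected.append(f)
    return selected


def select_rows_by_column(rows: list[dict[str, str]], column: str, cursor: date) -> list[dict[str, str]]:
    # row_column: compare a date column in each row, not a file attribute.
    return [r for r in rows if date.fromisoformat(r[column]) > cursor]


files = [
    RemoteFile("transactions_20250414.csv", datetime(2025, 4, 20, 8, 0)),  # re-uploaded later
    RemoteFile("transactions_20250415.csv", datetime(2025, 4, 15, 1, 0)),
    RemoteFile("notes.csv", datetime(2025, 4, 16, 9, 0)),
]
by_mtime = select_by_file_mtime(files, cursor=datetime(2025, 4, 15))
by_name = select_by_filename_date(files, cursor=date(2025, 4, 14))
rows = [{"event_date": "2025-04-14"}, {"event_date": "2025-04-16"}]
new_rows = select_rows_by_column(rows, "event_date", cursor=date(2025, 4, 15))
```

In this example, the re-uploaded transactions_20250414.csv is selected again by file_mtime but not by filename_date. Divergence of this kind is why switching strategies on an existing pipeline can reprocess or skip files.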

Fallback strategy best practice

Start with mtime_fallback during initial setup to avoid pipeline failures from unexpected filenames. Switch to skip or fail once you have validated that all source files follow a consistent naming convention.

Use caution when changing strategies. Switching between incremental strategies on an existing pipeline may cause files to be reprocessed or skipped, depending on how filename dates differ from file modification times. Plan strategy changes during a maintenance window.
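The filename_date_fallback modes described in the table above can be illustrated with the following Python sketch. It assumes the filename date is the first standalone eight-digit YYYYMMDD run and that an impossible date such as 20251340 counts as unparseable. FilenameDateError and the function names are hypothetical.

```python
import re
from datetime import date, datetime
from typing import Optional

_YYYYMMDD = re.compile(r"(?<!\d)(\d{8})(?!\d)")


class FilenameDateError(Exception):
    """Raised in fail mode when a filename has no parseable date."""


def parse_filename_date(name: str) -> Optional[date]:
    match = _YYYYMMDD.search(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        return None


def effective_file_date(name: str, mtime: datetime, fallback: str) -> Optional[date]:
    """Return the date used for filtering, or None if the file should be skipped."""
    parsed = parse_filename_date(name)
    if parsed is not None:
        return parsed  # the fallback only matters when the filename date is unparseable
    if fallback == "mtime_fallback":
        return mtime.date()  # keep the file, dated by its modification time
    if fallback == "skip":
        return None  # ignore the file entirely
    if fallback == "fail":
        raise FilenameDateError(f"no parseable YYYYMMDD date in {name!r}")
    raise ValueError(f"unknown filename_date_fallback value: {fallback!r}")


modified = datetime(2025, 4, 16, 9, 30)
dated = effective_file_date("report.csv", modified, "mtime_fallback")  # date(2025, 4, 16)
skipped = effective_file_date("report.csv", modified, "skip")  # None
```

Running a trial sync with mtime_fallback and logging which files took the fallback branch is one way to confirm a consistent naming convention before switching to skip or fail.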

Output remapping and destination resources

By default, the SFTP connector loads each file's rows into the sftp_files resource in their raw source-file shape, with their original source columns. Set row_format to remap rows into ServiceNow-style records before they are loaded:

  • itmapp_servicenow or itmapp_servicenow_incident maps incident-shaped rows.

  • itmapp_servicenow_cmdb_ci maps configuration-item-shaped rows.

The destination resource the connector loads into depends on your configuration:

  • No row_format: Loads target the sftp_files resource.

  • Incident remapping: Loads default to the incident resource.

  • Configuration-item remapping: Loads default to the cmdb_ci resource.

  • dlt_resource_name set: Loads target the resource name you provide, overriding the defaults above.
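The destination rules above amount to a short precedence check: dlt_resource_name wins, then row_format, then the sftp_files default. The Python sketch below encodes that precedence; the function name is hypothetical. The usage lines also model cursor state as a dictionary keyed by resource name, an assumption made for illustration, to show why progress tracked under sftp_files is not found after remapping.

```python
from typing import Optional

INCIDENT_FORMATS = {"itmapp_servicenow", "itmapp_servicenow_incident"}
CMDB_CI_FORMATS = {"itmapp_servicenow_cmdb_ci"}


def destination_resource(
    row_format: Optional[str] = None, dlt_resource_name: Optional[str] = None
) -> str:
    if dlt_resource_name:
        return dlt_resource_name  # explicit override takes precedence
    if row_format is None:
        return "sftp_files"  # raw source-file rows
    if row_format in INCIDENT_FORMATS:
        return "incident"
    if row_format in CMDB_CI_FORMATS:
        return "cmdb_ci"
    raise ValueError(f"unknown row_format: {row_format!r}")


# Hypothetical cursor state saved by an existing pipeline without row_format.
cursor_state = {"sftp_files": "2025-04-15"}
before = cursor_state.get(destination_resource())  # "2025-04-15"
after = cursor_state.get(destination_resource(row_format="itmapp_servicenow"))  # None
```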

Remapped SFTP loads

If you enable row_format on an existing SFTP pipeline, the destination resource changes from sftp_files to the remapped resource (incident or cmdb_ci), or to the value of dlt_resource_name. Because cursor state follows the effective resource name, an existing pipeline's tracked progress does not carry over to the new resource.

Plan changes during a maintenance window

Enabling or changing row_format or dlt_resource_name on an existing pipeline changes the destination resource and the resource that cursor state is tracked against. This may cause data to load into a different resource than before, or progress to be re-tracked from the start. Plan these changes during a maintenance window, and validate the destination resource and incremental progress after the change.