SFTP Data Connector
The BigPanda Secure File Transfer Protocol (SFTP) data connector ingests file-based data from SFTP servers into the Unified Data Connector (UDC) pipeline. It supports configurable incremental sync strategies, streaming CSV reads with retries on transient failures, and chunked writes for large files.
Key features
Scheduled file ingestion from SFTP servers via cron expressions.
Configurable fallback behavior for files with unparseable filename dates.
Streaming CSV reads with automatic retry on transient SFTP failures.
Progress logging and chunked writes for large files.
Read resilience and streaming
The SFTP connector streams CSV rows as it reads each file rather than buffering the entire file in memory, and automatically retries transient read failures.
Append mode
Because retries restart the file from the beginning, rows emitted before a mid-file failure may be re-emitted after a successful retry. In append-mode pipelines, this can introduce duplicate rows for the affected file. Use a downstream deduplication step or prefer replace-mode loads when duplicates are not acceptable.
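A downstream deduplication step could look like the sketch below. It is an illustration, not the connector's own logic: the helper name `dedupe_rows` is hypothetical, and keying on the full row content assumes that two identical rows are always unwanted duplicates. A real pipeline would more likely key on a stable business identifier.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def dedupe_rows(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield each distinct row once, preserving first-seen order.

    Hypothetical helper: rows are compared on their full content.
    """
    seen: Set[Tuple[Tuple[str, str], ...]] = set()
    for row in rows:
        # Sort the items so that column order does not affect the key.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # Row was already emitted, for example before a retry.
        seen.add(key)
        yield row
```

The `seen` set grows with the number of distinct rows, so for very large files a deduplication step inside the destination is usually a better fit than an in-memory filter.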
Streaming CSV reads. CSV files are streamed row-by-row rather than fully loaded into memory before processing. During a read, the connector logs the file size before the read starts and a row-progress counter every 1,000 rows processed. Use these logs to monitor ingestion progress for large files.
Chunked writes. SFTP pipelines apply connector-scoped chunking limits so that load packages flush in smaller batches. For very large input files, this means the downstream loader can start processing data before the entire source file has been read, improving end-to-end throughput.
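The streaming, retry, and chunking behavior described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the connector's implementation: `stream_rows`, `chunked`, the `open_file` callable, the attempt limit, and the assumption that transient failures surface as `OSError` are all hypothetical. Only the restart-from-the-beginning retry and the 1,000-row progress interval come from the description above.

```python
import csv
import logging
from typing import Callable, Dict, Iterable, Iterator, List

log = logging.getLogger("sftp_sketch")


def stream_rows(open_file: Callable, max_attempts: int = 3,
                log_every: int = 1000) -> Iterator[Dict[str, str]]:
    """Stream CSV rows one at a time; on a read error, reopen the file
    and restart from the first row (earlier rows are emitted again)."""
    for attempt in range(1, max_attempts + 1):
        try:
            with open_file() as handle:
                for count, row in enumerate(csv.DictReader(handle), start=1):
                    if count % log_every == 0:
                        log.info("processed %d rows", count)
                    yield row
            return  # The whole file was read successfully.
        except OSError:  # Assumed shape of a transient failure.
            if attempt == max_attempts:
                raise
            log.warning("read failed on attempt %d; restarting file", attempt)


def chunked(rows: Iterable[Dict[str, str]],
            size: int) -> Iterator[List[Dict[str, str]]]:
    """Group rows into batches so a loader can start before the read ends."""
    batch: List[Dict[str, str]] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch  # Flush the final partial batch.
```

Because `stream_rows` is a generator, `chunked(stream_rows(opener), size)` hands the first batch to the caller as soon as `size` rows have been read. The same property is why a mid-file failure leads to re-emitted rows: batches built before the failure have already left the reader.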
Authentication
The SFTP connector uses SSH key authentication. These details are required when configuring the connection:
Instance URL: SFTP host (for example, sftp://sftp.example.com or sftp.example.com).
Username: SFTP username.
Private key: PEM-encoded private key (RSA, Ed25519, or ECDSA).
Passphrase: Optional; for encrypted private keys.
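Since the instance URL may be given with or without the `sftp://` scheme, it can help to normalize the value before handing it to BigPanda. The sketch below is only an illustration: `normalize_sftp_host` is a hypothetical helper, and rejecting other schemes is an assumption rather than documented connector behavior.

```python
from urllib.parse import urlparse


def normalize_sftp_host(instance_url: str) -> str:
    """Return the bare host name from 'sftp://host' or 'host' input."""
    value = instance_url.strip()
    if "://" not in value:
        # Treat a bare host as if it had been written with the sftp scheme.
        value = "sftp://" + value
    parsed = urlparse(value)
    if parsed.scheme != "sftp" or not parsed.hostname:
        raise ValueError(f"not an SFTP host: {instance_url!r}")
    return parsed.hostname
```

With this helper, both `sftp://sftp.example.com` and `sftp.example.com` resolve to the same host string, `sftp.example.com`.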
Sync preferences
Provide the following information about your sync preferences to BigPanda:
Required configuration
Option | Description |
|---|---|
| Cron expression for scheduling the connector (for example, |
Optional pipeline configuration
Option | Description |
|---|---|
| Start date for data collection in |
| Controls how the SFTP connector tracks progress during incremental syncs. You can choose one of three strategies.
Tip Remapping and incremental sync When you set |
| Fallback behavior when filename dates are unparseable. Only applies when
|
| Remaps incoming file rows into a ServiceNow-style
|
| Overrides the destination resource name that the connector loads into. When unset, the destination defaults to |
Fallback strategy best practice
Start with mtime_fallback during initial setup to avoid pipeline failures from unexpected filenames. Switch to skip or fail once you have validated that all source files follow a consistent naming convention.
Use caution during this process. Switching between incremental strategies on an existing pipeline may cause files to be reprocessed or skipped, depending on the difference between filename dates and file modification times. Plan strategy changes during a maintenance window.
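One way to picture the three fallback values is the sketch below. It is a rough model, not the connector's code: the function name `file_cursor`, the assumed filename date pattern (`YYYYMMDD` or `YYYY-MM-DD`), and the exact meaning given to `skip` and `fail` are assumptions for illustration. Only the names `mtime_fallback`, `skip`, and `fail` come from the text above.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Assumed filename date pattern; the real connector's parsing rules may differ.
_DATE_IN_NAME = re.compile(r"(\d{4})-?(\d{2})-?(\d{2})")


def file_cursor(filename: str, mtime: float, fallback: str) -> Optional[datetime]:
    """Return the date used to track a file, or None if the file is skipped."""
    match = _DATE_IN_NAME.search(filename)
    if match:
        year, month, day = (int(part) for part in match.groups())
        try:
            return datetime(year, month, day, tzinfo=timezone.utc)
        except ValueError:
            pass  # Digits matched but are not a real date; use the fallback.
    if fallback == "mtime_fallback":
        # Use the file's modification time instead of a filename date.
        return datetime.fromtimestamp(mtime, tz=timezone.utc)
    if fallback == "skip":
        return None
    if fallback == "fail":
        raise ValueError(f"no parseable date in filename: {filename!r}")
    raise ValueError(f"unknown fallback: {fallback!r}")
```

The sketch also suggests why switching strategies on an existing pipeline needs care: the same file can get a different tracking date depending on whether its filename date or its modification time is used.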
Output remapping and destination resources
By default, the SFTP connector loads each file's rows into the sftp_files resource in their raw shape, keeping the source columns as they appear in the file. Set row_format to remap rows into ServiceNow-style records before they are loaded:
itmapp_servicenow or itmapp_servicenow_incident maps incident-shaped rows.
itmapp_servicenow_cmdb_ci maps configuration-item-shaped rows.
The destination resource the connector loads into depends on your configuration:
No row_format: Loads target the sftp_files resource.
Incident remapping: Loads default to the incident resource.
Configuration-item remapping: Loads default to the cmdb_ci resource.
dlt_resource_name set: Loads target the resource name you provide, overriding the defaults above.
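The resolution rules above can be expressed as a small lookup. This is a sketch of the documented precedence, not the connector's source: the function name `destination_resource` and the error raised for an unrecognized `row_format` are assumptions.

```python
from typing import Optional

# row_format values and their default destination resources, as listed above.
_ROW_FORMAT_DEFAULTS = {
    "itmapp_servicenow": "incident",
    "itmapp_servicenow_incident": "incident",
    "itmapp_servicenow_cmdb_ci": "cmdb_ci",
}


def destination_resource(row_format: Optional[str] = None,
                         dlt_resource_name: Optional[str] = None) -> str:
    """Return the resource name a load would target for this configuration."""
    if dlt_resource_name:
        return dlt_resource_name  # An explicit override always wins.
    if row_format is None:
        return "sftp_files"  # Raw, unmapped rows.
    if row_format not in _ROW_FORMAT_DEFAULTS:
        # Assumed behavior: reject values this sketch does not know about.
        raise ValueError(f"unrecognized row_format: {row_format!r}")
    return _ROW_FORMAT_DEFAULTS[row_format]
```

Running a planned configuration through a helper like this before changing a live pipeline makes it easy to see whether the effective resource name, and therefore the tracked cursor state, is about to change.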
Remapped SFTP loads
If you enable row_format on an existing SFTP pipeline, the destination resource changes from sftp_files to the remapped resource (incident or cmdb_ci), or to the value of dlt_resource_name. Because cursor state follows the effective resource name, an existing pipeline's tracked progress does not carry over to the new resource.
Plan changes during a maintenance window
Enabling or changing row_format or dlt_resource_name on an existing pipeline changes the destination resource and the resource that cursor state is tracked against. This may cause data to load into a different resource than before, or progress to be re-tracked from the start. Plan these changes during a maintenance window, and validate the destination resource and incremental progress after the change.