SharePoint Data Connector

The BigPanda Unified Data Connector (UDC) syncs SharePoint list metadata through the Microsoft Graph API to provide context and insights for AI Incident Assistant (Biggy), AI Incident Prevention, and AI Detection and Response.

Ingested data is securely stored and made available in the IT Knowledge Graph, powering analytics, trend analysis, and downstream operations workflows.

Metadata-only sync

The SharePoint connector ingests list item metadata and selected scalar list columns. It does not move document contents, attachments, page HTML, or binary payloads from document libraries into BigPanda.

Rows from document libraries may appear as list items (for example, file name, modified date, URL), but the connector does not download file bodies.

Non-scalar column types (lookups, person/group, multi-value choice, etc.) may be omitted or normalized depending on how Microsoft Graph represents them. Use internal SharePoint column names, not display labels.

When to use this connector

SharePoint lists are the source of truth (change calendars, KB indexes, runbook trackers, approval lists, etc.).
You need scheduled, incremental sync into the IT Knowledge Graph.
Linking to SharePoint items via webUrl is sufficient; file bodies are not required in BigPanda.

When not to use this connector

Primary knowledge lives in document library files (Word, PDF, etc.).
You need wiki page HTML or attachment binaries in BigPanda.
Knowledge is in Confluence or ServiceNow KB — use those connectors instead.
SharePoint on-premises farms (non-Microsoft 365) — not supported via this Graph-based connector.

Authentication

The SharePoint connector uses OAuth 2.0 client credentials against Microsoft Entra ID (Azure AD) and calls Microsoft Graph. BigPanda refreshes credentials before sync requests, so scheduled runs continue without manual re-authorization. Auth strategy cannot be changed when editing an existing connection.

Microsoft Graph prerequisites

Before BigPanda can configure the connector, your organization must register an application in Microsoft Entra ID and grant it application-level (not delegated) read access to the SharePoint list data you want to sync. The app registration requires the following admin-consented Microsoft Graph application permission:

Sites.Read.All

Provide the client ID, the client secret, and the following connection settings to your BigPanda account team, who will complete the authorization and set up the connector.

Setting	Value
`instance_url`	`https://graph.microsoft.com` (always the Graph API base URL; the SharePoint site path is configured separately on the pipeline as `site_url`)
Auth strategy	OAuth 2.0 client credentials
Application (client) ID	From your Entra app registration
Client secret	The secret Value (not the Secret ID)
`oauth2_token_url`	`https://login.microsoftonline.com/<tenant_id>/oauth2/v2.0/token`
`scope`	`https://graph.microsoft.com/.default` (used by default when omitted)
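
The settings above correspond to a standard OAuth 2.0 client credentials token request. The following is a minimal sketch of how such a request could be assembled, not BigPanda's implementation. The function name `build_token_request` and the placeholder tenant, client ID, and secret values are hypothetical, and the request is only built here, not sent.

```python
from urllib.parse import urlencode

DEFAULT_SCOPE = "https://graph.microsoft.com/.default"


def build_token_request(tenant_id, client_id, client_secret, scope=DEFAULT_SCOPE):
    """Return (url, headers, body) for a client-credentials token request.

    Hypothetical helper for illustration; it builds the request but does not send it.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,  # the secret Value, not the Secret ID
        "scope": scope,
    })
    return url, headers, body


# Placeholder values, not real credentials.
url, headers, body = build_token_request("my-tenant-id", "my-client-id", "my-secret")
print(url)
```

Sending this body as an HTTP POST to the token URL is one way to confirm that the client ID, secret Value, and tenant ID are valid before handing them to your account team.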

Configure the SharePoint connector

Provide the following configuration to your BigPanda account team. The connector creates one output table per configured list.

Field	Required	Default	Description
`site_url`	Yes	—	SharePoint site path in Graph format: `hostname:/sites/...`. For example: `contoso.sharepoint.com:/sites/MySite`. Do not include `https://`.
`lists`	Yes	—	Map of output table name and list definition. Each list produces one output table.
`list_id`	Yes	—	Within each `lists` entry, the Microsoft Graph list GUID.
`field_names`	No	All columns	Within each `lists` entry, the specific list columns to include, using their internal SharePoint names. Omit this field to load all list columns. The active cursor field remains available for incremental tracking even if it isn't included in `field_names`.
`start_date`	Yes	—	The start of the sync window, in `YYYY-MM-DD` format. Sets the initial sync window and incremental cursor baseline. Required when creating a pipeline. Must be today or earlier.
`cron_schedule`	Yes	—	Cron expression for scheduled sync runs (for example, every 15 minutes).
`timezone`	No	UTC	Timezone for schedule interpretation.
`page_size`	No	100	Items per Graph request ($top). Maximum 999.
`rate_limit`	No	20	Maximum requests per minute.
`rate_limit_timeout_ms`	No	1000	Milliseconds to wait when the local rate limiter throttles requests.
`request_timeout`	No	60	Seconds before a Graph request times out.
`cursor_field`	No	—	Within each `lists` entry, sets the sync cursor for that specific list. Accepts only `lastModifiedDateTime` or `Modified`. Any other value fails configuration validation. Omit to inherit the pipeline-level default.

Example configuration

{
  "cron_schedule": "*/15 * * * *",
  "start_date": "2024-01-01",
  "timezone": "UTC",
  "site_url": "contoso.sharepoint.com:/sites/MySite",
  "lists": {
    "tasks": {
      "list_id": "00000000-0000-0000-0000-000000000001",
      "field_names": ["Title", "Status", "Priority"]
    }
  },
  "page_size": 100,
  "rate_limit": 20,
  "rate_limit_timeout_ms": 1000,
  "request_timeout": 60
}
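
Before sending a configuration to your account team, it can help to check it against the rules in the field table. The sketch below is a hypothetical pre-flight check written for this page, not the connector's own validation. The function name `validate_config`, the site path pattern, and the lower bound of 1 for `page_size` are assumptions.

```python
import re
from datetime import date, datetime

ALLOWED_CURSOR_FIELDS = {"lastModifiedDateTime", "Modified"}
# Assumed pattern for the documented "hostname:/sites/..." format.
SITE_PATH = re.compile(r"[A-Za-z0-9.-]+:/sites/\S+")


def validate_config(cfg, today=None):
    """Return a list of problems found in cfg; an empty list means none were found."""
    today = today or date.today()
    problems = []

    site_url = str(cfg.get("site_url", ""))
    if "://" in site_url:
        problems.append("site_url: do not include https://")
    elif not SITE_PATH.fullmatch(site_url):
        problems.append("site_url: expected hostname:/sites/...")

    try:
        start = datetime.strptime(str(cfg.get("start_date", "")), "%Y-%m-%d").date()
        if start > today:
            problems.append("start_date: must be today or earlier")
    except ValueError:
        problems.append("start_date: expected YYYY-MM-DD")

    if not cfg.get("cron_schedule"):
        problems.append("cron_schedule: required")

    page_size = cfg.get("page_size", 100)
    if not isinstance(page_size, int) or not 1 <= page_size <= 999:
        problems.append("page_size: maximum is 999")

    lists = cfg.get("lists") or {}
    if not lists:
        problems.append("lists: at least one entry is required")
    for table, spec in lists.items():
        if not spec.get("list_id"):
            problems.append(f"lists.{table}.list_id: required")
        cursor = spec.get("cursor_field")
        if cursor is not None and cursor not in ALLOWED_CURSOR_FIELDS:
            problems.append(f"lists.{table}.cursor_field: unsupported value")
    return problems


example = {
    "cron_schedule": "*/15 * * * *",
    "start_date": "2024-01-01",
    "site_url": "contoso.sharepoint.com:/sites/MySite",
    "lists": {"tasks": {"list_id": "00000000-0000-0000-0000-000000000001"}},
    "page_size": 100,
}
print(validate_config(example, today=date(2025, 1, 1)))
```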

Finding a list GUID

In SharePoint, open the list, then select Settings > List settings. The list GUID appears in the URL as List=%7B<guid>%7D. Alternatively, use Microsoft Graph to enumerate the lists for the site.
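
As a convenience, the GUID can be pulled out of a copied settings URL with a few lines of Python. This is a sketch using only the standard library; the function name and the example URL (including its page path) are illustrative assumptions.

```python
import re
from urllib.parse import parse_qs, urlparse

GUID = re.compile(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def list_guid_from_settings_url(url):
    """Return the list GUID from a List settings URL, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("List")
    if not values:
        return None
    # parse_qs decodes %7B and %7D to braces, which are stripped here.
    candidate = values[0].strip("{}")
    return candidate.lower() if GUID.fullmatch(candidate) else None


# Illustrative URL; your site, path, and GUID will differ.
settings_url = (
    "https://contoso.sharepoint.com/sites/MySite/_layouts/15/listedit.aspx"
    "?List=%7B00000000-0000-0000-0000-000000000001%7D"
)
print(list_guid_from_settings_url(settings_url))
```

The returned value is what goes into `list_id` for that list.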

Output schema

The connector creates one output table per lists entry. Each table uses id as its primary key. The sync cursor is Modified by default, or lastModifiedDateTime if configured via cursor_field for that list.

Field	Description
`id`	Unique identifier of the list item. This is the primary key for the table.
`createdDateTime`	The date and time the item was created.
`lastModifiedDateTime`	The date and time the item was last modified, as reported by Microsoft Graph.
`Modified`	The SharePoint `Modified` column. The connector uses this column as the sync cursor unless `cursor_field` is set to `lastModifiedDateTime` for that list.
`webUrl`	A link to the item in SharePoint.
Selected scalar list columns	Any scalar columns you name in `field_names` for the list. Only scalar values are included. When `field_names` is omitted, all list columns are included.
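
To make the schema concrete, the sketch below flattens one list item, shaped like a Microsoft Graph `listItem` with its `fields` expanded, into a row with the columns above. It illustrates the scalar-only rule under stated assumptions. The function name `to_row`, the sample item, and the choice to simply drop non-scalar values are hypothetical, since this page does not describe how the connector normalizes them.

```python
SCALAR_TYPES = (str, int, float, bool)
SYSTEM_KEYS = ("id", "createdDateTime", "lastModifiedDateTime", "webUrl")


def to_row(item, field_names=None):
    """Flatten a list item into one output row, keeping only scalar columns."""
    fields = item.get("fields", {})
    row = {key: item.get(key) for key in SYSTEM_KEYS}
    row["Modified"] = fields.get("Modified")  # kept even if not in field_names
    names = field_names if field_names is not None else list(fields)
    for name in names:
        value = fields.get(name)
        if value is None or isinstance(value, SCALAR_TYPES):
            row[name] = value  # lists and dicts (non-scalars) are skipped
    return row


# Sample item for illustration only.
item = {
    "id": "7",
    "createdDateTime": "2024-01-02T09:00:00Z",
    "lastModifiedDateTime": "2024-01-05T10:00:00Z",
    "webUrl": "https://contoso.sharepoint.com/sites/MySite/Lists/Tasks/7_.000",
    "fields": {
        "Title": "Patch gateway",
        "Status": "Open",
        "Owners": [{"LookupId": 12}],  # multi-value person column (non-scalar)
        "Modified": "2024-01-05T10:00:00Z",
    },
}
print(to_row(item, ["Title", "Status", "Owners"]))
```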

Sync behavior

Ongoing sync is incremental based on the SharePoint Modified column.

Initial / backfill: On the first run (or after a cursor reset), the connector loads items with Modified on or after start_date.
Subsequent runs: The connector loads items with Modified greater than or equal to the stored cursor from the previous successful run.
Scheduling: cron_schedule controls how often incremental runs execute.
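
The backfill and incremental phases can be modeled in a few lines. This is an illustrative simulation over in-memory data, not the connector's code. The helper names are hypothetical, and upserting by `id` is an assumption based on `id` being the primary key. Because the comparison is "greater than or equal to", an item modified exactly at the cursor is loaded again on the next run, and the upsert keeps that from creating a duplicate row.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO 8601 timestamp with a trailing Z into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_changed(items, cursor):
    """Return items with Modified >= cursor, plus the cursor for the next run."""
    changed = [i for i in items if parse_ts(i["Modified"]) >= cursor]
    new_cursor = max((parse_ts(i["Modified"]) for i in changed), default=cursor)
    return changed, new_cursor


def upsert(table, rows):
    """Insert or overwrite rows keyed by id (assumed primary-key behavior)."""
    for row in rows:
        table[row["id"]] = row


source = [
    {"id": "1", "Modified": "2023-12-31T23:00:00Z"},  # before start_date
    {"id": "2", "Modified": "2024-01-05T10:00:00Z"},
    {"id": "3", "Modified": "2024-02-01T08:30:00Z"},
]
table = {}

# First run: the cursor starts at start_date (2024-01-01).
cursor = datetime(2024, 1, 1, tzinfo=timezone.utc)
first, cursor = select_changed(source, cursor)
upsert(table, first)

# Second run with no new changes: only the item at the cursor is re-read.
second, cursor = select_changed(source, cursor)
upsert(table, second)
print(sorted(table), [r["id"] for r in second])
```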

Filtering uses the SharePoint Modified column

The connector filters on the SharePoint Modified field rather than the Graph lastModifiedDateTime field, because Microsoft Graph does not support filtering on lastModifiedDateTime for list items.

Request and performance controls

You can tune the following controls to manage paging and request behavior.

Control	Default	Description
`page_size`	`100`	The number of items requested from Microsoft Graph in a single call. This maps to the Graph `$top` page size.
`rate_limit`	`20`	The maximum number of requests sent per minute.
`rate_limit_timeout_ms`	`1000`	Wait time, in milliseconds, when the connector’s rate limiter throttles outbound requests.
`request_timeout`	`60`	How long the connector waits, in seconds, before timing out a request.

When Microsoft Graph returns HTTP 429, the connector honors the Retry-After header before retrying.
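
The general pattern of honoring Retry-After looks like the following sketch. It is a generic illustration, not the connector's retry logic. The `send` callable, the attempt limit, and the fallback wait are hypothetical, and only the seconds form of the header is handled.

```python
import time


def send_with_retry(send, max_attempts=5, default_wait=1.0, sleep=time.sleep):
    """Call send() until it stops returning 429 or attempts run out.

    send() must return (status, headers, body). Hypothetical helper.
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts:
            break
        try:
            wait = float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            wait = default_wait  # header missing or not a number of seconds
        sleep(wait)
    raise RuntimeError(f"still throttled after {max_attempts} attempts")


# Fake transport for demonstration: throttled once, then successful.
responses = iter([(429, {"Retry-After": "2"}, ""), (200, {}, "ok")])
waits = []
result = send_with_retry(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

Injecting `sleep` keeps the example fast to run; a real caller would leave the default `time.sleep`.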

Troubleshooting

If a sync run fails, review the items below.

Symptom	What to check
Authentication failures	Entra app client ID, client secret Value, `Sites.Read.All` application permission, and admin consent are valid. Token URL uses the correct tenant_id.
Wrong API endpoint	Connection instance_url is https://graph.microsoft.com, not the SharePoint site hostname. Site path belongs in pipeline `site_url`.
Site resolution failures	`site_url` uses Graph site path format (`hostname:/sites/...`) without `https://`. Site exists and the app can read it.
List not found / empty table	`list_id` is the correct Graph list GUID for that site. App has read access to the site and list.
Missing or wrong columns	`field_names` use internal SharePoint column names (for example Title, not a display label).
Rate limiting (429)	Lower rate_limit or increase `rate_limit_timeout_ms`. Connector already respects Graph Retry-After.
Unexpected date range	`start_date` is `YYYY-MM-DD`, not in the future, and reflects the backfill window you intend.
No file contents	This is expected. The connector syncs metadata only, not document binaries.

FAQs

Why does the connection use graph.microsoft.com instead of our SharePoint site URL?

The connection authenticates to Microsoft Graph. The SharePoint site is configured separately on the pipeline as site_url.

Can this connector ingest Word or PDF files from document libraries?

No. It syncs list item metadata only. File name and URL may appear for library rows, but file bodies are not downloaded.

Do we need Files.Read.All?

No, not for this connector. Sites.Read.All (application) is sufficient for list metadata.
