# SharePoint

K-AI ingests content from Microsoft SharePoint via the Microsoft Graph API (v1.0). SharePoint exposes three distinct surfaces, each modelled as its own source kind:

| Source kind        | Surface            | What it ingests                                                     |
| ------------------ | ------------------ | ------------------------------------------------------------------- |
| `SHAREPOINT_DRIVE` | Document libraries | Files in a site's drives (optionally scoped to one drive or folder) |
| `SHAREPOINT_SITE`  | Site pages         | Published and draft SharePoint pages, rendered to Markdown          |
| `SHAREPOINT_LIST`  | Lists              | List items (rows), rendered to Markdown from selected columns       |

A single Microsoft Entra ID app registration backs all three. The surface and scope (site / drive / list) are **not** part of the credential — they are carried on the source's scope path, so one credential can drive several sources on the same tenant.

## Supported source versions

* **SharePoint Online** (Microsoft 365) — Microsoft Graph API v1.0.

On-premises SharePoint Server is reachable only when it is surfaced through hybrid Microsoft Graph. A purely on-premises SharePoint Server is not supported by this connector; use the [generic HTTP](/knowledge-ai/sources-and-ingestion/connectors/generic-http.md) connector against the underlying file share instead.

## Authentication

All three kinds authenticate with a Microsoft Entra ID (Azure AD) app registration using the OAuth 2.0 **client-credentials** grant. K-AI requests the Graph scope `https://graph.microsoft.com/.default`, which resolves to the application permissions granted to the app in the tenant. Two credential flows are supported:

* **Client secret.** Set `client_secret`. K-AI builds an Azure `ClientSecretCredential`.
* **Certificate.** Set `certificate_thumbprint` to the PEM/PFX certificate bytes and `certificate_privatekey` to the certificate passphrase (may be empty); despite their names, these fields hold the certificate bytes and the passphrase, not a thumbprint or a private key. Optionally set `client_certificate_path`. K-AI builds an Azure `CertificateCredential`.

Exactly one of the two flows must be configured. The app must be granted admin consent in the target tenant; ingestion runs entirely as the app identity (no interactive user sign-in). Recommended application permissions: `Sites.Read.All` and `Files.Read.All`.
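
For orientation, the client-secret flow boils down to a single form-encoded POST against the Microsoft identity platform token endpoint. The sketch below only builds that request; it is not K-AI's implementation (which, per the text above, goes through Azure's `ClientSecretCredential`), and the helper name `build_token_request` is hypothetical.

```python
from urllib.parse import urlencode

# Scope named in this page; it resolves to the app's granted application permissions.
GRAPH_SCOPE = "https://graph.microsoft.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> tuple[str, str]:
    """Return (url, form-encoded body) for an OAuth 2.0 client-credentials token request.

    Hypothetical helper for illustration; it performs no network I/O.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": GRAPH_SCOPE,
        }
    )
    return url, body
```

The body is sent with `Content-Type: application/x-www-form-urlencoded`; the JSON response carries an `access_token` that is then presented to Graph as a bearer token.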

## Configuration

The credential payload carries auth only. Scope lives on the source's scope path (see [Scope per kind](#scope-per-kind)).

```json
{
  "kind": "SHAREPOINT_DRIVE",
  "sharepoint_host": "contoso.sharepoint.com",
  "tenant_id": "3b401597-0e49-42e3-a848-98a20ba7adad",
  "client_id": "7916e68a-39d7-4d3e-9850-7ae532b411b2",
  "client_secret": "••••••••"
}
```

| Field                     | Type   | Required    | Description                                                      |
| ------------------------- | ------ | ----------- | ---------------------------------------------------------------- |
| `kind`                    | string | yes         | One of `SHAREPOINT_DRIVE`, `SHAREPOINT_SITE`, `SHAREPOINT_LIST`. |
| `sharepoint_host`         | string | yes         | SharePoint host, e.g. `contoso.sharepoint.com`.                  |
| `tenant_id`               | string | yes         | Directory (tenant) ID of the Microsoft 365 account.              |
| `client_id`               | string | yes         | Application (client) ID of the Entra ID app registration.        |
| `client_secret`           | string | secret flow | Client secret. Encrypted at rest by the platform.                |
| `certificate_thumbprint`  | string | cert flow   | Certificate (PEM/PFX) bytes. Encrypted at rest.                  |
| `certificate_privatekey`  | string | cert flow   | Certificate passphrase (may be empty). Encrypted at rest.        |
| `client_certificate_path` | string | no          | Optional path to the certificate file on disk.                   |

One Entra ID credential is compatible with all three kinds — register one credential, then point a source of each kind at it.
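
The field rules in the table can be checked before a payload is submitted. The following is a minimal client-side sketch, assuming the cert flow is signalled by a non-empty `certificate_thumbprint` (the passphrase may legitimately be empty); `validate_credential` is a hypothetical helper, not part of the K-AI API.

```python
REQUIRED_FIELDS = ("kind", "sharepoint_host", "tenant_id", "client_id")
KINDS = {"SHAREPOINT_DRIVE", "SHAREPOINT_SITE", "SHAREPOINT_LIST"}


def validate_credential(payload: dict) -> str:
    """Return 'secret' or 'certificate'; raise ValueError for an invalid payload."""
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    if payload["kind"] not in KINDS:
        raise ValueError(f"unknown kind: {payload['kind']}")

    has_secret = bool(payload.get("client_secret"))
    # Assumption: the certificate bytes alone mark the cert flow,
    # because the passphrase field is allowed to be empty.
    has_cert = bool(payload.get("certificate_thumbprint"))
    if has_secret == has_cert:
        raise ValueError("configure exactly one of client secret or certificate")
    return "secret" if has_secret else "certificate"
```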

### Scope per kind

Each source sets a scope path that selects what to ingest:

| Kind               | Scope path                                          | Example                                                         |
| ------------------ | --------------------------------------------------- | --------------------------------------------------------------- |
| `SHAREPOINT_DRIVE` | `<site_id>[/drives/<drive_id>[/items/<folder_id>]]` | `contoso.sharepoint.com,<guid>,<guid>/drives/b!abc/items/01XYZ` |
| `SHAREPOINT_SITE`  | `site:<site_id>`                                    | `site:contoso.sharepoint.com,<guid>,<guid>`                     |
| `SHAREPOINT_LIST`  | `site:<site_id>/list:<list_id>`                     | `site:<site_id>/list:9f3c…`                                     |

For `SHAREPOINT_DRIVE`, omitting the `/drives/...` segment ingests every drive in the site; adding `/items/<folder_id>` pins ingestion to one subtree. `SHAREPOINT_LIST` additionally carries a list config (selected columns, title column, and an optional item filter) used to render each item to Markdown.
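
The three scope-path shapes in the table can be assembled with small string helpers. This is a sketch of the formats as documented above; the function names are hypothetical.

```python
from typing import Optional


def drive_scope_path(site_id: str, drive_id: Optional[str] = None, folder_id: Optional[str] = None) -> str:
    """Build <site_id>[/drives/<drive_id>[/items/<folder_id>]]."""
    if folder_id and not drive_id:
        # A folder is only addressable inside a specific drive.
        raise ValueError("folder_id requires drive_id")
    path = site_id
    if drive_id:
        path += f"/drives/{drive_id}"
    if folder_id:
        path += f"/items/{folder_id}"
    return path


def site_scope_path(site_id: str) -> str:
    """Build site:<site_id>."""
    return f"site:{site_id}"


def list_scope_path(site_id: str, list_id: str) -> str:
    """Build site:<site_id>/list:<list_id>."""
    return f"site:{site_id}/list:{list_id}"
```

Calling `drive_scope_path(site_id)` with no drive yields the whole-site scope, which per the paragraph above ingests every drive in the site.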

Operators who have only a site name must resolve it to a site ID before populating the scope path.
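
One way to resolve a site ID is Microsoft Graph's path-based site lookup, `GET /sites/{hostname}:/{server-relative-path}`, whose response `id` has the `hostname,guid,guid` shape used in the examples above. The sketch below only builds the lookup URL; `site_lookup_url` is a hypothetical helper, and the request itself still needs a bearer token from the app registration.

```python
GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def site_lookup_url(host: str, server_relative_path: str = "") -> str:
    """Return the Graph URL that resolves a site by hostname and path.

    An empty path addresses the root site of the host.
    """
    path = server_relative_path.strip("/")
    if not path:
        return f"{GRAPH_BASE}/sites/{host}"
    return f"{GRAPH_BASE}/sites/{host}:/{path}"
```

For a site at `https://contoso.sharepoint.com/sites/Marketing`, the server-relative path would be `/sites/Marketing`; the `id` field of the JSON response is the value to place in the scope path.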

## Document types ingested

**`SHAREPOINT_DRIVE`** — files only. Folders are skipped, and CSV files are excluded. File types are filtered to those the [indexation pipeline](/knowledge-ai/sources-and-ingestion/indexation-pipeline.md) supports (PDF, Word, Excel, PowerPoint, text, images, etc.); each file is downloaded as raw bytes and routed to the matching extractor.
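
The folder and CSV rules can be pictured as a predicate over Graph `driveItem` objects, which carry a `folder` facet for folders and a `file` facet for files. This is an illustrative sketch only: matching CSV by file extension is an assumption (the connector may inspect the MIME type instead), and the pipeline's own file-type filter is not modelled.

```python
def is_candidate_file(item: dict) -> bool:
    """Return True for drive items that pass the folder and CSV rules."""
    # Folders carry a `folder` facet and no `file` facet; they are skipped.
    if "folder" in item or "file" not in item:
        return False
    # Assumption: CSV files are recognised by extension.
    return not item.get("name", "").lower().endswith(".csv")
```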

**`SHAREPOINT_SITE`** — site pages in all publishing states. Each page is fetched (`?expand=canvasLayout`) and rendered to Markdown (`text/markdown`).

**`SHAREPOINT_LIST`** — list items. Each item is rendered to Markdown (`text/markdown`) from the configured columns. Supported column types include text, note, number, currency, date/time, boolean, choice, person/group, lookup, hyperlink, and calculated.
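
To make the list rendering concrete, here is a rough sketch of turning one item's field values into Markdown from a title column and a set of selected columns. The output layout and the function `render_list_item` are assumptions for illustration; K-AI's actual Markdown shape is not specified on this page.

```python
def render_list_item(fields: dict, columns: list[str], title_column: str) -> str:
    """Render one list item as Markdown: a heading plus one bullet per column."""
    lines = [f"# {fields.get(title_column, '')}".rstrip(), ""]
    for column in columns:
        # The title is already the heading; empty values are omitted.
        if column == title_column or fields.get(column) is None:
            continue
        lines.append(f"- **{column}:** {fields[column]}")
    return "\n".join(lines)
```

For example, `render_list_item({"Title": "Laptop", "Price": 1200}, ["Title", "Price"], "Title")` produces a `# Laptop` heading followed by a `- **Price:** 1200` bullet.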

## Sync mode

| Kind               | Mechanism                                                                                                           | Deletions                           |
| ------------------ | ------------------------------------------------------------------------------------------------------------------- | ----------------------------------- |
| `SHAREPOINT_DRIVE` | Graph **delta** (`/drives/{drive_id}/root/delta`); delta link persisted between runs; folder-scoped delta supported | Native tombstones (`deleted` facet) |
| `SHAREPOINT_SITE`  | Full listing of `/sites/{site_id}/pages` every run (Graph exposes no delta for site pages)                          | Orphan scan on full crawls          |
| `SHAREPOINT_LIST`  | Incremental via `lastModifiedDateTime` cursor (ordered `desc`, client-side early-exit)                              | Orphan scan on full crawls          |

`SHAREPOINT_DRIVE` is the only kind with native deletion signals; the other two rely on the platform's orphan scan to reconcile removals.
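
The drive delta mechanism follows the standard Graph pattern: page through `@odata.nextLink` until a page returns `@odata.deltaLink`, treat items carrying a `deleted` facet as tombstones, and persist the delta link for the next run. The sketch below assumes a caller-supplied `fetch` function that returns the decoded JSON for a URL; `run_delta` is a hypothetical name, not the connector's code.

```python
from typing import Callable, Optional


def run_delta(fetch: Callable[[str], dict], start_url: str) -> tuple[list, list, Optional[str]]:
    """Walk a Graph delta feed and return (changed_items, deleted_ids, delta_link).

    `start_url` is either the initial delta URL or the delta link
    persisted by the previous run.
    """
    changed: list = []
    deleted: list = []
    delta_link: Optional[str] = None
    url: Optional[str] = start_url
    while url:
        page = fetch(url)
        for item in page.get("value", []):
            if "deleted" in item:
                deleted.append(item["id"])  # native tombstone
            else:
                changed.append(item)
        delta_link = page.get("@odata.deltaLink", delta_link)
        url = page.get("@odata.nextLink")  # absent on the last page
    return changed, deleted, delta_link
```

Passing the returned `delta_link` as `start_url` on the next sync yields only the changes since this run.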

## ACL handling

The connector does **not** emit native SharePoint / Microsoft 365 ACLs for any kind. Access in K-AI is **mapped via group rules**, not mirrored from SharePoint's permission model. Set the source's ACL strategy at registration accordingly.

`SHAREPOINT_DRIVE` documents do carry a `shared_scope` metadata value (the item's share status), but this is metadata only — it is not enforced as an ACL.

Metadata emitted per kind:

* **Drive:** `drive_id`, `site_id`, `item_id`, `web_url`, `web_dav_url`, `parent_path`, `e_tag`, `c_tag`, `created_at`, `created_by_display_name`, `last_modified_at`, `last_modified_by_display_name`, `shared_scope`.
* **Site:** `site_id`, `page_id`, `web_url`, `last_modified_at`, `last_modified_by_display_name`, `publishing_state`.
* **List:** `site_id`, `list_id`, `web_url`.

## Rate limits & throttling

Microsoft Graph imposes a per-app, per-tenant request quota that varies by endpoint. The connector honours the `Retry-After` header on `429` and `503` responses and applies exponential back-off. Concurrent fetches per sync are bounded by the K-AI rate-limit policy.
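
A retry policy of this kind can be sketched as a pure delay function: prefer the server's `Retry-After` value when it is present and numeric, otherwise fall back to capped exponential back-off. The base delay, the cap, and the function name are assumptions; the connector's actual parameters are not documented here.

```python
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Return the seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Retry-After given as a number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not numeric (e.g. an HTTP date); use back-off instead
    # Exponential back-off: base, 2*base, 4*base, ... up to the cap.
    return min(cap, base * (2 ** attempt))
```

A production policy would typically add random jitter to the back-off branch so that concurrent workers do not retry in lockstep.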

## Known limitations

* **SharePoint Online only.** On-prem SharePoint Server is supported only via hybrid Graph; otherwise use [generic HTTP](/knowledge-ai/sources-and-ingestion/connectors/generic-http.md).
* **No native ACLs.** SharePoint permissions are not mirrored; use mapped group rules.
* **Site pages have no delta.** `SHAREPOINT_SITE` re-lists every run; removed pages are reconciled by the orphan scan.
* **List delta is timestamp-based.** `SHAREPOINT_LIST` uses a `lastModifiedDateTime` cursor, not a Graph delta link; deletions are not signalled natively.
* **CSV files** are excluded from `SHAREPOINT_DRIVE` ingestion.
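
The timestamp cursor used by `SHAREPOINT_LIST` can be pictured as follows: items arrive newest first, and the client stops reading as soon as it reaches an item at or before the stored cursor. This is a sketch under the assumption that each item exposes an ISO 8601 `lastModifiedDateTime`; `items_since` is a hypothetical helper.

```python
from datetime import datetime
from typing import Iterable, Iterator


def _parse(ts: str) -> datetime:
    # Graph timestamps end in "Z"; normalise for datetime.fromisoformat.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def items_since(items_desc: Iterable[dict], cursor: str) -> Iterator[dict]:
    """Yield items modified after `cursor` from a newest-first iterable."""
    cutoff = _parse(cursor)
    for item in items_desc:
        if _parse(item["lastModifiedDateTime"]) <= cutoff:
            break  # everything after this point is older: early exit
        yield item
```

Because a deleted item simply stops appearing in the listing, this approach cannot observe removals, which is why they are left to the orphan scan.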

## Setup walkthrough

1. **Register the K-AI app** in the Microsoft Entra admin centre → App registrations → New registration. Note the **Application (client) ID** and **Directory (tenant) ID**.
2. **Add Graph application permissions** (`Sites.Read.All`, `Files.Read.All`) and click **Grant admin consent**.
3. **Add a credential** — either a client secret (Certificates & secrets → New client secret, copy the value once) or a certificate (upload the public cert to the app; supply the cert bytes and passphrase to K-AI).
4. **Register a source per surface** with the JSON payload above, setting `kind` to `SHAREPOINT_DRIVE`, `SHAREPOINT_SITE`, or `SHAREPOINT_LIST`, and the matching scope path (see [Scope per kind](#scope-per-kind)).
5. **Trigger a test sync** and confirm the document count via the [Documents endpoint](/knowledge-ai/sources-and-ingestion/instance-api/documents.md) of the K-AI Instance.

