# Generic HTTP

The Generic HTTP connector pulls documents from any HTTP endpoint that implements the K-AI generic driver contract: a `POST /documents` listing endpoint and a content fetch endpoint, authenticated via API key or OAuth 2.0 client credentials. It is the recommended path for custom or in-house Document Repositories that do not have a dedicated K-AI connector.

## Supported source versions

The connector targets the **K-AI Generic Driver contract**, which carries no API version. Any HTTP source that implements the contract is supported regardless of language, framework, or hosting.

A reference implementation of the generic driver is available from the K-AI integration team. Customers can use it as the starting point for their own driver.

## Authentication

Two authentication modes are supported, picked automatically by the connector based on which fields are populated:

* **API key** — a static key sent in the `api-key` HTTP header. Suitable for in-house services that already have an internal token convention.
* **OAuth 2.0 client credentials** — the connector calls the source's token endpoint with `grant_type=client_credentials`, optionally with a `scope`, and sends the resulting access token as `Authorization: Bearer <token>`.

The connector caches the OAuth bearer token until expiry and refreshes it transparently. Token endpoints that return malformed responses (no `access_token` field) cause the source registration to fail at credential-check time.
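
The caching behaviour described above can be illustrated with a minimal Python sketch. It is not the connector's actual code: `TokenCache`, `auth_headers`, the `skew` safety margin, and the injected `fetch_token` callable are hypothetical names, and the precedence of `passkey` over OAuth when both are populated is an assumption. The `expires_in` field is the standard OAuth 2.0 token-response lifetime in seconds.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Caches an OAuth access token and refetches it once it is (nearly) expired."""

    def __init__(self, fetch_token: Callable[[], dict],
                 clock: Callable[[], float] = time.monotonic,
                 skew: float = 30.0):
        self._fetch = fetch_token      # hypothetical: performs the client_credentials call
        self._clock = clock
        self._skew = skew              # assumption: refresh a little before real expiry
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        if self._token is None or self._clock() >= self._expires_at:
            payload = self._fetch()
            if "access_token" not in payload:
                # Mirrors the documented failure on malformed token responses.
                raise ValueError("token response has no access_token field")
            self._token = payload["access_token"]
            lifetime = float(payload.get("expires_in", 0))
            self._expires_at = self._clock() + lifetime - self._skew
        return self._token


def auth_headers(passkey: Optional[str], token_cache: Optional[TokenCache]) -> dict:
    """Builds the request headers for whichever mode is populated."""
    if passkey:  # assumption: the API key wins when both modes are configured
        return {"api-key": passkey}
    if token_cache is not None:
        return {"Authorization": f"Bearer {token_cache.get()}"}
    raise ValueError("no credentials configured")
```

Injecting the clock and the token fetcher keeps the sketch testable without a real authorization server.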

## Configuration

| Field                | Type   | Required    | Description                                                           |
| -------------------- | ------ | ----------- | --------------------------------------------------------------------- |
| `name`               | string | yes         | Display name for the source.                                          |
| `host`               | string | yes         | Base URL of the driver, e.g. `https://my-repo.acme.com`.              |
| `passkey`            | string | conditional | API key. Required when not using OAuth.                               |
| `oauth2ClientId`     | string | conditional | OAuth client ID. Required when not using `passkey`.                   |
| `oauth2ClientSecret` | string | conditional | OAuth client secret. Encrypted at rest by the platform.               |
| `oauth2TokenUrl`     | string | conditional | OAuth token endpoint, e.g. `https://auth.acme.com/oauth/token`.       |
| `oauth2Scope`        | string | no          | OAuth scope. Optional — depends on the authorization server's policy. |

Either `passkey` or the full OAuth triple (`oauth2ClientId` + `oauth2ClientSecret` + `oauth2TokenUrl`) must be provided.
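
As an illustration of this rule, the following Python sketch validates a configuration object before registration. The function name `select_auth_mode` and the returned mode labels are hypothetical, and checking `passkey` first is an assumption; the page does not state which mode wins when both are populated.

```python
OAUTH_FIELDS = ("oauth2ClientId", "oauth2ClientSecret", "oauth2TokenUrl")


def select_auth_mode(config: dict) -> str:
    """Returns "api_key" or "oauth2", or raises if the configuration is incomplete."""
    for required in ("name", "host"):
        if not config.get(required):
            raise ValueError(f"missing required field: {required}")

    if config.get("passkey"):  # assumption: passkey takes precedence
        return "api_key"

    missing = [field for field in OAUTH_FIELDS if not config.get(field)]
    if not missing:
        return "oauth2"        # oauth2Scope stays optional

    raise ValueError(
        "provide either passkey or the full OAuth triple; missing: "
        + ", ".join(missing)
    )
```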

## Driver contract

The source must expose at least two endpoints:

```
POST /documents
  body: { "offset": 0, "limit": 100, "since": "2026-05-01T00:00:00Z" }
  response: { "total": 1234, "items": [ { "id": "...", "title": "...", "mime_type": "...", "last_modified_at": "...", "acl": [...] }, ... ] }

GET /documents/{id}/content
  response: binary content with Content-Type matching mime_type
```

Optional endpoints:

```
GET /health        # called during credential check; expected 200
GET /acl/groups    # used when the driver wants to publish group definitions to K-AI
```

The `acl` array on each document item contains group or user identifiers in the customer's IdP. K-AI mirrors them verbatim onto the document record.
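
On the driver side, the listing endpoint boils down to filtering and slicing. The sketch below shows one possible handler body in Python over an in-memory list; `list_documents` and `parse_ts` are hypothetical names. It assumes that `total` counts the items matching `since`, that "modified since" is inclusive, and that a stable sort by `id` is acceptable; none of these is confirmed by the contract text above.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parses an ISO 8601 timestamp such as 2026-05-01T00:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def list_documents(store: list, body: dict) -> dict:
    """Builds the POST /documents response from the request body."""
    since = body.get("since")
    matching = [
        doc for doc in store
        if since is None or parse_ts(doc["last_modified_at"]) >= parse_ts(since)
    ]
    # A stable order keeps offset/limit pages consistent between calls.
    matching.sort(key=lambda doc: doc["id"])

    offset = int(body.get("offset", 0))
    limit = int(body.get("limit", 100))
    return {"total": len(matching), "items": matching[offset:offset + limit]}
```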

## Document types ingested

Anything the [indexation pipeline](/knowledge-ai/sources-and-ingestion/indexation-pipeline.md) can extract:

* PDF, Word, Excel, PowerPoint, OpenDocument.
* Plain text, Markdown, RTF, HTML.
* Images (OCR).
* Email archives (`.msg`, `.eml`).

The connector does not inspect content; it trusts the `mime_type` declared by the driver. Drivers that declare an incorrect MIME type will trigger extraction failures downstream.

## Sync mode

The sync mode is **determined by the driver**. The connector calls `POST /documents` with a `since` cursor; the driver is expected to return only items modified since that timestamp. When the driver does not implement `since`, the connector falls back to a full re-listing on every sync.

Deletions are either signalled via a `deleted_at` field on items (preferred) or detected only on a full re-scan. Customers with delete-sensitive use cases should implement the `deleted_at` path in their driver.
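
The two deletion paths can be contrasted with a short Python sketch. It models the index as a plain dictionary keyed by document `id`; `apply_incremental` and `apply_full_rescan` are hypothetical names, and the real connector's bookkeeping is not described on this page.

```python
def apply_incremental(index: dict, items: list) -> None:
    """Applies a `since`-filtered listing: upserts live items, drops tombstones."""
    for item in items:
        if item.get("deleted_at"):
            index.pop(item["id"], None)   # explicit deletion signal from the driver
        else:
            index[item["id"]] = item


def apply_full_rescan(index: dict, items: list) -> None:
    """Applies a complete listing: anything absent from it is treated as deleted."""
    live = {item["id"]: item for item in items if not item.get("deleted_at")}
    index.clear()
    index.update(live)
```

With the incremental path a deletion is picked up on the next sync, whereas the re-scan path only notices it when a full listing is compared against the index.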

## ACL handling

ACLs are **caller-provided** — the driver is the source of truth for who can read what. Each document item carries an `acl` array that K-AI mirrors verbatim. At retrieval time, [K-AI MCP](/knowledge-ai/k-ai-mcp/mcp.md) replays the ACL against the calling identity through the customer's IdP.

When the driver does not produce ACLs (`acl: []`), the connector applies the configured fallback group (default: `generic_source_readers`). The fallback is set at registration time and is the seam where a Steward enforces a default access policy.
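
A minimal Python sketch of the fallback rule follows. `effective_acl` and `can_read` are hypothetical helpers; treating a missing `acl` key like an empty array, and modelling the retrieval-time check as a simple set intersection, are assumptions rather than documented behaviour.

```python
def effective_acl(item: dict, fallback_group: str = "generic_source_readers") -> list:
    """Returns the driver-provided ACL verbatim, or the fallback group if it is empty."""
    acl = item.get("acl") or []   # assumption: a missing key behaves like acl: []
    return list(acl) if acl else [fallback_group]


def can_read(acl: list, principals: list) -> bool:
    """Simplified check: the caller holds at least one identifier listed in the ACL."""
    return bool(set(acl) & set(principals))
```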

## Rate limits & throttling

The connector caps concurrent requests to the driver (default: 8) and honours `Retry-After` headers on `429` and `503` responses with exponential back-off and jitter. Drivers without rate-limit headers are throttled by the connector-side cap only.
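
One way to combine `Retry-After` with exponential back-off and jitter is sketched below in Python. The function name `retry_delay`, the `base` and `cap` values, the "full jitter" strategy, and the rule that the header wins when present are all assumptions; the connector's actual parameters are not documented here. Only the delta-seconds form of `Retry-After` is handled; the HTTP-date form falls through to the computed back-off.

```python
import random
from typing import Callable, Optional


def retry_delay(attempt: int,
                retry_after: Optional[str] = None,
                base: float = 1.0,
                cap: float = 60.0,
                rng: Callable[[], float] = random.random) -> float:
    """Returns the number of seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))   # the server's hint takes priority
        except ValueError:
            pass  # not delta-seconds (e.g. an HTTP-date); use the back-off below

    ceiling = min(cap, base * (2 ** attempt))      # exponential growth, capped
    return rng() * ceiling                         # full jitter: uniform in [0, ceiling)
```

For example, `retry_delay(3, "5")` waits the 5 seconds requested by the driver, while `retry_delay(3)` picks a random delay below 8 seconds.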

Listing requests use pagination (`offset` / `limit`); the page size is negotiable per source (default: 100).
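
The `offset` / `limit` loop can be sketched in Python as a generator. `iter_documents` is a hypothetical name, and `post_documents` stands in for whatever function performs the authenticated `POST /documents` call and returns the parsed JSON body. The sketch stops on an empty page as well as on reaching `total`, which guards against drivers that report an inconsistent `total`.

```python
from typing import Callable, Iterator, Optional


def iter_documents(post_documents: Callable[[dict], dict],
                   since: Optional[str] = None,
                   page_size: int = 100) -> Iterator[dict]:
    """Yields every listed item, requesting one page at a time."""
    offset = 0
    while True:
        body = {"offset": offset, "limit": page_size}
        if since is not None:
            body["since"] = since

        page = post_documents(body)
        items = page["items"]
        yield from items

        offset += len(items)
        if not items or offset >= page["total"]:
            break
```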

## Known limitations

* **Contract compliance** is the customer's responsibility: drivers that deviate from the contract (e.g. inconsistent `total`, missing `mime_type`, body-less listings) will produce indexation errors that surface only at runtime.
* **No native filtering** beyond `since` — content-type or path filtering must be implemented driver-side.
* **No content streaming** for very large files via the standard contract; drivers should expose a separate streaming endpoint and document it for the K-AI integration team.
* **OAuth tokens** are cached in memory only; a connector restart triggers a fresh token exchange.
* **Signed URLs** in `acl` are not natively supported; drivers that need per-document URL signing must serve content from `/documents/{id}/content` directly.
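
Because contract deviations surface only at runtime, it may help to check listing responses in the driver's own test suite. The Python sketch below is one possible checker, not a K-AI tool; `validate_listing` is a hypothetical name, and treating every field shown in the contract example as mandatory is an assumption.

```python
REQUIRED_ITEM_FIELDS = ("id", "title", "mime_type", "last_modified_at", "acl")


def validate_listing(page, requested_limit: int) -> list:
    """Returns a list of human-readable contract problems (empty when none found)."""
    if not isinstance(page, dict):
        return ["listing response is not a JSON object"]

    problems = []
    total = page.get("total")
    total_ok = isinstance(total, int) and not isinstance(total, bool) and total >= 0
    if not total_ok:
        problems.append("total is missing or not a non-negative integer")

    items = page.get("items")
    if not isinstance(items, list):
        problems.append("items is missing or not an array")
        return problems

    if len(items) > requested_limit:
        problems.append("more items returned than the requested limit")
    if total_ok and len(items) > total:
        problems.append("total is smaller than the number of items on the page")

    for position, item in enumerate(items):
        for field in REQUIRED_ITEM_FIELDS:
            if not isinstance(item, dict) or field not in item:
                problems.append(f"items[{position}] is missing {field}")
    return problems
```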

## Setup walkthrough

1. **Implement the driver** on the source side. Either expose the listing + content endpoints directly from the source application, or build a thin adapter service in front of it.
2. **Pick an authentication mode**: generate an API key or register an OAuth 2.0 client with the customer's authorization server.
3. **Verify the contract**: call `POST /documents` and `GET /documents/{id}/content` with curl from a shell, using the credentials. Confirm a non-empty listing and a successful content fetch.
4. **Add the source in the K-AI admin portal**: select Generic HTTP, fill in `name`, `host`, and either `passkey` or the OAuth triple. Set the sync schedule.
5. **Trigger a test sync** and confirm the document count via the [Documents endpoint](/knowledge-ai/sources-and-ingestion/instance-api/documents.md) of the K-AI Instance. Run a query through [K-AI MCP](/knowledge-ai/k-ai-mcp/mcp.md) under a sample identity to verify the ACL pass-through.
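
The contract check in step 3 can also be scripted. The Python sketch below only builds the listing request for the API-key mode so that it can be inspected before being sent; `build_listing_request` is a hypothetical helper, and sending the body as `application/json` is an assumption based on the JSON body shown in the driver contract.

```python
import json
import urllib.request


def build_listing_request(host: str, passkey: str, limit: int = 1) -> urllib.request.Request:
    """Builds (but does not send) a POST /documents request using the api-key header."""
    body = json.dumps({"offset": 0, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=host.rstrip("/") + "/documents",
        data=body,
        method="POST",
        headers={"api-key": passkey, "Content-Type": "application/json"},
    )


# Sending it requires a reachable driver, for example:
#   with urllib.request.urlopen(build_listing_request(host, key), timeout=10) as resp:
#       page = json.load(resp)
#       assert page["items"], "listing is empty"
```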

