
# Indexation pipeline

When a document is sent to K-AI, the platform extracts its content, makes it searchable, and prepares it for the Neural Semantic Graph. From the customer's side, a document moves from `INITIAL_SAVED` to `INDEXED`; the platform manages everything in between.

```mermaid
flowchart LR
    A[Document received] --> B[Content extracted]
    B --> C[Indexed and graphed]
    C --> D[Available to consumers]
```

## What the pipeline does

* **Receives** the document via the [Orchestrator endpoints](/knowledge-ai/sources-and-ingestion/instance-api/orchestrator.md) of the Instance API.
* **Extracts** content from the major document families: PDF, Word, spreadsheets, presentations, email, images, HTML and plain text. Structural cues (headings, tables, page numbers) are preserved.
* **Indexes** the extracted content into a per-instance vector index and into the Neural Semantic Graph. Vectors and graph nodes are scoped to a single K-AI instance — see [Security & isolation](/knowledge-ai/the-k-ai-platform/security.md).
* **Stores** the original file in object storage so consumers can retrieve it on demand.
* **Records** a cost event for the [billing](/knowledge-ai/operate/monitoring.md#cost-events) system.

## What the customer sees

* A document state observable via the [Instance API Documents endpoints](/knowledge-ai/sources-and-ingestion/instance-api/documents.md). States visible to integrators: `INITIAL_SAVED`, `ON_CONTENT_EXTRACT` and `INDEXED`, plus the error state `PARSING_ERROR`, which can be retried.
* A deterministic outcome per document: re-indexing the same source produces the same result. Re-indexation is requested through the Orchestrator (`reindex-document`, `differential-indexation`, `retry-documents-parsing-error`).
* A per-instance isolation guarantee: vectors from one customer's instance are never shared with, or comparable to, those of another customer's instance.
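
The observable states listed above lend themselves to a simple polling loop. The following is a minimal sketch, not part of the K-AI API: `wait_until_indexed`, `DocumentParsingError`, and the `get_state` callable are hypothetical names, and `get_state` stands in for whatever call to the Documents endpoints returns a document's current state. Only the four state names come from this page.

```python
import time
from typing import Callable

# State names taken from this page; everything else here is illustrative.
IN_PROGRESS_STATES = {"INITIAL_SAVED", "ON_CONTENT_EXTRACT"}
SUCCESS_STATE = "INDEXED"
ERROR_STATE = "PARSING_ERROR"


class DocumentParsingError(Exception):
    """Hypothetical exception: the document reached PARSING_ERROR."""


def wait_until_indexed(
    get_state: Callable[[str], str],  # assumption: wraps a Documents endpoint call
    document_id: str,
    timeout_s: float = 300.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll a document's state until it is INDEXED, fails, or times out."""
    deadline = clock() + timeout_s
    while True:
        state = get_state(document_id)
        if state == SUCCESS_STATE:
            return state
        if state == ERROR_STATE:
            # The caller can decide whether to request a retry via the Orchestrator.
            raise DocumentParsingError(f"{document_id} is in {ERROR_STATE}")
        if state not in IN_PROGRESS_STATES:
            raise ValueError(f"unexpected state {state!r} for {document_id}")
        if clock() >= deadline:
            raise TimeoutError(f"{document_id} still in {state} after {timeout_s}s")
        sleep(poll_interval_s)
```

Injecting `get_state`, `sleep`, and `clock` keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned state sequences.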

## Where vectors land

Where vectors are stored depends on the deployment model:

* **SaaS** — a managed vector index, one per K-AI instance.
* **Snowflake Native App** — `VECTOR` columns inside the customer's own Snowflake account. Data never leaves Snowflake.
* **On-premise** — a bundled vector index running inside the customer's Kubernetes cluster. See [On-premise installation](/knowledge-ai/operate/on-premise.md).

## Next steps

* [Document state machine](/knowledge-ai/sources-and-ingestion/document-state-machine.md) — what each state means and how to observe transitions.
* [Instance API — Orchestrator](/knowledge-ai/sources-and-ingestion/instance-api/orchestrator.md) — endpoints to drive indexation.
* [Operate — Monitoring](/knowledge-ai/operate/monitoring.md) — observability for ingestion at scale.