
# Document state machine

Every document ingested by K-AI moves through a small set of states from registration to availability. State is observable via the Documents endpoints of the Instance API. State transitions are durable and idempotent, so retries are safe.

```mermaid
stateDiagram-v2
    [*] --> INITIAL_SAVED
    INITIAL_SAVED --> ON_CONTENT_EXTRACT
    ON_CONTENT_EXTRACT --> INDEXED
    INDEXED --> [*]
    INDEXED --> INITIAL_SAVED
    INITIAL_SAVED --> PARSING_ERROR
    ON_CONTENT_EXTRACT --> PARSING_ERROR
    PARSING_ERROR --> INITIAL_SAVED
```

Three principal states (`INITIAL_SAVED`, `ON_CONTENT_EXTRACT`, `INDEXED`) are what a Consumer or Steward encounters. A handful of finer-grained transient states exist between extraction and graph write; they appear in API responses but are not surfaced as user-facing milestones.
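
The transitions in the diagram, together with the re-indexation path described in the `INDEXED` section, can be captured as a small lookup table. The sketch below is illustrative only: `DocState`, `TRANSITIONS`, and `can_transition` are hypothetical names rather than part of the K-AI API, and the finer-grained transient states are left out.

```python
from enum import Enum


class DocState(Enum):
    """States named on this page (transient sub-states are omitted)."""
    INITIAL_SAVED = "INITIAL_SAVED"
    ON_CONTENT_EXTRACT = "ON_CONTENT_EXTRACT"
    INDEXED = "INDEXED"
    PARSING_ERROR = "PARSING_ERROR"


# Allowed moves, keyed by source state.
TRANSITIONS = {
    DocState.INITIAL_SAVED: {DocState.ON_CONTENT_EXTRACT, DocState.PARSING_ERROR},
    DocState.ON_CONTENT_EXTRACT: {DocState.INDEXED, DocState.PARSING_ERROR},
    # A retry re-queues a failed document from the start of the pipeline.
    DocState.PARSING_ERROR: {DocState.INITIAL_SAVED},
    # A re-indexation request sends an indexed document back to the start.
    DocState.INDEXED: {DocState.INITIAL_SAVED},
}


def can_transition(src: DocState, dst: DocState) -> bool:
    """Return True if the table allows moving from src to dst."""
    return dst in TRANSITIONS[src]
```

A client could use such a table to sanity-check the sequence of states it observes while polling, for example by flagging a jump from `INITIAL_SAVED` straight to `INDEXED` as unexpected.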

## INITIAL\_SAVED

The document has been registered with its source metadata: source URL, ACLs (mirrored from the upstream Document Repository), MIME type, last-modified date, and Owner / Authority pointers when known. Content has not yet been extracted.

Entering this state queues the document for indexation.

## ON\_CONTENT\_EXTRACT

Extraction is in progress: content is parsed (per MIME type) and semantically structured, and embeddings are computed.

In this state the document is searchable on its metadata, but its content does not yet contribute to vector search or to the [Neural Semantic Graph](/knowledge-ai/the-k-ai-platform/neural-semantic-graph.md).

## INDEXED

Embeddings have been written to the per-instance vector store. Semantic graph nodes (concept, subject, actor, dependency) have been written to the instance's database. The document is fully available to Consumers — both for vector search and for [Neural Semantic Graph](/knowledge-ai/the-k-ai-platform/neural-semantic-graph.md) traversal.

`INDEXED` is the only terminal state for a healthy document. A re-indexation request moves the document back to `INITIAL_SAVED` and re-runs the pipeline; the result is deterministic for an unchanged source document.

## Failures & retries

Failures during extraction transition the document to `PARSING_ERROR`. The most common causes are a source that is unreachable at fetch time (expired credentials, deleted file, permission revoked upstream), an unsupported or corrupted file format, and an extraction timeout on very large files.

Documents in `PARSING_ERROR` are visible via `POST /api/documents/list-docs` with a `state` filter. To re-queue, call `POST /api/orchestrator/retry-documents-parsing-error` (re-queues every document currently in error) or `POST /api/orchestrator/reindex-document` with a specific document id. Both calls are idempotent. Re-indexing the same document produces the same vectors and graph nodes unless its content has changed upstream.
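
As a rough illustration of the two re-queue calls, the sketch below builds the requests with Python's standard library without sending them. Only the endpoint paths come from this page. The host in `BASE_URL`, the `id` field name, the empty body for the bulk retry, and the absence of authentication headers are all assumptions; the real request schemas are in the Instance API reference.

```python
import json
import urllib.request

# Hypothetical host; replace with the instance's real base URL.
BASE_URL = "https://instance.example.com"


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the given path."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def retry_all_errors() -> urllib.request.Request:
    """Request that re-queues every document currently in PARSING_ERROR."""
    # Assumption: the endpoint accepts an empty JSON object as its body.
    return build_post("/api/orchestrator/retry-documents-parsing-error", {})


def reindex_document(document_id: str) -> urllib.request.Request:
    """Request that re-indexes a single document."""
    # Assumption: the document id is passed in a field named "id".
    return build_post("/api/orchestrator/reindex-document", {"id": document_id})
```

Passing either request to `urllib.request.urlopen` would send it. Because both calls are idempotent, a caller can repeat them after a network failure without first checking whether the earlier attempt went through.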

## Observing state

A Steward or Engineer (DocOps) tracking indexation progress at scale should:

* Poll `POST /api/documents/list-docs` with a `state` filter to enumerate documents in a given state.
* Call `POST /api/documents/count-documents` (optionally with a `state` filter) to track aggregate progress without paginating the full list.
* Call `POST /api/orchestrator/count-back-tasks` to see queue pressure on the indexation pipeline.
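
The counting approach above might be wrapped in a small progress helper along these lines. This is a sketch under stated assumptions: the `post` callable stands in for whatever HTTP client is in use, and both the `{"state": ...}` request body and the `{"count": n}` response shape are guesses rather than documented schemas. The names `count_documents`, `indexing_progress`, and `fake_post` are hypothetical.

```python
from typing import Callable, Dict, Optional

# A transport function: takes (path, JSON payload) and returns the decoded JSON response.
Post = Callable[[str, dict], dict]

COUNT_PATH = "/api/documents/count-documents"


def count_documents(post: Post, state: Optional[str] = None) -> int:
    """Count documents, optionally restricted to one state."""
    # Assumption: the filter is sent as {"state": ...} and the reply holds {"count": n}.
    payload = {"state": state} if state is not None else {}
    return int(post(COUNT_PATH, payload)["count"])


def indexing_progress(post: Post) -> Dict[str, float]:
    """Summarise progress as indexed / total, plus the number of failed documents."""
    total = count_documents(post)
    indexed = count_documents(post, "INDEXED")
    errors = count_documents(post, "PARSING_ERROR")
    fraction = indexed / total if total else 0.0
    return {"total": total, "indexed": indexed, "errors": errors, "fraction": fraction}


def fake_post(path: str, payload: dict) -> dict:
    """Stub transport with canned counts, used here instead of a real HTTP call."""
    canned = {None: 100, "INDEXED": 90, "PARSING_ERROR": 2}
    return {"count": canned[payload.get("state")]}


progress = indexing_progress(fake_post)
```

Using the unfiltered count as the denominator means documents in the transient states still count toward the total, so the fraction only reaches 1.0 once every document is `INDEXED`.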

See [Instance API — Documents](/knowledge-ai/sources-and-ingestion/instance-api/documents.md) for full request and response schemas.

