
Document state machine

Every document ingested by K-AI moves through a small set of states from registration to availability. State is observable via the Documents endpoints of the Instance API. State transitions are durable and idempotent — retries are safe.

Three principal states — INITIAL_SAVED, ON_CONTENT_EXTRACT, INDEXED — are what a Consumer or Steward encounters. A handful of finer-grained transient states exist between extraction and graph write; they appear in the API responses but are not surfaced as user-facing milestones.

INITIAL_SAVED

The document has been registered with its source metadata: source URL, ACLs (mirrored from the upstream Document Repository), MIME type, last-modified date, and Owner / Authority pointers when known. Content has not yet been extracted.

Entering this state queues the document for indexation.

ON_CONTENT_EXTRACT

Extraction is in progress: content is parsed (per MIME type) and semantically structured, and embeddings are computed.

In this state the document is searchable on its metadata, but its content does not yet contribute to vector search or to the Neural Semantic Graph.

INDEXED

Embeddings have been written to the per-instance vector store. Semantic graph nodes (concept, subject, actor, dependency) have been written to the instance's database. The document is fully available to Consumers — both for vector search and for Neural Semantic Graph traversal.

INDEXED is the only terminal state for a healthy document. A re-indexation request moves the document back to INITIAL_SAVED and re-runs the pipeline; the result is deterministic for an unchanged source document.
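The healthy path described above can be sketched as a small transition table. This is an illustrative model only: the state names come from this page, but the `DocState` enum, the `ALLOWED` table, and the helper functions are hypothetical, and the transient states between extraction and graph write are left out.

```python
from enum import Enum


class DocState(Enum):
    INITIAL_SAVED = "INITIAL_SAVED"
    ON_CONTENT_EXTRACT = "ON_CONTENT_EXTRACT"
    INDEXED = "INDEXED"


# Transitions inferred from the prose on this page (an assumption, not an
# official specification). Transient states and the failure path are omitted.
ALLOWED = {
    DocState.INITIAL_SAVED: {DocState.ON_CONTENT_EXTRACT},
    DocState.ON_CONTENT_EXTRACT: {DocState.INDEXED},
    DocState.INDEXED: {DocState.INITIAL_SAVED},  # re-indexation request
}


def can_transition(current: DocState, target: DocState) -> bool:
    """Return True if this sketch allows moving from current to target."""
    return target in ALLOWED[current]


def is_content_searchable(state: DocState) -> bool:
    """Content feeds vector search and the graph only once INDEXED."""
    return state is DocState.INDEXED
```

A client could use a table like this to sanity-check the sequence of states it observes while polling, for example flagging a document that appears to jump from INITIAL_SAVED straight to INDEXED.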

Failures & retries

Failures during extraction transition the document to PARSING_ERROR. The most common causes are: source unreachable at fetch time (expired credentials, deleted file, permission revoked upstream), unsupported or corrupted file format, or extraction timeout on very large files.

Documents in PARSING_ERROR are visible via POST /api/documents/list-docs with a state filter. To re-queue them, call either POST /api/orchestrator/retry-documents-parsing-error, which re-queues every document currently in error, or POST /api/orchestrator/reindex-document with a specific document id. Both calls are idempotent. Re-indexing a document produces the same vectors and graph nodes as before, provided its upstream content has not changed.
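As a rough sketch of choosing between the two re-queue calls, the helper below only builds a description of the request and leaves the HTTP transport to the caller. The two paths come from this page; the `build_requeue_request` function, the empty body for the bulk retry, and the `documentId` field name are assumptions, so check the Instance API schemas before relying on them.

```python
from typing import Optional

RETRY_ALL_PATH = "/api/orchestrator/retry-documents-parsing-error"
REINDEX_ONE_PATH = "/api/orchestrator/reindex-document"


def build_requeue_request(document_id: Optional[str] = None) -> dict:
    """Describe the call that re-queues one document, or all failed ones."""
    if document_id is None:
        # No id given: re-queue every document currently in PARSING_ERROR.
        # The empty JSON body is an assumption.
        return {"method": "POST", "path": RETRY_ALL_PATH, "json": {}}
    # "documentId" is a hypothetical field name, not taken from the schema.
    return {
        "method": "POST",
        "path": REINDEX_ONE_PATH,
        "json": {"documentId": document_id},
    }
```

Because both endpoints are described as idempotent, a caller can safely resend the same request after a timeout without tracking whether the first attempt was received.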

Observing state

A Steward or Engineer (DocOps) tracking indexation progress at scale should:

  • Poll POST /api/documents/list-docs with a state filter to enumerate documents in a given state.

  • Call POST /api/documents/count-documents (optionally with a state filter) to track aggregate progress without paginating the full list.

  • Call POST /api/orchestrator/count-back-tasks to see queue pressure on the indexation pipeline.
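The aggregate calls above can be combined into a single progress snapshot. The following is a minimal sketch under stated assumptions: the `post` callable stands in for whatever authenticated HTTP client you use, and the `state` request field and `count` response field are hypothetical names, not confirmed by this page.

```python
from typing import Callable, Dict

# post(path, body) -> parsed JSON response. Inject your own HTTP client here.
Post = Callable[[str, dict], dict]

COUNT_DOCS = "/api/documents/count-documents"
COUNT_TASKS = "/api/orchestrator/count-back-tasks"


def indexation_progress(post: Post) -> Dict[str, float]:
    """Summarise indexation progress from the aggregate count endpoints."""
    # "state" and "count" are assumed field names; verify against the schema.
    total = post(COUNT_DOCS, {})["count"]
    indexed = post(COUNT_DOCS, {"state": "INDEXED"})["count"]
    failed = post(COUNT_DOCS, {"state": "PARSING_ERROR"})["count"]
    queued = post(COUNT_TASKS, {})["count"]
    # Guard against an empty instance to avoid dividing by zero.
    pct = 100.0 * indexed / total if total else 0.0
    return {
        "total": total,
        "indexed": indexed,
        "failed": failed,
        "queued_tasks": queued,
        "indexed_pct": pct,
    }
```

Passing the transport in as a function keeps the sketch independent of any particular HTTP library and makes it easy to exercise with a stub before pointing it at a real instance.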

See Instance API — Documents for full request and response schemas.
