
Indexation pipeline

When a document is sent to K-AI, the platform extracts its content, makes it searchable, and prepares it for the Neural Semantic Graph. The customer follows the document from INITIAL_SAVED to INDEXED; the processing steps in between are managed by the platform.

What the pipeline does

  • Receives the document via the Orchestrator endpoints of the Instance API.

  • Extracts content from the major document families: PDF, Word, spreadsheets, presentations, email, images, HTML and plain text. Structural cues (headings, tables, page numbers) are preserved.

  • Indexes the extracted content into a per-instance vector index and into the Neural Semantic Graph. Vectors and graph nodes are scoped to a single K-AI instance — see Security & isolation.

  • Stores the original file in object storage so consumers can retrieve it on demand.

  • Records a cost event for the billing system.
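The per-instance scoping of vectors can be pictured with a toy in-memory model. This is a hypothetical sketch, not K-AI's implementation: the class names, the plain-Python cosine similarity and the instance identifiers are illustrative assumptions. It only shows the shape of the guarantee: every read and write is routed to the index of exactly one instance, so a query never scores vectors that belong to another instance.

```python
import math
from dataclasses import dataclass, field


@dataclass
class InstanceVectorIndex:
    """Toy vector index owned by exactly one K-AI instance (hypothetical model)."""
    instance_id: str
    vectors: dict[str, list[float]] = field(default_factory=dict)

    def add(self, doc_id: str, vector: list[float]) -> None:
        self.vectors[doc_id] = vector

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        # Cosine similarity is computed only against this instance's own vectors.
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [(doc_id, cosine(query, v)) for doc_id, v in self.vectors.items()]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


class VectorStore:
    """Routes every operation to the index of a single instance."""

    def __init__(self) -> None:
        self._indexes: dict[str, InstanceVectorIndex] = {}

    def index_for(self, instance_id: str) -> InstanceVectorIndex:
        # One index per instance; nothing in this model searches across instances.
        if instance_id not in self._indexes:
            self._indexes[instance_id] = InstanceVectorIndex(instance_id)
        return self._indexes[instance_id]


# Usage: two instances hold identical vectors, yet each search sees only its own.
store = VectorStore()
store.index_for("acme").add("contract.pdf", [1.0, 0.0])
store.index_for("globex").add("memo.docx", [1.0, 0.0])
print(store.index_for("acme").search([1.0, 0.0]))  # [('contract.pdf', 1.0)]
```

In the real deployments the isolation is enforced by the vector backend listed under Where vectors land, not by application code like this.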

What the customer sees

  • A document state observable via the Instance API Documents endpoints. The states visible to integrators are INITIAL_SAVED, ON_CONTENT_EXTRACT and INDEXED, plus the error state PARSING_ERROR, which marks a document whose content could not be parsed and which can be retried.

  • A deterministic outcome per document: re-indexing the same source produces the same result. Re-indexing requests go through the Orchestrator (reindex-document, differential-indexation, retry-documents-parsing-error).

  • A per-instance isolation guarantee: vectors from one customer's instance are never shared with another customer's instance, nor compared against its vectors.
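The states listed above suggest a simple polling loop on the integrator side. The sketch below is illustrative only: it assumes the Documents endpoint returns the state as one of the four strings named on this page, and the `fetch_state` callback, the polling interval and the timeout are hypothetical choices, not documented K-AI parameters. It stops as soon as the document reaches INDEXED or PARSING_ERROR.

```python
import time
from enum import Enum
from typing import Callable


class DocumentState(str, Enum):
    """Document states visible to integrators, as listed on this page."""
    INITIAL_SAVED = "INITIAL_SAVED"
    ON_CONTENT_EXTRACT = "ON_CONTENT_EXTRACT"
    INDEXED = "INDEXED"
    PARSING_ERROR = "PARSING_ERROR"


# States after which polling can stop: success, or an error that needs a retry request.
TERMINAL_STATES = {DocumentState.INDEXED, DocumentState.PARSING_ERROR}


def wait_for_outcome(
    fetch_state: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> DocumentState:
    """Poll a document until it is INDEXED or in PARSING_ERROR.

    fetch_state is a caller-supplied (hypothetical) function that queries the
    Instance API Documents endpoint and returns the raw state string.
    An unknown state string raises ValueError.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = DocumentState(fetch_state())
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document still in {state.value} after {timeout_s}s")
        time.sleep(interval_s)


# Usage with a simulated sequence of states instead of real API calls.
states = iter(["INITIAL_SAVED", "ON_CONTENT_EXTRACT", "INDEXED"])
outcome = wait_for_outcome(lambda: next(states), interval_s=0.0)
print(outcome.value)  # INDEXED
```

PARSING_ERROR is treated as a stopping point rather than a failure of the loop, so the caller can decide whether to request retry-documents-parsing-error. If a document is already INDEXED when a re-index is requested, the first poll may return immediately; how to detect the start of a new run depends on API behaviour this page does not describe.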

Where vectors land

Depends on the deployment model:

  • SaaS: a managed vector index, one per K-AI instance.

  • Snowflake Native App: VECTOR columns inside the customer's own Snowflake account. Data never leaves Snowflake.

  • On-premise: a bundled vector index running inside the customer's Kubernetes cluster. See On-premise installation.
