
# 5-layer architecture

The K-AI Platform is organised into five conceptual layers. Each layer is neutral towards implementation — it can be served by one product or composed across several tools. K-AI implements all five end-to-end.

```mermaid
flowchart TB
    SRC["<b>1. Sources</b><br/>Existing document estate<br/>SharePoint · Confluence · Notion · Google Drive · Box<br/>ECM legacy · NAS · custom"]
    ING["<b>2. Ingestion & Indexing</b><br/>Connectors · Parsing & OCR · LLM indexing<br/>Active Metadata · Document Lineage"]
    SEM["<b>3. Semantic Document Layer</b><br/>Neural Semantic Graph (K-AI proprietary)<br/>Unified semantic graph · contradiction detection<br/>missing-subject clusters · contextual understanding"]
    GOV["<b>4. Governance & Quality</b><br/>Document Catalog · K-AI Audit<br/>Missing subjects · Mandatory questions · Observability"]
    EXP["<b>5. Exposure & Consumption</b><br/>K-AI MCP · REST API · Document Contracts<br/>ACL preserved · lineage traced"]
    CSM["Consumers<br/>Humans · AI agents · business apps"]

    SRC --> ING --> SEM --> GOV --> EXP --> CSM
```

The layers are linear by design: each one operates on the output of the previous one, and each one has its own ownership pattern, its own SLAs, and its own failure modes. The rest of this document walks through them from top to bottom.

## Layer 1 — Sources

The Sources layer is the existing document estate: SharePoint Online and on-premises, Confluence Cloud, Notion, Google Drive, Box, ECM legacy systems (Documentum, OpenText), NAS shares (SMB, NFS), business databases exposed via API, mail archives, and custom repositories. In a typical large group, eight to twelve distinct sources coexist.

The platform's stance towards this layer is non-negotiable: **it is consumed, not replaced.** Centralising the document estate into a single repository is not a prerequisite, and in practice it rarely happens. The platform must federate, preserve native ACLs, and accept that each source has its own lifecycle, governance, and operational owner.

In K-AI's implementation, this layer is addressed through the [Sources & ingestion section](/knowledge-ai/sources-and-ingestion/sources-ingestion.md), which documents the connector matrix and the per-source ingestion contract.

## Layer 2 — Ingestion & Indexing

The Ingestion & Indexing layer is what makes the source estate addressable as a unified surface. It is a technical pipeline with four classes of component:

* **Connectors** — incremental ingestion against each source's native API, with ACL preservation at the document level. Connectors are stateful: they track watermarks per source and replay only deltas.
* **Parsing & OCR** — structured extraction across every format the source layer produces (PDF, Office, scanned images, embedded tables, multi-column layouts, email with attachments).
* **LLM-based indexing** — embedding generation, entity and concept extraction, hierarchical structuring.
* **Active Metadata + Document Lineage** — automatic enrichment (subject, mirrored ACLs, freshness, dependencies) and full provenance (origin, transformations, versions).
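
The watermark-and-delta behaviour described for connectors can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `SourceDocument`, `WatermarkConnector`, and `pull_deltas` are hypothetical names, not part of any K-AI API, and a plain in-memory listing stands in for a source's native API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceDocument:
    """Hypothetical record for one document as listed by a source."""
    doc_id: str
    modified_at: datetime
    acl: frozenset[str]  # principals allowed to read, copied from the source


@dataclass
class WatermarkConnector:
    """Hypothetical connector that replays only changes since the last run."""
    source_name: str
    watermark: datetime = datetime.min.replace(tzinfo=timezone.utc)

    def pull_deltas(self, listing: list[SourceDocument]) -> list[SourceDocument]:
        # Keep only documents modified strictly after the stored watermark.
        deltas = sorted(
            (d for d in listing if d.modified_at > self.watermark),
            key=lambda d: d.modified_at,
        )
        # Advance the watermark to the newest document in this batch.
        if deltas:
            self.watermark = deltas[-1].modified_at
        return deltas


def day(n: int) -> datetime:
    return datetime(2024, 1, n, tzinfo=timezone.utc)


listing = [
    SourceDocument("a", day(1), frozenset({"hr"})),
    SourceDocument("b", day(3), frozenset({"legal"})),
]
connector = WatermarkConnector("sharepoint")
first = connector.pull_deltas(listing)    # full initial load: a, b
second = connector.pull_deltas(listing)   # nothing changed: empty
listing.append(SourceDocument("c", day(5), frozenset({"hr"})))
third = connector.pull_deltas(listing)    # only the new document: c
```

The ACL travels with each document record, so downstream stages never need to go back to the source to know who may read it. A production connector would also have to handle deletions and documents sharing the same timestamp, which this sketch ignores.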

This layer is operated by the **Document Engineer (DocOps)** — a practice typically shared with the data platform team, not a dedicated role. Its output is a normalised, governed, addressable representation of the source estate.

In K-AI's implementation, this layer is described in [Sources & ingestion — Indexation pipeline](/knowledge-ai/sources-and-ingestion/indexation-pipeline.md) and exposed through the Instance API.

## Layer 3 — Semantic Document Layer

The Semantic Document Layer is the distinctive layer of the platform. It is the layer that separates a Document Knowledge Platform from a Data Catalog (metadata-oriented) and from a classical retrieval stack (vector-oriented).

This layer produces a unified semantic representation of the document estate — a graph of documents, concepts, subjects, actors, and dependencies — and uses that representation to detect contradictions, identify missing subjects, and ground query understanding in explicit relations rather than vector proximity alone.
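
To make the idea of detecting contradictions from explicit relations concrete, here is a deliberately simplified sketch. It is not how the Neural Semantic Graph works: it assumes that claims have already been extracted from documents as `(subject, attribute, value)` triples, and the names `Claim` and `find_contradictions` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Hypothetical extracted statement: a document asserts a value for a subject."""
    doc_id: str
    subject: str    # e.g. "expense policy"
    attribute: str  # e.g. "approval threshold"
    value: str


def find_contradictions(
    claims: list[Claim],
) -> dict[tuple[str, str], dict[str, set[str]]]:
    """Group claims by (subject, attribute) and keep keys with conflicting values."""
    by_key: dict[tuple[str, str], dict[str, set[str]]] = defaultdict(
        lambda: defaultdict(set)
    )
    for claim in claims:
        by_key[(claim.subject, claim.attribute)][claim.value].add(claim.doc_id)
    # A contradiction here means more than one distinct value for the same key.
    return {
        key: dict(values) for key, values in by_key.items() if len(values) > 1
    }


claims = [
    Claim("policy-2022.pdf", "expense policy", "approval threshold", "500 EUR"),
    Claim("wiki/expenses", "expense policy", "approval threshold", "1000 EUR"),
    Claim("wiki/expenses", "expense policy", "owner", "Finance"),
]
conflicts = find_contradictions(claims)
```

In this toy data, the two documents disagree on the approval threshold, so that key is reported together with the documents behind each value. The `owner` attribute has a single value and is not flagged.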

This is the layer that makes "clean" operationally meaningful: cleanliness becomes the absence of unresolved contradictions, the coverage of expected subjects, the freshness of the active surface — measured at the semantic level, not at the storage level.

In K-AI's implementation, this layer is served by the [Neural Semantic Graph](/knowledge-ai/the-k-ai-platform/neural-semantic-graph.md), K-AI's proprietary semantic representation.

## Layer 4 — Governance & Quality

The Governance & Quality layer is where the platform becomes a discipline rather than a tool. It exposes the semantic findings of Layer 3 to the role model (Owner, Authority, Steward, Producer — see [Roles model](/knowledge-ai/the-k-ai-platform/roles.md)) and provides the surfaces through which those roles act.

Five canonical components live here:

* **Document Catalog** — living inventory of Document Products: owners, ACLs, lineage, quality scores, dependencies. In K-AI today, the Document Catalog is materialised through the [Document Owner](/knowledge-ai/the-k-ai-platform/roles.md#document-owner) view in the K-AI Audit web app, plus the metadata exposed by the [Instance API Documents endpoints](/knowledge-ai/sources-and-ingestion/instance-api/documents.md). A dedicated Catalog UI is on the roadmap.
* **Audit & remediation** — continuous detection of conflicts, semantic duplicates, obsolescence, with recommended corrections.
* **Missing subjects** — expected subjects not yet covered by the estate, surfaced by triangulating emerging queries and current coverage.
* **Mandatory questions** — business arbitrations escalated to subject-matter experts for explicit resolution (conflicts the platform cannot resolve automatically).
* **Document Observability** — per-Document-Product quality KPIs, Steward dashboards, quality trajectories over time.
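
The missing-subjects component compares what consumers ask about with what the estate covers. A minimal sketch of that comparison follows, assuming each query has already been mapped to a subject label; the function name `missing_subjects` and the `min_queries` threshold are hypothetical and chosen only for illustration.

```python
from collections import Counter


def missing_subjects(
    query_subjects: list[str],
    covered_subjects: set[str],
    min_queries: int = 2,
) -> list[tuple[str, int]]:
    """Return uncovered subjects, most requested first, with their query counts."""
    # Count demand only for subjects the estate does not cover yet.
    demand = Counter(s for s in query_subjects if s not in covered_subjects)
    # Ignore one-off queries below the threshold to reduce noise.
    return [(s, n) for s, n in demand.most_common() if n >= min_queries]


queries = [
    "remote work", "travel policy", "remote work",
    "gdpr", "remote work", "travel policy", "parking",
]
covered = {"gdpr"}
gaps = missing_subjects(queries, covered)
```

With this input, `gaps` lists "remote work" (3 queries) and then "travel policy" (2 queries). "gdpr" is already covered, and "parking" falls below the threshold.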

This layer is where the editorial work happens. It is also where the largest behavioural change sits: the Steward's daily routine, the Producer's occasional contribution, the Owner's arbitration cadence.

In K-AI's implementation, this layer is served by [K-AI Audit](/knowledge-ai/k-ai-audit/audit.md) (Steward + Producer surface) and by the Document Catalog (Owner + Authority surface).

## Layer 5 — Exposure & Consumption

The Exposure & Consumption layer is the surface through which the cleaned, governed estate becomes usable by humans and AI agents. Four canonical modes:

* **K-AI MCP** — Model Context Protocol exposure for AI agents (Claude, ChatGPT, internal copilots). Mirrored ACLs, traced lineage, sourcing on every response.
* **REST API** — for business applications (CRM, ERP, support tools) that integrate the document estate into their own workflows.
* **Document Contracts** — formal, machine-readable engagements between Owners and Consumers: SLA, freshness, quality bands, ACL semantics, supported formats.
* **Document Discovery** — natural-language access for human Document Consumers; see [Document Discovery in the glossary](/knowledge-ai/reference/glossary.md).
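
A Document Contract is described above as formal and machine-readable. The sketch below shows one way such a contract could be represented and checked. The schema is an assumption: every field name, the `violations` method, and the example values are hypothetical and do not reflect K-AI's actual contract format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DocumentContract:
    """Hypothetical contract between an Owner and the Consumers of a Document Product."""
    product: str
    owner: str
    availability_sla: float            # fraction of time the product must be served
    max_staleness_days: int            # freshness commitment
    min_quality_score: float           # lower bound of the accepted quality band
    acl_mode: str                      # how source ACLs are applied
    supported_formats: tuple[str, ...]

    def violations(self, staleness_days: int, quality_score: float) -> list[str]:
        """Compare observed values with the contract and name each breach."""
        found = []
        if staleness_days > self.max_staleness_days:
            found.append("freshness")
        if quality_score < self.min_quality_score:
            found.append("quality")
        return found


contract = DocumentContract(
    product="hr-policies",
    owner="hr-owner@example.com",
    availability_sla=0.995,
    max_staleness_days=30,
    min_quality_score=0.8,
    acl_mode="mirror-source",
    supported_formats=("markdown", "pdf"),
)
payload = json.dumps(asdict(contract))        # machine-readable form
healthy = contract.violations(10, 0.9)        # within the contract
breached = contract.violations(45, 0.6)       # too stale and below the quality band
```

Keeping the contract as plain data means it can be serialised, versioned, and evaluated by the same checks on both the Owner and the Consumer side.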

The non-negotiable property of this layer is that ACLs are preserved and lineage is traced. A Consumer never sees a document they should not see, and every response can be traced back to the source documents that produced it.
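
The two guarantees can be sketched together: filter candidate passages by the caller's principals before anything is returned, and attach the originating documents to the response. The types `Passage` and `Answer` and the function `answer_for` are hypothetical; a real deployment would evaluate the ACLs mirrored from each source rather than a simple set intersection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """Hypothetical retrieved passage with the ACL mirrored from its source."""
    doc_id: str
    text: str
    allowed: frozenset[str]  # principals (users or groups) allowed to read


@dataclass(frozen=True)
class Answer:
    passages: tuple[str, ...]
    sources: tuple[str, ...]  # lineage: documents the passages came from


def answer_for(principals: set[str], candidates: list[Passage]) -> Answer:
    # Drop every passage the caller is not entitled to see, before ranking or generation.
    visible = [p for p in candidates if p.allowed & principals]
    # Deduplicate source documents while preserving their order.
    sources = tuple(dict.fromkeys(p.doc_id for p in visible))
    return Answer(tuple(p.text for p in visible), sources)


candidates = [
    Passage("hr/leave.md", "Annual leave is 25 days.", frozenset({"all-staff"})),
    Passage("hr/salaries.xlsx", "Salary bands for 2024.", frozenset({"hr-admins"})),
    Passage("hr/leave.md", "Leave requests need manager approval.", frozenset({"all-staff"})),
]
staff_answer = answer_for({"all-staff"}, candidates)
admin_answer = answer_for({"all-staff", "hr-admins"}, candidates)
```

Filtering happens before the response is assembled, so a restricted document can neither appear in the text nor leak through the source list.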

In K-AI's implementation, this layer is served by [K-AI MCP](/knowledge-ai/k-ai-mcp/mcp.md) for AI agents, by the Retrieval API for custom integrations, and by the [Instance API](/knowledge-ai/sources-and-ingestion/instance-api/instance-api.md) for machine-to-machine pipelines that ingest the platform's own output.

