5-layer architecture
The K-AI Platform is organised into five conceptual layers. Each layer is neutral towards implementation — it can be served by one product or composed across several tools. K-AI implements all five end-to-end.
The layers are linear by design: each one operates on the output of the previous one, and each has its own ownership pattern, SLAs, and failure modes. The rest of this document walks through them in order, from Layer 1 to Layer 5.
Layer 1 — Sources
The Sources layer is the existing document estate: SharePoint (Online and on-premises), Confluence Cloud, Notion, Google Drive, Box, legacy ECM systems (Documentum, OpenText), NAS shares (SMB, NFS), business databases exposed via API, mail archives, and custom repositories. In a typical large group, eight to twelve distinct sources coexist.
The platform's stance towards this layer is non-negotiable: it is consumed, not replaced. Centralising the document estate into a single repository is not a prerequisite — and in practice it never happens. The platform must federate, preserve native ACLs, and accept that each source has its own lifecycle, governance, and operational owner.
In K-AI's implementation, this layer is addressed through the Sources & ingestion section, which documents the connector matrix and the per-source ingestion contract.
Layer 2 — Ingestion & Indexing
The Ingestion & Indexing layer is what makes the source estate addressable as a unified surface. It is a technical pipeline with four classes of component:
Connectors — incremental ingestion against each source's native API, with ACL preservation at the document level. Connectors are stateful: they track watermarks per source and replay only deltas.
Parsing & OCR — structured extraction across every format the source layer produces (PDF, Office, scanned images, embedded tables, multi-column layouts, email with attachments).
LLM-based indexing — embedding generation, entity and concept extraction, hierarchical structuring.
Active Metadata + Document Lineage — automatic enrichment (subject, mirrored ACLs, freshness, dependencies) and full provenance (origin, transformations, versions).
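The stateful connector behaviour listed above (a per-source watermark, delta-only replay, document-level ACLs) can be illustrated with a minimal sketch. All names here (`SourceDocument`, `Connector`, `pull_deltas`) are hypothetical and are not part of the K-AI connector API. The sketch also assumes each source exposes a reliable modification timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceDocument:
    doc_id: str
    modified_at: datetime
    allowed_principals: frozenset  # ACL mirrored from the source, per document


@dataclass
class Connector:
    source_name: str
    # Watermark: modification time of the newest document already ingested.
    watermark: datetime = datetime.min.replace(tzinfo=timezone.utc)

    def pull_deltas(self, listing):
        """Return only documents modified after the watermark, then advance it."""
        deltas = [d for d in listing if d.modified_at > self.watermark]
        if deltas:
            self.watermark = max(d.modified_at for d in deltas)
        return deltas


t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
listing = [
    SourceDocument("policy.pdf", t1, frozenset({"hr-team"})),
    SourceDocument("faq.docx", t2, frozenset({"all-staff"})),
]

connector = Connector("sharepoint")
first_run = connector.pull_deltas(listing)   # both documents are new
second_run = connector.pull_deltas(listing)  # nothing changed since the watermark
```

The strict `>` comparison is a simplification: a production connector would more likely rely on the source's own change token or delta cursor, since two changes can share a timestamp.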
This layer is operated by the Document Engineer (DocOps). In practice this is typically a responsibility shared with the data platform team rather than a dedicated role. Its output is a normalised, governed, addressable representation of the source estate.
In K-AI's implementation, this layer is described in Sources & ingestion — Indexation pipeline and exposed through the Instance API.
Layer 3 — Semantic Document Layer
The Semantic Document Layer is the distinctive layer of the platform. It is the layer that separates a Document Knowledge Platform from a Data Catalog (metadata-oriented) and from a classical retrieval stack (vector-oriented).
This layer produces a unified semantic representation of the document estate — a graph of documents, concepts, subjects, actors, and dependencies — and uses that representation to detect contradictions, identify missing subjects, and ground query understanding in explicit relations rather than vector proximity alone.
This is the layer that makes "clean" operationally meaningful: cleanliness becomes the absence of unresolved contradictions, the coverage of expected subjects, the freshness of the active surface — measured at the semantic level, not at the storage level.
In K-AI's implementation, this layer is served by the Neural Semantic Graph, K-AI's proprietary semantic representation.
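To make "detecting contradictions from explicit relations" concrete, here is a deliberately small sketch. It is not how the Neural Semantic Graph works, which this document does not describe. It assumes a hypothetical upstream step has already extracted each document's statements as `(subject, attribute, value)` claims. It then flags any subject and attribute pair for which documents assert different values.

```python
from collections import defaultdict
from typing import NamedTuple


class Claim(NamedTuple):
    doc_id: str
    subject: str
    attribute: str
    value: str


def find_contradictions(claims):
    """Group claims by (subject, attribute); keep keys with more than one distinct value."""
    by_key = defaultdict(lambda: defaultdict(set))
    for c in claims:
        by_key[(c.subject, c.attribute)][c.value].add(c.doc_id)
    return {
        key: {value: sorted(docs) for value, docs in values.items()}
        for key, values in by_key.items()
        if len(values) > 1
    }


claims = [
    Claim("hr-policy-2022", "expense approval", "threshold", "500 EUR"),
    Claim("finance-memo-2024", "expense approval", "threshold", "1000 EUR"),
    Claim("hr-policy-2022", "remote work", "max days per week", "2"),
]
conflicts = find_contradictions(claims)
```

Each entry in `conflicts` names the disputed subject and lists which documents support which value. That is the kind of finding a governance layer could hand to a human for arbitration.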
Layer 4 — Governance & Quality
The Governance & Quality layer is where the platform becomes a discipline rather than a tool. It exposes the semantic findings of Layer 3 to the role model (Owner, Authority, Steward, Producer — see Roles model) and provides the surfaces through which those roles act.
Five canonical components live here:
Document Catalog — living inventory of Document Products: owners, ACLs, lineage, quality scores, dependencies. In K-AI today, the Document Catalog is materialised through the Document Owner view in the K-AI Audit web app, plus the metadata exposed by the Instance API Documents endpoints. A dedicated Catalog UI is on the roadmap.
Audit & remediation — continuous detection of conflicts, semantic duplicates, obsolescence, with recommended corrections.
Missing subjects — expected subjects not yet covered by the estate, surfaced by triangulating emerging queries and current coverage.
Mandatory questions — business arbitrations escalated to subject-matter experts for explicit resolution (conflicts the platform cannot resolve automatically).
Document Observability — per-Document-Product quality KPIs, Steward dashboards, quality trajectories over time.
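As an illustration of the "Missing subjects" component, the sketch below compares the subjects users ask about with the subjects the estate already covers. The function name, the `min_queries` threshold, and the assumption that queries have already been mapped to subject labels are all hypothetical. The platform's actual detection logic is not specified here.

```python
from collections import Counter


def missing_subjects(query_subjects, covered_subjects, min_queries=2):
    """Return uncovered subjects asked about at least `min_queries` times,
    most frequently requested first (ties broken alphabetically)."""
    counts = Counter(query_subjects)
    covered = set(covered_subjects)
    candidates = [s for s, n in counts.items() if n >= min_queries and s not in covered]
    return sorted(candidates, key=lambda s: (-counts[s], s))


queries = ["travel policy", "remote work", "remote work", "gdpr", "gdpr", "gdpr", "parking"]
covered = {"travel policy", "gdpr"}
gaps = missing_subjects(queries, covered)
```

The threshold keeps one-off queries (such as "parking" above) from being reported as gaps. Where to set it is a judgment call for whoever operates the layer.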
This layer is where the editorial work happens. It is also where the largest behavioural change sits: the Steward's daily routine, the Producer's occasional contribution, the Owner's arbitration cadence.
In K-AI's implementation, this layer is served by K-AI Audit (Steward + Producer surface) and by the Document Catalog (Owner + Authority surface).
Layer 5 — Exposure & Consumption
The Exposure & Consumption layer is the surface through which the cleaned, governed estate becomes usable by humans and AI agents. Four canonical modes:
K-AI MCP — Model Context Protocol exposure for AI agents (Claude, ChatGPT, internal copilots). Mirrored ACLs, traced lineage, sourcing on every response.
REST API — for business applications (CRM, ERP, support tools) that integrate the document estate into their own workflows.
Document Contracts — formal, machine-readable engagements between Owners and Consumers: SLA, freshness, quality bands, ACL semantics, supported formats.
Document Discovery — natural-language access for human Document Consumers; see Document Discovery in the glossary.
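A Document Contract is described above as a formal, machine-readable engagement. One way to picture that is a small typed record plus a compliance check. The field names, the single-threshold quality band, and the `violations` method are assumptions made for illustration, not the K-AI contract schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DocumentContract:
    product: str
    max_age: timedelta         # freshness commitment
    min_quality_score: float   # lower bound of the agreed quality band
    supported_formats: tuple

    def violations(self, last_refreshed, quality_score, fmt, now):
        """Return the list of contract terms that the given state breaks."""
        problems = []
        if now - last_refreshed > self.max_age:
            problems.append("freshness")
        if quality_score < self.min_quality_score:
            problems.append("quality")
        if fmt not in self.supported_formats:
            problems.append("format")
        return problems


contract = DocumentContract("HR policies", timedelta(days=30), 0.8, ("pdf", "docx"))
now = datetime(2024, 6, 1, tzinfo=timezone.utc)

stale = contract.violations(datetime(2024, 4, 1, tzinfo=timezone.utc), 0.9, "pdf", now)
healthy = contract.violations(datetime(2024, 5, 20, tzinfo=timezone.utc), 0.9, "docx", now)
```

Expressing the contract as data lets a Consumer evaluate it automatically. Otherwise the terms would have to be read from a prose agreement.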
The non-negotiable properties of this layer are that ACLs are preserved and lineage is traced. A Consumer never sees a document they should not see, and every response can be traced back to the source documents that produced it.
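The two guarantees can be sketched in a few lines: filter candidate passages by the caller's identity before anything is returned, and attach the contributing document identifiers to the response. `Passage` and `answer_with_sources` are hypothetical names. A real deployment would resolve group membership and mirror the source system's ACL semantics, so it would not use a flat set of principals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str
    text: str
    allowed_principals: frozenset  # ACL mirrored from the source document


def answer_with_sources(passages, principal):
    """Keep only passages the principal may read, and report where they came from."""
    visible = [p for p in passages if principal in p.allowed_principals]
    return {
        "context": [p.text for p in visible],
        "sources": sorted({p.doc_id for p in visible}),
    }


passages = [
    Passage("salary-grid.xlsx", "Grade 7 starts at ...", frozenset({"hr-team"})),
    Passage("handbook.pdf", "Annual leave is 25 days.", frozenset({"hr-team", "all-staff"})),
]
response = answer_with_sources(passages, "all-staff")
```

Filtering before any text is returned means a restricted passage never reaches the Consumer, and the `sources` field gives every response a traceable origin.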
In K-AI's implementation, this layer is served by K-AI MCP for AI agents, by the Retrieval API for custom integrations, and by the Instance API for machine-to-machine pipelines that ingest the platform's own output.