What is a DKP?
A Document Knowledge Platform (DKP) is the category K-AI defines and operates within. This page states the problem the category exists to solve, gives the formal definition, names the three capabilities, and draws the boundary against adjacent categories (AI Workplace Assistants, ECM/DMS, Data Catalogs, Unstructured ETL).
The problem
In most large organisations, the business document estate has accumulated without active governance. Procedures contradict each other, multiple versions of the same document coexist with no clear hierarchy, service notes implicitly invalidate entire chapters of reference materials, and obsolete information lingers because no one knows whether it is still in use somewhere. This is invisible document debt: it stays invisible as long as document usage remains artisanal — an experienced employee eventually learns which file is reliable and which to ignore — and becomes critical the moment a team tries to industrialise it through Knowledge Management, a chatbot, an AI assistant, or an autonomous agent. At that point even the most capable platform returns the chaos it is given. Garbage in, garbage out.
Agentic AI removes the human safety net. A human reader tolerates noise — they know which 2019 procedure has been tacitly replaced by a 2024 service note, and they ignore the former. An AI agent does not. It does not know how to recognise that a procedure is obsolete, and it answers confidently from whatever it is given. If the base is dirty, the answer is wrong — and nobody notices. The threshold of acceptable document quality moves with the consumer: humans tolerate noise, agents do not. As agents take on autonomous decisions — recommending an HSE protocol, validating a banking operation, routing a patient — the tolerance for factual error collapses, and base quality becomes a service prerequisite rather than a best practice.
Three 2026 market figures frame the gap:
80–90% of enterprise data is unstructured (RTInsights, 2026).
Only 7% of organisations describe their data as "fully AI-Ready" (Cloudera × Harvard Business Review Analytic Services, March 2026).
Gartner forecasts that 60% of AI projects will be abandoned by end-2026 because of insufficient data quality.
Taken together, these three figures describe a gap: most enterprise information is unstructured, very little of it is AI-ready, and AI projects are expected to fail on data quality more often than on model capability. AI competitive advantage no longer hinges on model choice, because models converge. It hinges on the quality, governance, and observability of the knowledge base those models are allowed to consume.
Definition
A Document Knowledge Platform turns a dispersed, heterogeneous, ungoverned document estate into a reliable, observable, AI-consumable knowledge asset — for humans and AI agents alike.
A DKP transposes the patterns proven by the structured-data world over the past decade — Data Catalog, Data Mesh, Data Quality, Data Lineage — onto documents, while integrating the specificity of the documentary world: quality is measured in meaning, not in format. A document does not pass quality control because it follows a schema or matches a template; it passes because it is consistent with the rest of the corpus, up-to-date relative to current regulation and internal decisions, and answers a real knowledge need without contradicting its neighbours.
The platform sits between the document sources (SharePoint, Confluence, Notion, Google Drive, legacy ECM, file shares, web) and the consumers (human users via Document Discovery, AI agents via standard protocols), and it owns the quality contract in between.
Three capabilities
Govern. Inventory the document estate as business products with explicit owner, quality standard, lifecycle, and traceability. The document ceases to be a file in a folder and becomes a Document Product — a managed asset with an SLA and a contract.
Clean. Actively and continuously detect contradictions, divergent duplicates, obsolete information, and missing subjects. Recommend corrections, escalate arbitrations to subject-matter experts, close the quality loop. Cleaning is a continuous discipline, not a one-time prep step.
Activate. Expose the cleaned, governed knowledge base to consumption — humans via Document Discovery, AI agents via standard protocols (MCP) — with ACLs preserved at every step, query-level lineage traced, and compliance maintained with the AI Act, GDPR, and sectoral regulations.
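The three capabilities above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not K-AI's implementation: the names DocumentProduct, find_issues, and retrieve, the review_by and statement fields, and the sample corpus are all hypothetical. Contradiction detection is reduced to an exact text comparison between documents on the same subject, where a real platform would compare meaning.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DocumentProduct:
    """Govern: a document managed as a product, not a file in a folder."""
    doc_id: str
    subject: str               # hypothetical business topic key
    owner: str                 # accountable person or team
    review_by: date            # hypothetical lifecycle field: next mandatory review
    allowed_groups: frozenset  # ACL carried over from the source system
    statement: str             # simplified stand-in for the document's content


def find_issues(products, today):
    """Clean: flag overdue reviews and same-subject documents that disagree."""
    issues = []
    first_seen = {}
    for p in products:
        if p.review_by < today:
            issues.append(("obsolete", p.doc_id))
        first = first_seen.setdefault(p.subject, p)
        # Assumption: differing text on one subject counts as a contradiction.
        if first is not p and first.statement != p.statement:
            issues.append(("contradiction", f"{first.doc_id} vs {p.doc_id}"))
    return issues


def retrieve(products, subject, user_groups, today, lineage):
    """Activate: return current documents the caller may read, and log lineage."""
    groups = set(user_groups)
    hits = [
        p for p in products
        if p.subject == subject
        and p.review_by >= today       # skip documents past their review date
        and p.allowed_groups & groups  # ACL check: at least one shared group
    ]
    # Query-level lineage: record who asked for what and which documents answered.
    lineage.append({
        "subject": subject,
        "groups": sorted(groups),
        "returned": [p.doc_id for p in hits],
    })
    return hits


today = date(2026, 6, 1)
corpus = [
    DocumentProduct("proc-2019", "lockout", "hse-team", date(2020, 1, 1),
                    frozenset({"ops"}), "two-person check"),
    DocumentProduct("note-2024", "lockout", "hse-team", date(2027, 1, 1),
                    frozenset({"ops"}), "three-person check"),
    DocumentProduct("hr-policy", "leave", "hr-team", date(2027, 1, 1),
                    frozenset({"hr"}), "25 days"),
]

issues = find_issues(corpus, today)
lineage = []
ops_hits = retrieve(corpus, "lockout", {"ops"}, today, lineage)
hr_hits = retrieve(corpus, "lockout", {"hr"}, today, lineage)
print(issues)
print([p.doc_id for p in ops_hits], [p.doc_id for p in hr_hits])
```

In this sketch the 2019 procedure is flagged as overdue and as conflicting with the 2024 note, a caller in the ops group receives only the current note, and a caller in the hr group receives nothing for the same query. Both queries leave a lineage entry, including the one that returned no document.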
What it isn't
Four adjacent categories cover parts of the problem without covering the full cycle. The boundary is not adversarial — these categories are complementary — but it is necessary to state where each stops.
AI Workplace Assistants: Glean, Microsoft 365 Copilot, Notion AI. They consume existing documents in semantic-search mode and assist end users. They do not operate on the quality of the source base: they answer from what is, not from what should be. Their growth exposes the absence of an upstream layer that guarantees the quality of what they consume.
ECM / DMS: SharePoint, Box, M-Files, DocuWare. They store, organise, control access, and handle document workflows. They ignore the semantic dimension: they do not know that two documents contradict each other, nor that a third has become obsolete. They reason on syntax (folders, metadata, ACLs, workflows), not on meaning.
Data Catalogs: Collibra, Atlan, Alation, Informatica. They set the frame for modern governance. Historically oriented towards structured data, they are progressively extending into the unstructured estate (Collibra acquired Deasy Labs in July 2025 with that objective), but their DNA remains metadata and lineage. Active semantic cleaning of documents is not what they were built for.
Unstructured ETL: Unstructured.io, Informatica, Amazon SageMaker. They handle parsing and pipeline mechanics on unstructured payloads. They carry neither the business dimension nor governance: they prepare data, they do not govern it.
No existing category covers the full govern–clean–activate cycle for unstructured documents end-to-end. That is where the Document Knowledge Platform sits, and it is the perimeter K-AI operates on.