Runbooks

Self-service guidance for the five operational situations customers see most often. Each entry describes the symptoms a customer can observe, what they can check themselves, and when to involve K-AI support.

For ongoing observability, see Monitoring & observability. For account-specific incidents, contact your K-AI support representative.

Indexation stuck on a document

What you see. A document remains in ON_CONTENT_EXTRACT for an unusually long time when listed via GET /documents/list?state=ON_CONTENT_EXTRACT.

What to check. Open the document at its source — is it password-protected, corrupted, or unusually large (image-heavy scan)? Most stuck documents are explained by a file the parser cannot read.
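
The checks above can be partly automated once you have a local copy of the file. The following is a minimal sketch, not a K-AI tool: the function name `check_source_file` and the `MAX_BYTES` threshold are illustrative assumptions, and the encryption test is a heuristic that only applies to PDFs (it looks for the `/Encrypt` entry that encrypted PDFs carry).

```shell
#!/bin/sh
# Heuristic pre-check for a file that appears stuck in content extraction.
# MAX_BYTES is an illustrative threshold, NOT a documented K-AI limit.
MAX_BYTES="${MAX_BYTES:-52428800}"   # 50 MiB (assumption)

check_source_file() {
  file_path="$1"

  # A missing or zero-byte file cannot be parsed at all.
  if [ ! -s "$file_path" ]; then
    echo "missing-or-empty"
    return 1
  fi

  # Flag files above the chosen size threshold (e.g. image-heavy scans).
  size=$(wc -c < "$file_path" | tr -d ' ')
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "large"
    return 1
  fi

  # PDF-only heuristic: encrypted PDFs reference an /Encrypt dictionary.
  if LC_ALL=C grep -a -q '%PDF-' "$file_path" &&
     LC_ALL=C grep -a -q '/Encrypt' "$file_path"; then
    echo "possibly-encrypted"
    return 1
  fi

  echo "ok"
}
```

Calling `check_source_file ./report.pdf` prints one of `ok`, `large`, `possibly-encrypted`, or `missing-or-empty`. A result of `ok` does not prove the file is parseable; corruption in particular still needs to be confirmed by opening the file.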

What to do. Re-queue the document via the Orchestrator endpoint:

curl -X POST "https://api.kai-studio.ai/api/orchestrator/relaunch-document" \
  -H "instance-id: $INSTANCE_ID" \
  -H "api-key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"document_id": "<id>"}'

If the same document fails repeatedly or multiple documents are stuck at once, contact support.
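
To tell a rejected relaunch request from an accepted one, the call can be wrapped so that it reports the HTTP status. This is a sketch under assumptions: the function name `relaunch_document` is illustrative, and treating any 2xx status as "accepted" is a general HTTP convention rather than documented behaviour of this endpoint.

```shell
#!/bin/sh
# Relaunch one document and report whether the request was accepted.
# Expects INSTANCE_ID and API_KEY in the environment.
relaunch_document() {
  doc_id="$1"

  # -o /dev/null discards the body; -w prints only the HTTP status code.
  status=$(curl -sS -o /dev/null -w '%{http_code}' \
    -X POST "https://api.kai-studio.ai/api/orchestrator/relaunch-document" \
    -H "instance-id: $INSTANCE_ID" \
    -H "api-key: $API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"document_id\": \"$doc_id\"}")

  case "$status" in
    2??) echo "relaunched $doc_id (HTTP $status)" ;;   # assumption: 2xx means accepted
    *)   echo "relaunch failed for $doc_id (HTTP $status)" >&2
         return 1 ;;
  esac
}
```

An accepted request only means the document was re-queued. Whether it then leaves ON_CONTENT_EXTRACT still has to be checked through the document list.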

OAuth tokens need revoking

What you see. A user account is compromised, an employee leaves, or a security review requires invalidating active sessions.

What to do. Revoke all active sessions for the user through the admin endpoint:

curl -X POST "https://auth-api.kai-studio.ai/auth/admin/revoke-user-tokens" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"user_id": "<user_id>"}'

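Access tokens that are already active are not cut off at the moment of revocation: they remain usable until they expire, at most 15 minutes later. Refresh attempts after revocation are rejected immediately. See Authentication — OAuth 2.1 for token lifetimes.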
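
For an incident timeline it can help to record when the last already-issued access token will have expired. A minimal sketch of that arithmetic, assuming the 15-minute lifetime stated above; the function name `access_token_deadline` is illustrative:

```shell
#!/bin/sh
# Worst-case time by which access tokens issued before revocation have expired.
ACCESS_TOKEN_TTL_SECONDS=900   # 15 minutes, the lifetime stated in this runbook

access_token_deadline() {
  # $1: revocation time as a Unix epoch (seconds); defaults to the current time.
  revoked_at="${1:-$(date +%s)}"
  echo $(( revoked_at + ACCESS_TOKEN_TTL_SECONDS ))
}
```

The function prints a Unix epoch. Calling it with no argument right after the revocation request gives the deadline for that revocation.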

For incident response involving multiple users or cross-instance scope, contact support.

Connector sync failure

What you see. A source (SharePoint, Confluence, Notion, Google Drive, Box, NAS, web crawler, custom HTTP) shows new content on the source side, but those documents do not appear in K-AI after the expected sync window.

What to check.

  • Source credentials. The most common cause is an expired or revoked OAuth token at the source. Reconnect the source from the connector configuration UI or via PATCH /sources/<id>.

  • Source rate limits. Some sources (Confluence Cloud, Notion) rate-limit aggressively. Reduce sync frequency or batch size.

  • Network reachability. For on-premise deployments, verify the source hostname resolves and is reachable from inside your cluster.
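
The reachability check can be sketched as a small script run from a host or pod inside the cluster. This is an illustration under assumptions: the function name `check_source_reachable` is hypothetical, `getent` is assumed to be available (it is on most Linux systems), and the source is assumed to speak HTTPS, so non-HTTP sources such as a NAS share need a different probe.

```shell
#!/bin/sh
# Check that a source hostname resolves and accepts HTTPS connections.
check_source_reachable() {
  host="$1"
  port="${2:-443}"

  # Step 1: name resolution through the system resolver.
  if ! getent hosts "$host" >/dev/null 2>&1; then
    echo "dns-failure"
    return 1
  fi

  # Step 2: connection attempt. Any HTTP response counts as reachable;
  # curl exits non-zero on connection, timeout, and TLS failures.
  if ! curl -sS -o /dev/null --connect-timeout 5 "https://$host:$port/" 2>/dev/null; then
    echo "unreachable"
    return 1
  fi

  echo "reachable"
}
```

A `dns-failure` result points at cluster DNS configuration, while `unreachable` points at routing, firewall rules, or certificate trust between the cluster and the source.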

If credentials and reachability are fine and documents still do not flow after 24 hours, contact support — the connector itself may need an update.

LLM service unavailable or degraded

What you see. LLM-backed features (Audit AI crews, retrieval answer summarisation) return errors or take longer than usual. Cost dashboards may show a spike on the LLM cost type.

What to do. In most cases, no action is required — K-AI's LLM service has tiered failover and recovers automatically. If degradation persists, contact support so the platform team can investigate capacity or routing issues.

For on-premise deployments configured to route to an external LLM endpoint, also check the endpoint's own status page.

Cost spike on an instance

What you see. The PICSOU dashboard shows significantly higher KCU consumption than expected over a 24-hour window.

What to check. Drill into the cost dashboard by cost type — the dominant cost type usually points at the cause:

  • LLM or RETRIEVAL_TASK spike → unusually high MCP or audit usage by a user or agent.

  • FILEPARSER or INDEXING_JOB spike → a bulk re-indexation or a connector flood.

  • SEARCH_INDEX spike → index growth beyond the expected document count.
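
The mapping above can be written as a lookup for use in alert handlers or triage scripts. A minimal sketch: the function name `likely_cause` is illustrative, and only the cost types named in this runbook are covered.

```shell
#!/bin/sh
# Map the dominant cost type of a spike to the likely cause listed above.
likely_cause() {
  case "$1" in
    LLM|RETRIEVAL_TASK)
      echo "unusually high MCP or audit usage by a user or agent" ;;
    FILEPARSER|INDEXING_JOB)
      echo "bulk re-indexation or connector flood" ;;
    SEARCH_INDEX)
      echo "index growth beyond the expected document count" ;;
    *)
      # Other cost types exist but are not covered by this runbook.
      echo "not covered here; see the full cost-type list" ;;
  esac
}
```

For example, `likely_cause INDEXING_JOB` prints `bulk re-indexation or connector flood`.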

What to do. If the workload was intentional (planned backfill), no action is required. If it was unexpected, pause the responsible connector (PATCH /sources/<id> with enabled: false) and review with your account team.
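
Pausing a connector might look like the sketch below. Only the method, the `/sources/<id>` path, and the `enabled: false` field come from this runbook. The `KAI_API_BASE` variable, the `pause_source` function name, and the reuse of the `instance-id` and `api-key` headers from the Orchestrator example are assumptions; check the API reference for the actual base URL and authentication.

```shell
#!/bin/sh
# Pause a connector by setting enabled to false on its source.
# KAI_API_BASE is a hypothetical variable holding the API base URL.
pause_source() {
  source_id="${1:-}"

  if [ -z "${KAI_API_BASE:-}" ] || [ -z "$source_id" ]; then
    echo "usage: KAI_API_BASE=<base-url> pause_source <source_id>" >&2
    return 2
  fi

  # Headers are assumed to match the Orchestrator relaunch example.
  curl -sS -X PATCH "${KAI_API_BASE}/sources/${source_id}" \
    -H "instance-id: $INSTANCE_ID" \
    -H "api-key: $API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"enabled": false}'
}
```

Re-enabling the connector after the review would presumably be the same request with `enabled` set to `true`.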

See Monitoring & observability — Cost events for the full cost-type list.
