Deployment models
Operational depth on the three supported deployment modes. For the commercial overview, see Platform — Deployment models.
All three modes run the same K-AI Platform application code. What changes is the substrate: where Kubernetes runs, where the database lives, where documents are stored, and who owns operations.
SaaS (multi-tenant)
K-AI-operated, hosted on managed Kubernetes in France.
Cluster: Managed Kubernetes (France)
Database: Managed PostgreSQL with dedicated schemas per platform module
Search & vectors: Managed vector index
File storage: Managed object storage
Ingress: Standard ingress controller with automated TLS
DNS: kai-studio.ai and subdomains
LLM endpoint: Routed to the K-AI LLM service
Per-customer isolation
One K-AI Instance = one dedicated pod + one database schema + one vector index + one storage bucket. Compute, vector store, and storage are physically separate per instance; noisy-neighbour effects are isolated at the pod level.
The shared management plane holds no document content — only metadata, identity, access rules, and cost events.
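As an illustration of the one-to-one mapping described above, the sketch below derives the four dedicated resources for an instance and checks that two instances share none of them. The naming scheme (`kai-instance-…`, `inst_…`, `vec-…`, `kai-docs-…`) and the helper names are hypothetical and are not the platform's actual identifiers.

```python
import re
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class InstanceResources:
    """The four dedicated resources that make up one K-AI Instance."""
    pod: str
    db_schema: str
    vector_index: str
    bucket: str


def resources_for(instance_id: str) -> InstanceResources:
    """Derive per-instance resource names (hypothetical naming scheme)."""
    # Restrict ids to lowercase alphanumerics and hyphens so that two
    # distinct ids can never map to the same schema name.
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", instance_id):
        raise ValueError(f"invalid instance id: {instance_id!r}")
    return InstanceResources(
        pod=f"kai-instance-{instance_id}",
        db_schema=f"inst_{instance_id.replace('-', '_')}",
        vector_index=f"vec-{instance_id}",
        bucket=f"kai-docs-{instance_id}",
    )


def shares_nothing(a: InstanceResources, b: InstanceResources) -> bool:
    """True when the two instances have no resource name in common."""
    return set(astuple(a)).isdisjoint(astuple(b))
```

For example, `shares_nothing(resources_for("acme"), resources_for("globex"))` holds because every resource name is derived from the instance id alone.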
Operational ownership
K-AI operates the cluster, runs upgrades, performs backups, and monitors the platform. The customer's only operational responsibilities are connector credentials and access management.
On-premise (Kubernetes in customer cluster)
Same chart, same images, same isolation model — deployed in a Kubernetes cluster the customer owns.
The platform is distributed as the K-AI Helm chart, with an optional companion chart for customers without an external LLM endpoint. Both ship as offline bundles for air-gapped environments.
Customer-supplied dependencies:
Kubernetes cluster (1.27+) with cluster-admin access during install
OCI-compatible container registry
Ingress controller and TLS (managed certificates or customer PKI)
PostgreSQL (managed or self-hosted) and an S3-compatible object store
IAM federation for SSO (Azure AD, Okta, Ping, or any OIDC-compliant IdP)
Observability stack (Datadog, Grafana, Splunk, …) — K-AI exports Prometheus metrics; the customer scrapes
Hardware sizing scales with document estate size and indexation throughput; K-AI provides sizing guidance during onboarding.
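The Kubernetes version requirement listed above can be checked before install. The following is a minimal sketch, not a K-AI tool: it assumes you already have the cluster's version string (for example the `gitVersion` value reported by the Kubernetes API server, such as `v1.27.3`), and the function names are illustrative.

```python
import re

# Minimum supported Kubernetes version, taken from the dependency list.
MIN_VERSION = (1, 27)


def parse_k8s_version(git_version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like 'v1.27.3' or 'v1.30.1-eks-abc'."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version.strip())
    if match is None:
        raise ValueError(f"unrecognised Kubernetes version: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(git_version: str, minimum: tuple[int, int] = MIN_VERSION) -> bool:
    """Compare numerically; a plain string comparison would rank '1.9' above '1.27'."""
    return parse_k8s_version(git_version) >= minimum
```

Comparing integer tuples rather than strings matters here: `meets_minimum("1.9.0")` is false, whereas a lexicographic comparison of `"1.9"` and `"1.27"` would wrongly accept it.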
Air-gapped support
The on-premise mode is designed for environments without internet egress. K-AI produces an offline bundle on a connected machine; the customer transfers it (SFTP, disk, …) and loads it into their private registry. Subsequent upgrades ship as smaller delta bundles.
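Because the bundle crosses an air gap on removable media or over SFTP, it is worth verifying its integrity before loading it into the private registry. The sketch below assumes the bundle arrives with a published SHA-256 digest; that digest, and the function names, are assumptions for illustration and not a documented part of the K-AI bundle format.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large bundles are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundle_is_intact(path: Path, expected_hex: str) -> bool:
    """Compare the computed digest with the expected one (assumed to ship with the bundle)."""
    # Normalise whitespace and case, since digest files often end with a newline.
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(sha256_of(path), expected)
```

Run the check on the air-gapped side after transfer, and load the images into the registry only if it returns `True`.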
See On-premise installation for the install walkthrough.
Snowflake Native App (SPCS)
Deployed on Snowpark Container Services (SPCS) inside the customer's own Snowflake account. The platform runs inside Snowflake's container runtime, so no external Kubernetes cluster is involved.
Application services: Snowflake Compute Pools (SPCS)
Documents: Snowflake Stages
Embeddings: VECTOR columns in dedicated schemas
Management metadata: Dedicated Snowflake schemas inside the customer account
Authentication: Same auth service, deployed as an SPCS service
Data residency is defined by the customer's Snowflake region. Operations responsibility is shared: Snowflake operates the substrate, K-AI provides the application updates, the customer manages account-level governance.
The Snowflake mode inherits Snowflake's resource model and operational SLA. Scaling is driven by Snowflake compute pool sizing.
LLM serving
K-AI's LLM serving runs on a separate, dedicated platform independent of the SaaS cluster and of any customer on-premise cluster. The serving topology — model selection, sizing, autoscaling, and fallback strategy — is operated by K-AI and out of scope for customer integration.
For on-premise deployments without local GPU capacity, route LLM requests to the K-AI SaaS LLM endpoint or to your own LLM endpoint. A companion chart is available for customers who want to self-host completion and embedding models inside their cluster.
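The routing options above can be summarised as a small decision function. This is a sketch under stated assumptions: the `ClusterProfile` fields, the `LlmRoute` names, and in particular the priority order (customer endpoint first, then self-hosting, then the K-AI SaaS endpoint) are illustrative choices, not documented platform behaviour.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LlmRoute(Enum):
    CUSTOMER_ENDPOINT = "customer-endpoint"  # the customer's own LLM endpoint
    COMPANION_CHART = "companion-chart"      # self-hosted inside the cluster
    KAI_SAAS = "kai-saas"                    # K-AI SaaS LLM endpoint


@dataclass
class ClusterProfile:
    """Hypothetical description of an on-premise cluster."""
    has_local_gpu: bool
    customer_llm_url: Optional[str] = None
    allows_internet_egress: bool = True


def choose_llm_route(profile: ClusterProfile) -> LlmRoute:
    """Pick an LLM route; the priority order is an assumption of this sketch."""
    if profile.customer_llm_url:
        return LlmRoute.CUSTOMER_ENDPOINT
    if profile.has_local_gpu:
        return LlmRoute.COMPANION_CHART
    if profile.allows_internet_egress:
        return LlmRoute.KAI_SAAS
    # No endpoint, no GPU, and no egress: none of the documented options applies.
    raise ValueError("no LLM route available for this cluster profile")
```

An air-gapped cluster without GPUs and without its own endpoint raises an error in this sketch, which reflects that such a cluster needs either local GPU capacity or a reachable LLM endpoint.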