Universal LLM Gateway

One API key. Any model.

Route requests to Claude, GPT, Gemini, Grok, DeepSeek and more through a single endpoint. Use it from our web chat, Claude Code, Codex, or any OpenAI-compatible client.

4 formats

OpenAI · Anthropic · Gemini · Responses

Auto failover

Retries silently

Get Started Login

terminal

$ export ANTHROPIC_BASE_URL="https://api.kiosai.com"

$ export ANTHROPIC_API_KEY="kios_..."

$ claude # works instantly

One key. Every model.

Drop-in support for all major providers — no vendor lock-in.

Claude

Opus, Sonnet, Haiku

GPT

GPT-5, o1, o3, o4

Gemini

2.5 Pro, Flash

Grok

Grok-4

DeepSeek

V3, R1

Kimi

GLM

4.5

Qwen

3, QwQ

How it works

From zero to first request in 60 seconds

Get your API key

Point your client

Set your base URL to our gateway. Works with Claude Code, Codex, OpenAI SDK, raw curl — anything.

export ANTHROPIC_BASE_URL="https://api.kiosai.com"
export ANTHROPIC_API_KEY="kios_..."

Start prompting

Use any model name you want. We route, retry, and translate behind the scenes. Pay only for what you use.

What we offer

Built for developers who use AI every day

Kios AI handles routing, failover, billing, and observability — so you can ship instead of plumbing.

4 formats

Universal Gateway

Speaks Anthropic Messages, OpenAI Chat Completions, Gemini, and the OpenAI Responses API. Translated internally so any client just works.

Claude Code · Codex

Works with Your CLI

Drop-in for Claude Code, Codex, and any OpenAI-compatible client. Set two environment variables and you're done — no SDK changes.

Generate · Edit

Image Generation & Editing

Generate images with GPT-Image, DALL-E, Imagen, or Flux. Upload a reference and tell the model what to change — works via multipart upload.

99%+ uptime

Auto Failover

Hundreds of upstream sources behind every model. Rate-limited sources auto-recover after 8 hours. Failed sources get pruned automatically.

Live tracking

Transparent Token Quotas

Live progress bar with reset countdown right in your settings. Plan defaults with admin overrides. No surprises, no hidden caps.

50 msgs / chat

Persistent Chat History

Conversations saved across reloads and devices. Images, context, and model selection all persist — pick up exactly where you left off.

SSE native

Streaming First-Class

Server-sent events forwarded with sub-5s keep-alive. Token usage captured mid-stream. No idle timeouts on long generations.

AES-256-GCM

Secure by Default

API keys SHA-256 hashed in storage. Source credentials AES-256-GCM encrypted at rest. Session cookies, never tokens in URLs.

Priority routing

Smart Model Aliases

Configure `claude-sonnet-4` → tries newest version first, falls back to older. One model name, automatic upgrades when new versions ship.

Pricing

Simple, transparent pricing

Pick the plan that fits. Upgrade, downgrade, or cancel anytime.

Starter

For trying things out

$X/mo

X tokens / day
Standard models (Sonnet, GPT-4o)
Web chat interface
Community support
CLI / API access
Image generation
Priority routing

Get Started

Pro

For daily power users

$XX/mo

X tokens / day
All models (Opus, GPT-5, Gemini 2.5)
Web chat + CLI + API
Image generation & editing
Priority routing & failover
Persistent chat history
Email support

Get Started

Enterprise

For teams and heavy usage

Custom

Custom token limits
All models, all formats
Dedicated infra & SLA
Custom model aliases
Admin dashboard access
Bulk source import
Dedicated support channel

Prices are placeholders — final tiers and limits will be announced soon.

FAQ

Got questions?

Everything you need to know about Kios AI.

Kios AI is a universal LLM gateway. Instead of juggling separate API keys and SDKs for Claude, GPT, Gemini, and other providers, you get one endpoint and one key that routes to whichever model you ask for.

Any OpenAI-compatible client works out of the box. We also speak Anthropic Messages (so Claude Code works directly), Google Gemini, and the OpenAI Responses API (Codex). Just point your client's base URL at our gateway.

Contact us to get an account, then generate a key from your settings page. Set two environment variables (base URL + API key) and you're live — whether you use Claude Code, Codex, curl, or our web chat.

Requests return a clear 'Kios AI token limit exceeded' message. Your usage resets automatically based on your plan's schedule — you can see a live countdown and progress bar in settings.

We automatically retry the next available source. Rate-limited sources auto-recover after a cooldown. Permanently broken sources get removed. You never see a dead endpoint.

Yes — text-to-image generation works with GPT-Image, DALL-E, and other supported image models. You can also upload a reference image and ask the model to modify it.

API keys are SHA-256 hashed. Source credentials are AES-256-GCM encrypted at rest. Sessions use httpOnly cookies stored in Valkey. We never log prompt content.

Yes. Our web chat at /chat supports any text or image model, persistent conversations, markdown rendering, and streaming responses. No API key management needed in the browser.

Still have questions?

We're happy to help. Reach out and we'll get back to you.