Auto Routing: One Model String, Every Provider
Most AI gateways make you pick a model. RoutePlex lets you skip that decision entirely.
Set `model: "routeplex-ai"` and the router handles everything — analyzing your request, scoring available models, and selecting the best one across OpenAI, Anthropic, and Google. If that model fails, it falls back to the next best option. Your application never sees the retry.
How It Works
Auto routing is a three-stage pipeline that runs in under 5ms:
Stage 1: Analyze
The router classifies your request by examining:
- Query type — code generation, analysis, creative writing, math, conversation, and 12 other categories
- Complexity — token count, specificity signals, multi-step reasoning indicators
- Context requirements — whether the task benefits from large context windows
This classification happens locally using pattern matching — no LLM calls, no latency overhead.
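A keyword-based sketch gives the flavor of this kind of local classifier. The categories, patterns, and thresholds below are illustrative assumptions, not RoutePlex's actual rules:

```python
import re

# Illustrative patterns only -- the real classifier's categories and rules are internal.
QUERY_PATTERNS = {
    "code": re.compile(r"\b(function|class|bug|refactor|compile|stack trace)\b", re.I),
    "math": re.compile(r"\b(solve|integral|percent|equation)\b", re.I),
    "creative": re.compile(r"\b(story|poem|slogan|brainstorm)\b", re.I),
}

def classify(prompt: str) -> dict:
    """Cheap local classification: regex matching plus simple length heuristics."""
    query_type = next(
        (name for name, pat in QUERY_PATTERNS.items() if pat.search(prompt)),
        "conversation",
    )
    tokens = len(prompt.split())  # rough token estimate
    return {
        "query_type": query_type,
        "complexity": "high" if tokens > 200 else "low",
        "needs_large_context": tokens > 2000,
    }
```

Because this is pure string matching, it runs in microseconds, which is what makes a sub-5ms routing budget plausible.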
Stage 2: Score
Each available model receives a composite score based on four dimensions:
| Dimension   | Weight | Source                                     |
|-------------|--------|--------------------------------------------|
| Capability  | 40%    | Static benchmarks + query-type fit         |
| Cost        | 25%    | Per-token pricing for the estimated output |
| Latency     | 20%    | Rolling p50 from recent requests           |
| Reliability | 15%    | Error rate over the last 5 minutes         |
Weights shift based on your chosen strategy (more on that below).
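As a sketch, the composite score reduces to a weighted sum over normalized per-dimension metrics. The weights come from the table above; the model metrics are invented for illustration:

```python
# Default (balanced) weights, from the scoring table
BALANCED_WEIGHTS = {"capability": 0.40, "cost": 0.25, "latency": 0.20, "reliability": 0.15}

def composite_score(metrics: dict, weights: dict = BALANCED_WEIGHTS) -> float:
    """Weighted sum of per-dimension scores, each normalized to [0, 1].
    Higher is better, so raw cost and latency must be inverted upstream."""
    return sum(weights[dim] * metrics[dim] for dim in weights)

# Hypothetical normalized metrics for two models
model_a = {"capability": 0.9, "cost": 0.4, "latency": 0.6, "reliability": 0.99}
model_b = {"capability": 0.7, "cost": 0.9, "latency": 0.8, "reliability": 0.97}
```

Under balanced weights, the cheaper and faster `model_b` outscores the more capable `model_a`; shifting weight toward capability would flip that ordering.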
Stage 3: Route
The highest-scoring model wins. If it returns a 5xx or times out, the router retries with the next model in the ranked list — up to 2 fallback attempts. The entire fallback chain is invisible to your application.
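The fallback behavior can be sketched as a simple loop over the ranked list. `UpstreamError`, `send_request`, and the model names here are placeholders, not RoutePlex internals:

```python
class UpstreamError(Exception):
    """Stand-in for a 5xx response or timeout from a provider."""

def route_with_fallback(ranked_models, send_request, max_fallbacks=2):
    """Try the top-ranked model, then fall back down the list.
    At most 1 + max_fallbacks attempts in total."""
    last_error = None
    for model in ranked_models[: 1 + max_fallbacks]:
        try:
            return send_request(model)
        except UpstreamError as exc:
            last_error = exc  # retry with the next-best model
    raise last_error
```

The caller only ever sees the final result (or, after all attempts are exhausted, the last error), which is what makes the fallback chain invisible to your application.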
Four Strategies
You can hint which dimension matters most by passing a routing strategy:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routeplex.com/v1",
    api_key="rp_...",
)

# Default: balanced across quality, cost, and speed
response = client.chat.completions.create(
    model="routeplex-ai",
    messages=[{"role": "user", "content": "Summarize this article"}],
)

# Prioritize quality — routes to the most capable model
response = client.chat.completions.create(
    model="routeplex-ai",
    messages=[{"role": "user", "content": "Review this code for security vulnerabilities"}],
    extra_body={"routing_strategy": "quality"},
)

# Prioritize cost — routes to the most affordable option
response = client.chat.completions.create(
    model="routeplex-ai",
    messages=[{"role": "user", "content": "Classify this text as positive or negative"}],
    extra_body={"routing_strategy": "cost"},
)

# Prioritize speed — routes to the lowest-latency model
response = client.chat.completions.create(
    model="routeplex-ai",
    messages=[{"role": "user", "content": "Quick: what's 15% of 240?"}],
    extra_body={"routing_strategy": "speed"},
)
```
Strategy Weight Distribution
| Strategy | Capability | Cost | Latency | Reliability |
|----------|------------|------|---------|-------------|
| Balanced | 40%        | 25%  | 20%     | 15%         |
| Quality  | 60%        | 10%  | 15%     | 15%         |
| Cost     | 20%        | 50%  | 15%     | 15%         |
| Speed    | 20%        | 15%  | 50%     | 15%         |
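Expressed as data, the table maps directly onto per-strategy weight dictionaries (the values come from the table; the `STRATEGY_WEIGHTS` name itself is just for illustration):

```python
# Per-strategy scoring weights, taken from the strategy table.
# Reliability holds at 15% in every strategy; the other three trade off.
STRATEGY_WEIGHTS = {
    "balanced": {"capability": 0.40, "cost": 0.25, "latency": 0.20, "reliability": 0.15},
    "quality":  {"capability": 0.60, "cost": 0.10, "latency": 0.15, "reliability": 0.15},
    "cost":     {"capability": 0.20, "cost": 0.50, "latency": 0.15, "reliability": 0.15},
    "speed":    {"capability": 0.20, "cost": 0.15, "latency": 0.50, "reliability": 0.15},
}
```

Note that each strategy's weights sum to 1.0, so scores remain comparable across strategies.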
Works With the OpenAI SDK
Auto routing is fully compatible with the OpenAI SDK. No custom client, no wrapper library — just change your `base_url`:
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.routeplex.com/v1",
  apiKey: "rp_your_key_here",
});

const completion = await client.chat.completions.create({
  model: "routeplex-ai",
  messages: [{ role: "user", content: "Explain how DNS works" }],
  stream: true,
});

for await (const chunk of completion) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```
Streaming, function calling, and all standard parameters work exactly as expected.
Auto Routing + Self-Learning
If you have Self-Learning Routing enabled, auto routing gets smarter over time. The router applies per-model bias based on your historical performance data — models that consistently deliver better results for your workload get a score boost.
After ~100 requests per model, routing decisions are personalized to your specific use case. No configuration required.
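A minimal sketch of this kind of per-model bias, assuming a simple success-rate signal with a ~100-request warm-up (the actual learning signal and coefficients are internal to RoutePlex):

```python
def learned_bias(history: list, min_samples: int = 100) -> float:
    """Hypothetical bias term: once a model has enough history, nudge its
    composite score by how far its observed success rate deviates from a
    neutral 0.5 baseline. Returns 0.0 until warm-up completes."""
    if len(history) < min_samples:
        return 0.0  # not enough data -- no personalization yet
    success_rate = sum(history) / len(history)
    return 0.1 * (success_rate - 0.5)  # small boost or penalty
```

Under this sketch, a model that succeeds consistently for your workload earns a small additive boost, while an underperformer is quietly demoted in the ranking.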
When to Use Direct Mode Instead
Auto routing handles the common case well. But there are scenarios where specifying a model directly makes more sense:
- Reproducibility — If you need deterministic model selection for testing or compliance
- Model-specific features — If you're using features unique to a specific model (e.g., vision, extended thinking)
- Cost control — If you want to pin to a specific cheap model regardless of query complexity
```python
# Direct mode — bypass auto routing
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "..."}],
)
```
The Bottom Line
Auto routing removes the model selection decision from your codebase. You get the best model for each request, automatic failover across providers, and cost optimization — all through a single model string.
One integration. Every model. Zero configuration.