Smart Routing picks the optimal model for every request. When you enable it, RoutePlex inspects the prompt and selects the best model for the task — you don't pick a model ID.
## Enable It
Native API — set `mode: "routeplex-ai"`:

```json
{
  "mode": "routeplex-ai",
  "messages": [{"role": "user", "content": "Your prompt"}]
}
```

OpenAI-compatible API — set `model: "routeplex-ai"`:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routeplex.com/v1",
    api_key="YOUR_API_KEY",
)
response = client.chat.completions.create(
    model="routeplex-ai",
    messages=[{"role": "user", "content": "Explain DNS"}],
)
```

Official SDKs — omit the model parameter:

```python
response = client.chat("Your prompt")
```

```javascript
const res = await client.chat("Your prompt");
```

## Strategies
Smart Routing supports five strategies that steer the selection toward different priorities:
| Strategy | Optimizes for |
|---|---|
| `auto` (default) | Adaptive — selection tuned to the specific prompt |
| `balanced` | Well-rounded trade-off across cost, quality, and speed |
| `cost` | Minimum cost per request |
| `quality` | Best available model for the task |
| `speed` | Lowest response latency |
auto is the default and recommended for most workloads — it adapts to each request rather than applying a fixed bias. Use the other strategies when you want consistent behavior for a specific use case (e.g. cost for high-volume classification, speed for chat UIs).
## Passing a Strategy
Native API — top-level `strategy` field:

```json
{
  "mode": "routeplex-ai",
  "strategy": "quality",
  "messages": [{"role": "user", "content": "Review this code for bugs"}]
}
```

OpenAI-compatible — `X-RoutePlex-Strategy` header:
```python
response = client.chat.completions.create(
    model="routeplex-ai",
    messages=[{"role": "user", "content": "Review this code"}],
    extra_headers={"X-RoutePlex-Strategy": "quality"},
)
```

SDKs — `strategy` parameter:

```python
response = client.chat("Analyze this dataset", strategy="quality")
```

```javascript
await client.chat("Classify sentiment", { strategy: "cost" });
```

## Fallback Behavior
When the selected model fails (5xx, timeout, or provider unavailable), RoutePlex automatically tries the next-best model from the candidate pool. Fallback is transparent — your application receives one response either way.
Non-retriable errors (invalid API key, invalid model, content policy) fail fast without fallback.
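This retriable/non-retriable split also matters on the client side: once fallback is exhausted, only provider-style failures are worth retrying yourself. Below is a minimal sketch of that classification; the specific HTTP status codes mapped to the non-retriable cases are assumptions illustrating the error types listed above, not a documented RoutePlex error table.

```python
# Sketch: mirror RoutePlex's documented fallback rules so a client can
# decide whether a local retry makes sense after fallback is exhausted.
# Status codes for the non-retriable cases are illustrative assumptions.

RETRIABLE_STATUS = range(500, 600)           # provider errors: fallback applies
NON_RETRIABLE_STATUS = {401, 403, 404, 422}  # bad key / bad model / policy: fail fast

def should_retry(status_code: int, timed_out: bool = False) -> bool:
    """Return True when, per the docs, another attempt could succeed."""
    if timed_out:          # timeouts are retriable, like 5xx
        return True
    return status_code in RETRIABLE_STATUS

should_retry(503)                  # provider unavailable: retriable
should_retry(401)                  # invalid API key: fail fast
should_retry(200, timed_out=True)  # timeout: retriable
```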
## Response Fields
Every chat response includes which model handled the request so you can audit decisions in your logs:
```json
{
  "success": true,
  "data": {
    "id": "req_abc123",
    "output": "Hello! How can I help?",
    "model_used": "gpt-4.1-nano",
    "usage": {
      "input_tokens": 10,
      "output_tokens": 8,
      "total_tokens": 18,
      "cost_usd": 0.000027
    }
  },
  "meta": {
    "request_id": "req_abc123",
    "timestamp": "2026-04-24T10:00:00Z"
  }
}
```

## Opt Out Per-Request
Pin a specific model for a request by using manual mode. In the native API, set `mode: "manual"` with a `model`:
```json
{
  "mode": "manual",
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Hello!"}]
}
```

With the SDKs, pass the model parameter directly:
```python
response = client.chat("Hello!", model="gpt-4o")
```

See Routing Modes for the auto-vs-manual comparison and Self-Learning Routing for how auto-selection adapts to your workload over time.
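Finally, putting the Response Fields section to work: a small sketch of the audit logging it enables, using only the documented response shape. The `log_routing_decision` helper is hypothetical, not part of any RoutePlex SDK.

```python
# Sketch: pull model_used out of a RoutePlex response body and log it,
# as suggested under "Response Fields". The helper name is our own.
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("routeplex.audit")

def log_routing_decision(response_body: str) -> str:
    """Log which model served a request (and its cost); return the model id."""
    data = json.loads(response_body)["data"]
    log.info("request %s served by %s (%.6f USD)",
             data["id"], data["model_used"], data["usage"]["cost_usd"])
    return data["model_used"]

# Example response body copied from the "Response Fields" section:
body = """{"success": true, "data": {"id": "req_abc123",
  "output": "Hello! How can I help?", "model_used": "gpt-4.1-nano",
  "usage": {"input_tokens": 10, "output_tokens": 8,
            "total_tokens": 18, "cost_usd": 0.000027}},
  "meta": {"request_id": "req_abc123", "timestamp": "2026-04-24T10:00:00Z"}}"""

model = log_routing_decision(body)  # returns "gpt-4.1-nano"
```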