Smart Routing picks the optimal model for every request. When you enable it, RoutePlex inspects the prompt and selects the best model for the task — you don't pick a model ID.
## Enable It
Native API — set `mode: "routeplex-ai"`:

```json
{
  "mode": "routeplex-ai",
  "messages": [{"role": "user", "content": "Your prompt"}]
}
```

OpenAI-compatible API — set `model: "routeplex-ai"`:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.routeplex.com/v1",
    api_key="YOUR_API_KEY",
)
response = client.chat.completions.create(
    model="routeplex-ai",
    messages=[{"role": "user", "content": "Explain DNS"}],
)
```

Official SDKs — omit the model parameter:

```python
response = client.chat("Your prompt")
```

```javascript
const res = await client.chat("Your prompt");
```

## Strategies
Smart Routing supports five strategies that steer the selection toward different priorities:
| Strategy | Optimizes for |
|---|---|
| `auto` (default) | Adaptive — selection tuned to the specific prompt |
| `balanced` | Well-rounded trade-off across cost, quality, and speed |
| `cost` | Minimum cost per request |
| `quality` | Best available model for the task |
| `speed` | Lowest response latency |
auto is the default and recommended for most workloads — it adapts to each request rather than applying a fixed bias. Use the other strategies when you want consistent behavior for a specific use case (e.g. cost for high-volume classification, speed for chat UIs).
## Passing a Strategy
Native API — top-level `strategy` field:

```json
{
  "mode": "routeplex-ai",
  "strategy": "quality",
  "messages": [{"role": "user", "content": "Review this code for bugs"}]
}
```

OpenAI-compatible — `X-RoutePlex-Strategy` header:
```python
response = client.chat.completions.create(
    model="routeplex-ai",
    messages=[{"role": "user", "content": "Review this code"}],
    extra_headers={"X-RoutePlex-Strategy": "quality"},
)
```

SDKs — `strategy` parameter:

```python
response = client.chat("Analyze this dataset", strategy="quality")
```

```javascript
await client.chat("Classify sentiment", { strategy: "cost" });
```

## Fallback Behavior
When the selected model fails (5xx, timeout, or provider unavailable), RoutePlex automatically tries the next-best model from the candidate pool. Fallback is transparent — your application receives one response either way.
Non-retriable errors (invalid API key, invalid model, content policy) fail fast without fallback.
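This retriable/non-retriable split also matters on the client side: once fallback is exhausted, only provider-style failures are worth retrying yourself. Below is a minimal sketch of that classification; the specific HTTP status codes mapped to the non-retriable cases are assumptions illustrating the error types listed above, not a documented RoutePlex error table.

```python
# Sketch: mirror RoutePlex's documented fallback rules so a client can
# decide whether a local retry makes sense after fallback is exhausted.
# Status codes for the non-retriable cases are illustrative assumptions.

RETRIABLE_STATUS = range(500, 600)           # provider errors: fallback applies
NON_RETRIABLE_STATUS = {401, 403, 404, 422}  # bad key / bad model / policy: fail fast

def should_retry(status_code: int, timed_out: bool = False) -> bool:
    """Return True when, per the docs, another attempt could succeed."""
    if timed_out:          # timeouts are retriable, like 5xx
        return True
    return status_code in RETRIABLE_STATUS

should_retry(503)                  # provider unavailable: retriable
should_retry(401)                  # invalid API key: fail fast
should_retry(200, timed_out=True)  # timeout: retriable
```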
## Response Fields
Every chat response includes which model handled the request so you can audit decisions in your logs:
```json
{
  "success": true,
  "data": {
    "id": "req_abc123",
    "output": "Hello! How can I help?",
    "model_used": "gpt-4.1-nano",
    "usage": {
      "input_tokens": 10,
      "output_tokens": 8,
      "total_tokens": 18,
      "cost_usd": 0.000027
    }
  },
  "meta": {
    "request_id": "req_abc123",
    "timestamp": "2026-04-24T10:00:00Z"
  }
}
```

## Opt Out Per-Request
Pin a specific model for a request by using manual mode. In the native API, set `mode: "manual"` with a `model`:
```json
{
  "mode": "manual",
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Hello!"}]
}
```

With the SDKs, pass the model parameter directly:
```python
response = client.chat("Hello!", model="gpt-4o")
```

See Routing Modes for the auto-vs-manual comparison and Self-Learning Routing for how auto-selection adapts to your workload over time.
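Finally, putting the Response Fields section to work: a small sketch of the audit logging it enables, using only the documented response shape. The `log_routing_decision` helper is hypothetical, not part of any RoutePlex SDK.

```python
# Sketch: pull model_used out of a RoutePlex response body and log it,
# as suggested under "Response Fields". The helper name is our own.
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("routeplex.audit")

def log_routing_decision(response_body: str) -> str:
    """Log which model served a request (and its cost); return the model id."""
    data = json.loads(response_body)["data"]
    log.info("request %s served by %s (%.6f USD)",
             data["id"], data["model_used"], data["usage"]["cost_usd"])
    return data["model_used"]

# Example response body copied from the "Response Fields" section:
body = """{"success": true, "data": {"id": "req_abc123",
  "output": "Hello! How can I help?", "model_used": "gpt-4.1-nano",
  "usage": {"input_tokens": 10, "output_tokens": 8,
            "total_tokens": 18, "cost_usd": 0.000027}},
  "meta": {"request_id": "req_abc123", "timestamp": "2026-04-24T10:00:00Z"}}"""

model = log_routing_decision(body)  # returns "gpt-4.1-nano"
```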