Models

Coming soon

Three models, one API

Choose the right model for your workload. Switch between them with a single parameter change — no code rewrite needed. All three launch together.

wlfv-v1-flash
Context window128,000
Output speed180 tok/s
First token~0.4s
Input / 1M$0.12
Output / 1M$0.35
  • Lowest latency in the family
  • Ideal for real-time chat and autocomplete
  • 180 tokens/second throughput
Coming soon View pricing →
wlfv-v1-code
Context window200,000
Output speed95 tok/s
First token~0.6s
Input / 1M$0.18
Output / 1M$0.45
  • Tuned for code generation and refactoring
  • 200K context fits large codebases
  • Strongest on HumanEval and code benchmarks
Coming soon View pricing →
wlfv-v1-pro
Context window262,144
Output speed55 tok/s
First token~0.9s
Input / 1M$0.40
Output / 1M$0.90
  • Highest quality across all benchmarks
  • 262K context for long documents and logs
  • Best for complex reasoning and analysis
Coming soon View pricing →

Pick a model

Which model should I use?

Start with the workload, not the name. Match your latency, context, and quality needs to the right tier.

Choose Flash when

  • Latency must stay under a second
  • You're handling high request volume
  • Each request is short and conversational
  • Cost per call is the priority

Choose Code when

  • You're generating, reviewing, or refactoring code
  • Context spans multiple files or a whole repo
  • You need structured, correct output
  • Throughput matters more than peak speed

Choose Pro when

  • Tasks need multi-step reasoning
  • You're working with very long documents
  • Agents need to call tools and reflect
  • Quality outweighs latency and cost

Capabilities

Shared across every model

The features below ship with all three models at launch. No tier gates them behind a higher price.

Tool & function calling

Define tools as JSON schemas and let any model call them. Parallel calls, structured arguments, and typed returns.

Structured JSON output

Force responses to a schema with response_format. Guaranteed valid JSON that parses on the first try.

Streaming

Stream tokens over SSE as they're generated. First-byte latency stays low across the whole family.

System & multi-turn

Full system prompts, role-tagged turns, and conversation history. The API mirrors the OpenAI chat format.

Vision (Pro)

Pro accepts images alongside text for charts, screenshots, and documents. Flash and Code are text-only at launch.

Batch & async

Submit large batch jobs at a 50% discount with up to 24h turnaround. Same models, same quality, lower rate.

Coming soon

All three models launch together.

Flash, Code, and Pro ship at the same time. Join the waitlist to get early API access before public availability.