Models — WLFV AI

Models

Coming soon

Three models, one API

Choose the right model for your workload. Switch between them with a single parameter change — no code rewrite needed. All three launch together.

wlfv-v1-flash

Context window128,000

Output speed180 tok/s

First token~0.4s

Input / 1M$0.12

Output / 1M$0.35

Lowest latency in the family
Ideal for real-time chat and autocomplete
180 tokens/second throughput

Coming soon View pricing →

wlfv-v1-code

Context window200,000

Output speed95 tok/s

First token~0.6s

Input / 1M$0.18

Output / 1M$0.45

Tuned for code generation and refactoring
200K context fits large codebases
Strongest on HumanEval and code benchmarks

Coming soon View pricing →

wlfv-v1-pro

Context window262,144

Output speed55 tok/s

First token~0.9s

Input / 1M$0.40

Output / 1M$0.90

Highest quality across all benchmarks
262K context for long documents and logs
Best for complex reasoning and analysis

Coming soon View pricing →

Pick a model

Which model should I use?

Start with the workload, not the name. Match your latency, context, and quality needs to the right tier.

Choose Flash when

Latency must stay under a second
You're handling high request volume
Each request is short and conversational
Cost per call is the priority

Choose Code when

You're generating, reviewing, or refactoring code
Context spans multiple files or a whole repo
You need structured, correct output
Throughput matters more than peak speed

Choose Pro when

Tasks need multi-step reasoning
You're working with very long documents
Agents need to call tools and reflect
Quality outweighs latency and cost

Capabilities

Shared across every model

The features below ship with all three models at launch. No tier gates them behind a higher price.

Tool & function calling

Define tools as JSON schemas and let any model call them. Parallel calls, structured arguments, and typed returns.

Structured JSON output

Force responses to a schema with response_format. Guaranteed valid JSON that parses on the first try.

Streaming

Stream tokens over SSE as they're generated. First-byte latency stays low across the whole family.

System & multi-turn

Full system prompts, role-tagged turns, and conversation history. The API mirrors the OpenAI chat format.

Vision (Pro)

Pro accepts images alongside text for charts, screenshots, and documents. Flash and Code are text-only at launch.

Batch & async

Submit large batch jobs at a 50% discount with up to 24h turnaround. Same models, same quality, lower rate.

Coming soon

All three models launch together.

Flash, Code, and Pro ship at the same time. Join the waitlist to get early API access before public availability.

Join the waitlist See benchmarks