Choose Flash when
- Latency must stay under a second
- You're handling high request volume
- Each request is short and conversational
- Cost per call is the priority
Models
Choose the right model for your workload. Switch between them with a single parameter change — no code rewrite needed. All three launch together.
Pick a model
Start with the workload, not the name. Match your latency, context, and quality needs to the right tier.
Capabilities
The features below ship with all three models at launch. No tier gates them behind a higher price.
Define tools as JSON schemas and let any model call them. Parallel calls, structured arguments, and typed returns.
Force responses to a schema with response_format. Guaranteed valid JSON that parses on the first try.
Stream tokens over SSE as they're generated. First-byte latency stays low across the whole family.
Full system prompts, role-tagged turns, and conversation history. The API mirrors the OpenAI chat format.
Pro accepts images alongside text for charts, screenshots, and documents. Flash and Code are text-only at launch.
Submit large batch jobs at a 50% discount with up to 24h turnaround. Same models, same quality, lower rate.
Flash, Code, and Pro ship at the same time. Join the waitlist to get early API access before public availability.