# Models & APIs

Frontier models, reasoning models, and API providers.

## Frontier Models (2025–2026)
| Model | Context | Key Feature |
|---|---|---|
| GPT-5.2 | 400K | General intelligence, 100% AIME 2025 |
| Claude Opus 4.6 | 1M (beta) | Coding, agentic tasks, extended thinking |
| Gemini 3 Pro | 1M | #1 LMArena (~1500 Elo), multimodal |
| Grok 4.1 | 2M | #2 LMArena (1483 Elo), low hallucination |
| Mistral Large 3 | 256K | Best open-weight (675B MoE/41B active), Apache 2.0 |
| DeepSeek-V3.2 | 128K | Best value (671B MoE/37B active), MIT license |
| Llama 4 Maverick | 1M | Beats GPT-4o (400B MoE/17B active), open-weight |
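Several entries above list Mixture-of-Experts sizes as "total/active" parameters: only the active subset participates in each forward pass, which is why a 671B model can serve tokens at roughly 37B-model cost. A minimal sketch of that arithmetic, using the figures from the table (the per-token cost ratio is a rough first-order estimate that ignores the attention/FFN split):

```python
# Per-token compute fraction for the MoE models in the table:
# only the "active" parameters are used on each forward pass.
moe_models = {
    "Mistral Large 3": (675, 41),   # (total B params, active B params)
    "DeepSeek-V3.2": (671, 37),
    "Llama 4 Maverick": (400, 17),
}

for name, (total, active) in moe_models.items():
    fraction = active / total
    print(f"{name}: {active}B of {total}B active -> ~{fraction:.1%} of params per token")
```

So Llama 4 Maverick, for example, activates only about 4% of its weights per token, despite its 400B total size.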

## Reasoning Models

| Model | Key Feature |
|---|---|
| OpenAI o3 / o3-pro | 87.7% GPQA Diamond. Native tool use. |
| OpenAI o4-mini | Best AIME performance in its cost class; supports visual reasoning. |
| DeepSeek-R1 / R1-0528 | Open-weight, RL-trained. 87.5% on AIME 2025. MIT license. |
| QwQ (Qwen with Questions) | 32B reasoning model. Apache 2.0. Comparable to R1. |
| Gemini 2.5 Pro/Flash (Thinking) | Built-in reasoning with configurable thinking budget. |
| Claude Extended Thinking | Hybrid mode with visible chain-of-thought and tool use. |
| Phi-4 Reasoning / Plus | 14B reasoning models rivaling much larger models. Open-weight. |
| GPT-OSS-120B | OpenAI's open-weight model with visible CoT. Near-parity with o4-mini. Apache 2.0. |
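Most hosted reasoning models above expose an effort or thinking-budget knob through their chat APIs. A minimal sketch of assembling such a request body: the `reasoning_effort` field follows OpenAI's o-series API, but the model id and whether your provider accepts this exact field are assumptions to check against your provider's documentation.

```python
import json

def build_reasoning_request(prompt: str, effort: str = "medium") -> str:
    """Assemble an OpenAI-style chat payload with a reasoning-effort hint."""
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unsupported effort level: {effort}")
    payload = {
        "model": "o4-mini",           # assumed model id; substitute your own
        "reasoning_effort": effort,   # o-series knob; other APIs use a token budget
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(payload)

# Example: spend more reasoning tokens on a harder question
body = build_reasoning_request("Prove that sqrt(2) is irrational.", "high")
```

Gemini's thinking models use a numeric `thinking_budget` (a token count) rather than a discrete effort level, so the same idea maps to a different parameter there.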

## Notable Open-Source Models

| Model | Organization |
|---|---|
| Qwen3-235B-A22B | Alibaba |
| Gemma 3 | Google DeepMind |
| OLMo 2/3 | Allen AI |
| SmolLM3-3B | Hugging Face |
| Kimi K2 | Moonshot AI |
| Llama 4 Scout | Meta |

## Code-Specialized Models

| Model | Key Feature |
|---|---|
| Qwen3-Coder (480B-A35B) | 69.6% SWE-bench, a milestone for open-source coding. 256K context. Apache 2.0. |
| Devstral 2 (123B) | 72.2% SWE-bench Verified. 7x more cost-efficient than Claude Sonnet. |
| Codestral 25.01 | Mistral's code model. 80+ languages. Fill-in-the-Middle support. |
| DeepSeek-Coder-V2 | 236B MoE / 21B active. 338 programming languages. |
| Qwen 2.5-Coder | 7B/32B. 92 programming languages. 88.4% HumanEval. Apache 2.0. |
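Codestral's Fill-in-the-Middle (FIM) mode completes code between a given prefix and suffix rather than only continuing left-to-right, which is what editor autocomplete needs. A minimal sketch of assembling a suffix-first FIM prompt: the `[SUFFIX]`/`[PREFIX]` sentinel strings follow Mistral's published Codestral format, but treat the exact tokens as an assumption and check your provider's docs; hosted FIM APIs typically take separate `prompt` and `suffix` fields instead of a manually assembled string.

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a suffix-first FIM prompt; the model generates the middle."""
    return f"[SUFFIX]{suffix}[PREFIX]{prefix}"

# Example: ask the model to fill in a function body between two anchors
prefix = "def add(a, b):\n"
suffix = "\n    return result\n"
prompt = build_fim_prompt(prefix, suffix)
```

Presenting the suffix before the prefix lets the model condition on what must come after the insertion point before it starts generating.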

## Foundational Models (Historical Reference)

| Model | Org | Significance |
|---|---|---|
| GLM-130B | Tsinghua | Open bilingual English/Chinese LLM (2023) |
| Falcon 180B | TII | Large open generative model (2023) |
| Mixtral 8x7B | Mistral AI | Pioneered MoE architecture for open models (2023) |
| GPT-NeoX-20B | EleutherAI | Early open autoregressive LLM |
| GPT-J-6B | EleutherAI | Early open causal language model |