AI Models

235 models | Free & Paid | Updated: 16 minutes ago

Elephant Alpha is a 100B-parameter text model focused on intelligence efficiency, delivering strong performance while minimizing token usage. It supports a 256K context window with up to 32K output tokens,...

Apr 2026 | 262K context | Free input | Free output

Fast-mode variant of [Opus 4.6](/anthropic/claude-opus-4.6) - identical capabilities with higher output speed at a 6x pricing premium. Learn more in Anthropic's docs: https://platform.claude.com/docs/en/build-with-claude/fast-mode

Apr 2026 | 1M context | $30.00/M input | $150.00/M output
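
Every paid entry in this list quotes rates per million tokens, so a request's cost is simply (tokens / 1,000,000) * rate for each side. A minimal sketch, using the fast-mode rates above with made-up token counts (the function name and counts are illustrative, not from the listing):

```python
# Quick cost estimator for per-million-token pricing, as quoted throughout
# this catalog. The rates below are the Opus 4.6 fast-mode figures from the
# entry above; the token counts are hypothetical.

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Return the USD cost of one request, given $/1M-token rates."""
    return (input_tokens / 1_000_000) * input_rate + \
           (output_tokens / 1_000_000) * output_rate

# A 50K-token prompt producing a 4K-token reply at fast-mode pricing:
cost = request_cost(50_000, 4_000, input_rate=30.00, output_rate=150.00)
print(f"${cost:.2f}")  # prints "$2.10"
```

The same helper works for any entry below by swapping in its listed input/output rates.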

GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks. Unlike previous models built around minute-level interactions, GLM-5.1 can work independently and continuously on...

Apr 2026 | 203K context | $0.95/M input | $3.15/M output

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...

Apr 2026 | 262K context | $0.08/M input | $0.35/M output
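
The total-vs-active parameter split quoted above is the point of an MoE design: per-token compute scales with the active parameters, not the full model. A back-of-the-envelope sketch with the listed figures (25.2B total, 3.8B active):

```python
# Fraction of parameters that participate in each token's forward pass,
# using the figures from the Gemma 4 26B A4B listing above.

total_params = 25.2e9   # all experts combined
active_params = 3.8e9   # parameters activated per token

active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of parameters active per token")
# prints "15.1% of parameters active per token"
```

That roughly 15% activation ratio is what lets the model price like a small dense model while claiming near-31B quality.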

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...

Apr 2026 | 262K context | Free input | Free output

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...

Apr 2026 | 262K context | Free input | Free output

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...

Apr 2026 | 262K context | $0.13/M input | $0.38/M output

Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers...

Apr 2026 | 1M context | $0.325/M input | $1.95/M output

GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-based coding and agent-driven tasks. It natively handles image, video, and text inputs, excels at long-horizon planning, complex coding,...

Apr 2026 | 203K context | $1.20/M input | $4.00/M output

Trinity Large Thinking is a powerful open source reasoning model from the team at Arcee AI. It shows strong performance in PinchBench, agentic workloads, and reasoning tasks. Launch video: https://youtu.be/Gc82AXLa0Rg?si=4RLn6WBz33qT--B7

Apr 2026 | 262K context | $0.22/M input | $0.85/M output

Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information...

Mar 2026 | 2M context | $2.00/M input | $6.00/M output

Grok 4.20 is xAI's newest flagship model with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, delivering consistently...

Mar 2026 | 2M context | $2.00/M input | $6.00/M output

Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz...

Mar 2026 | 1M context | Free input | Free output

30 second duration clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate...

Mar 2026 | 1M context | Free input | Free output

KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder series, designed for complex enterprise-grade software engineering and SaaS integration. It builds on the agentic coding strengths of earlier versions,...

Mar 2026 | 256K context | $0.30/M input | $1.20/M output

Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding,...

Mar 2026 | 16K context | $0.10/M input | $0.10/M output

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

Mar 2026 | 262K context | $0.40/M input | $2.00/M output

MiMo-V2-Pro is Xiaomi's flagship foundation model, featuring over 1T total parameters and a 1M context length, deeply optimized for agentic scenarios. It is highly adaptable to general agent frameworks like...

Mar 2026 | 1M context | $1.00/M input | $3.00/M output

MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. Built to actively participate in its own evolution, M2.7 integrates advanced agentic capabilities through multi-agent...

Mar 2026 | 197K context | $0.30/M input | $1.20/M output

GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume tasks. It supports text and image inputs and is designed for low-latency...

Mar 2026 | 400K context | $0.20/M input | $1.25/M output

GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads. It supports text and image inputs with strong performance across reasoning, coding,...

Mar 2026 | 400K context | $0.75/M input | $4.50/M output

Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It combines strong reasoning from...

Mar 2026 | 262K context | $0.15/M input | $0.60/M output

GLM-5 Turbo is a new model from Z.ai designed for fast inference and strong performance in agent-driven environments such as OpenClaw scenarios. It is deeply optimized for real-world agent workflows...

Mar 2026 | 203K context | $1.20/M input | $4.00/M output

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...

Mar 2026 | 262K context | Free input | Free output

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...

Mar 2026 | 262K context | $0.10/M input | $0.50/M output

Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers strong multimodal and agent capabilities while offering noticeably lower latency, making it a practical default choice for most production workloads across...

Mar 2026 | 262K context | $0.25/M input | $2.00/M output

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design...

Mar 2026 | 256K context | $0.05/M input | $0.15/M output

GPT-5.4 Pro is OpenAI's most advanced model, building on GPT-5.4's unified architecture with enhanced reasoning capabilities for complex, high-stakes tasks. It features a 1M+ token context window (922K input, 128K...

Mar 2026 | 1.1M context | $30.00/M input | $180.00/M output

GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for...

Mar 2026 | 1.1M context | $2.50/M input | $15.00/M output

Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving...

Mar 2026 | 128K context | $0.25/M input | $0.75/M output

GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more accurate answers with better contextualization and significantly...

Mar 2026 | 128K context | $1.75/M input | $14.00/M output

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across...

Mar 2026 | 1M context | $0.25/M input | $1.50/M output

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal understanding,...

Feb 2026 | 262K context | $0.10/M input | $0.40/M output

Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google’s latest state of the art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines...

Feb 2026 | 66K context | $0.50/M input | $3.00/M output

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall...

Feb 2026 | 262K context | $0.1625/M input | $1.30/M output

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of...

Feb 2026 | 262K context | $0.195/M input | $1.56/M output

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of...

Feb 2026 | 262K context | $0.26/M input | $2.08/M output

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the...

Feb 2026 | 1M context | $0.065/M input | $0.26/M output

LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures designed for efficient on-device deployment. Built as a 24B parameter Mixture-of-Experts model with only 2B active parameters per...

Feb 2026 | 33K context | $0.03/M input | $0.12/M output

Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party...

Feb 2026 | 1M context | $2.00/M input | $12.00/M output

GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results...

Feb 2026 | 400K context | $1.75/M input | $14.00/M output

Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong at introducing tension, crises, and conflict into stories, making narratives feel more engaging....

Feb 2026 | 131K context | $0.80/M input | $1.60/M output

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation...

Feb 2026 | 1M context | $2.00/M input | $12.00/M output

Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with...

Feb 2026 | 1M context | $3.00/M input | $15.00/M output

The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of...

Feb 2026 | 1M context | $0.26/M input | $1.56/M output

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers...

Feb 2026 | 262K context | $0.39/M input | $2.34/M output

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...

Feb 2026 | 197K context | Free input | Free output

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...

Feb 2026 | 197K context | $0.118/M input | $0.99/M output

GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading...

Feb 2026 | 80K context | $0.72/M input | $2.30/M output

Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it...

Feb 2026 | 262K context | $0.78/M input | $3.90/M output