AI Models

235 models | Free & Paid | Updated: 16 minutes ago

Elephant Alpha is a 100B-parameter text model focused on intelligence efficiency, delivering strong performance while minimizing token usage. It supports a 256K context window with up to 32K output tokens,...

Apr 2026 | 262K context | Free input | Free output

Fast-mode variant of [Opus 4.6](/anthropic/claude-opus-4.6) - identical capabilities with higher output speed at a 6x pricing premium. Learn more in Anthropic's docs: https://platform.claude.com/docs/en/build-with-claude/fast-mode

Apr 2026 | 1M context | $30.00/M input | $150.00/M output
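
Every paid entry in this list quotes rates per million tokens, so a request's cost is simply (tokens / 1,000,000) * rate for each side. A minimal sketch, using the fast-mode rates above with made-up token counts (the function name and counts are illustrative, not from the listing):

```python
# Quick cost estimator for per-million-token pricing, as quoted throughout
# this catalog. The rates below are the Opus 4.6 fast-mode figures from the
# entry above; the token counts are hypothetical.

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Return the USD cost of one request, given $/1M-token rates."""
    return (input_tokens / 1_000_000) * input_rate + \
           (output_tokens / 1_000_000) * output_rate

# A 50K-token prompt producing a 4K-token reply at fast-mode pricing:
cost = request_cost(50_000, 4_000, input_rate=30.00, output_rate=150.00)
print(f"${cost:.2f}")  # prints "$2.10"
```

The same helper works for any entry below by swapping in its listed input/output rates.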

GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks. Unlike previous models built around minute-level interactions, GLM-5.1 can work independently and continuously on...

Apr 2026 | 203K context | $0.95/M input | $3.15/M output

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...

Apr 2026 | 262K context | $0.08/M input | $0.35/M output
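
The total-vs-active parameter split quoted above is the point of an MoE design: per-token compute scales with the active parameters, not the full model. A back-of-the-envelope sketch with the listed figures (25.2B total, 3.8B active):

```python
# Fraction of parameters that participate in each token's forward pass,
# using the figures from the Gemma 4 26B A4B listing above.

total_params = 25.2e9   # all experts combined
active_params = 3.8e9   # parameters activated per token

active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of parameters active per token")
# prints "15.1% of parameters active per token"
```

That roughly 15% activation ratio is what lets the model price like a small dense model while claiming near-31B quality.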

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...

Apr 2026 | 262K context | Free input | Free output

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...

Apr 2026 | 262K context | Free input | Free output

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...

Apr 2026 | 262K context | $0.13/M input | $0.38/M output

Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers...

Apr 2026 | 1M context | $0.325/M input | $1.95/M output

GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-based coding and agent-driven tasks. It natively handles image, video, and text inputs, excels at long-horizon planning, complex coding,...

Apr 2026 | 203K context | $1.20/M input | $4.00/M output

Trinity Large Thinking is a powerful open source reasoning model from the team at Arcee AI. It shows strong performance in PinchBench, agentic workloads, and reasoning tasks. Launch video: https://youtu.be/Gc82AXLa0Rg?si=4RLn6WBz33qT--B7

Apr 2026 | 262K context | $0.22/M input | $0.85/M output

Grok 4.20 Multi-Agent is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information...

Mar 2026 | 2M context | $2.00/M input | $6.00/M output

Grok 4.20 is xAI's newest flagship model with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, delivering consistently...

Mar 2026 | 2M context | $2.00/M input | $6.00/M output

Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz...

Mar 2026 | 1M context | Free input | Free output

30 second duration clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate...

Mar 2026 | 1M context | Free input | Free output

KAT-Coder-Pro V2 is the latest high-performance model in KwaiKAT’s KAT-Coder series, designed for complex enterprise-grade software engineering and SaaS integration. It builds on the agentic coding strengths of earlier versions,...

Mar 2026 | 256K context | $0.30/M input | $1.20/M output

Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding,...

Mar 2026 | 16K context | $0.10/M input | $0.10/M output

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

Mar 2026 | 262K context | $0.40/M input | $2.00/M output

MiMo-V2-Pro is Xiaomi's flagship foundation model, featuring over 1T total parameters and a 1M context length, deeply optimized for agentic scenarios. It is highly adaptable to general agent frameworks like...

Mar 2026 | 1M context | $1.00/M input | $3.00/M output

MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. Built to actively participate in its own evolution, M2.7 integrates advanced agentic capabilities through multi-agent...

Mar 2026 | 197K context | $0.30/M input | $1.20/M output

GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume tasks. It supports text and image inputs and is designed for low-latency...

Mar 2026 | 400K context | $0.20/M input | $1.25/M output

GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads. It supports text and image inputs with strong performance across reasoning, coding,...

Mar 2026 | 400K context | $0.75/M input | $4.50/M output

Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It combines strong reasoning from...

Mar 2026 | 262K context | $0.15/M input | $0.60/M output

GLM-5 Turbo is a new model from Z.ai designed for fast inference and strong performance in agent-driven environments such as OpenClaw scenarios. It is deeply optimized for real-world agent workflows...

Mar 2026 | 203K context | $1.20/M input | $4.00/M output

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...

Mar 2026 | 262K context | Free input | Free output

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...

Mar 2026 | 262K context | $0.10/M input | $0.50/M output

Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers strong multimodal and agent capabilities while offering noticeably lower latency, making it a practical default choice for most production workloads across...

Mar 2026 | 262K context | $0.25/M input | $2.00/M output

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design...

Mar 2026 | 256K context | $0.05/M input | $0.15/M output

GPT-5.4 Pro is OpenAI's most advanced model, building on GPT-5.4's unified architecture with enhanced reasoning capabilities for complex, high-stakes tasks. It features a 1M+ token context window (922K input, 128K...

Mar 2026 | 1.1M context | $30.00/M input | $180.00/M output

GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for...

Mar 2026 | 1.1M context | $2.50/M input | $15.00/M output

Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving...

Mar 2026 | 128K context | $0.25/M input | $0.75/M output

GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more accurate answers with better contextualization and significantly...

Mar 2026 | 128K context | $1.75/M input | $14.00/M output

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across...

Mar 2026 | 1M context | $0.25/M input | $1.50/M output

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal understanding,...

Feb 2026 | 262K context | $0.10/M input | $0.40/M output

Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google’s latest state of the art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines...

Feb 2026 | 66K context | $0.50/M input | $3.00/M output

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall...

Feb 2026 | 262K context | $0.1625/M input | $1.30/M output

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of...

Feb 2026 | 262K context | $0.195/M input | $1.56/M output

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of...

Feb 2026 | 262K context | $0.26/M input | $2.08/M output

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the...

Feb 2026 | 1M context | $0.065/M input | $0.26/M output

LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures designed for efficient on-device deployment. Built as a 24B parameter Mixture-of-Experts model with only 2B active parameters per...

Feb 2026 | 33K context | $0.03/M input | $0.12/M output

Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party...

Feb 2026 | 1M context | $2.00/M input | $12.00/M output

GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results...

Feb 2026 | 400K context | $1.75/M input | $14.00/M output

Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong at introducing tension, crises, and conflict into stories, making narratives feel more engaging....

Feb 2026 | 131K context | $0.80/M input | $1.60/M output

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation...

Feb 2026 | 1M context | $2.00/M input | $12.00/M output

Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with...

Feb 2026 | 1M context | $3.00/M input | $15.00/M output

The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of...

Feb 2026 | 1M context | $0.26/M input | $1.56/M output

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers...

Feb 2026 | 262K context | $0.39/M input | $2.34/M output

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...

Feb 2026 | 197K context | Free input | Free output

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...

Feb 2026 | 197K context | $0.118/M input | $0.99/M output

GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading...

Feb 2026 | 80K context | $0.72/M input | $2.30/M output

Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it...

Feb 2026 | 262K context | $0.78/M input | $3.90/M output