Qwen: Qwen3 235B A22B
Qwen3-235B-A22B is a 235B-parameter mixture-of-experts (MoE) model developed by Qwen that activates 22B parameters per forward pass. It can switch between a "thinking" mode for complex reasoning, math, and code tasks and a "non-thinking" mode for efficient general conversation. The model offers strong reasoning, multilingual support (100+ languages and dialects), advanced instruction following, and agent tool calling. It natively handles a 32K-token context window, which extends to 131K tokens with YaRN-based scaling.
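The figures above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes "32K" means 32,768 tokens and "131K" means 131,072 tokens, and the helper names (`yarn_factor`, `rope_scaling_config`, `active_fraction`) are hypothetical. The dictionary keys mirror the `rope_scaling` convention commonly seen in Hugging Face-style model configs, which is an assumption here rather than something stated on this page.

```python
# Assumed exact token counts behind the rounded "32K" and "131K" figures.
NATIVE_CONTEXT = 32_768      # 32 * 1024
EXTENDED_CONTEXT = 131_072   # 128 * 1024


def yarn_factor(native: int, target: int) -> float:
    """Scaling factor needed to stretch the native window to the target."""
    return target / native


def rope_scaling_config(native: int, target: int) -> dict:
    """Build a YaRN-style scaling entry (key names are an assumption)."""
    return {
        "rope_type": "yarn",
        "factor": yarn_factor(native, target),
        "original_max_position_embeddings": native,
    }


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of parameters used per forward pass in the MoE model."""
    return active_b / total_b


if __name__ == "__main__":
    print(rope_scaling_config(NATIVE_CONTEXT, EXTENDED_CONTEXT))
    print(f"active share: {active_fraction(22, 235):.1%}")
```

Under these assumptions the extension corresponds to a scaling factor of 4, and roughly 9% of the model's parameters are active for any given token.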
Related Models
Qwen: Qwen3 Coder 480B A35B (exacto)
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model develop...
Qwen: Qwen2.5 VL 32B Instruct
Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement lear...
Qwen: Qwen3 8B
Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed f...