
MiniMax M3 is an open-weight model by MiniMax for coding, agentic tasks, and multimodal understanding, with a 1M-token context window powered by the MiniMax Sparse Attention (MSA) architecture.
Autonomous code development
M3 can independently reproduce research papers, running for nearly 12 hours to generate commits and experimental figures.
CUDA kernel optimization
It can optimize compute-intensive operations like FP8 GEMM on NVIDIA Hopper GPUs, achieving significant speedups with zero human intervention.
Long-range agent tasks
The 1M-token context window lets the model handle extended sequences in agentic workflows and long-video understanding.
Automated data pipeline
M3 can autonomously complete the full pipeline of data synthesis, training, evaluation, and iteration for pretrain-only base models.
Multimodal analysis
It parses charts and formulas from papers, integrating textual and visual information for deep understanding.
Long-range coding
The extended context supports complex coding tasks that require maintaining large codebases or logs in a single window.
1M-context MSA architecture
The MiniMax Sparse Attention (MSA) architecture supports a context window of up to 1M tokens, with a guaranteed minimum of 512K tokens, enabling long-range tasks.
Native multimodality
The model is trained from step zero with multimodal data, achieving deep alignment between textual and visual semantic spaces.
Autonomous task decomposition
M3 can break down complex tasks into sub-steps and execute them independently, as demonstrated in paper reproduction and kernel optimization.
Tool invocation
It makes tool calls to interact with external systems (e.g., 1,959 calls during kernel optimization).
Multi-step reasoning
The model performs sequential reasoning across multiple steps, supporting automated workflows.
High benchmark performance
On BrowseComp, M3 scores 83.5, surpassing Opus 4.7 (79.3), indicating strong autonomous browsing and information retrieval.
Long-horizon stability
It can run continuously for extended periods (e.g., nearly 12 hours for paper reproduction, 24 hours for kernel optimization) without human intervention.
Coding and agentic capabilities
M3 achieves world-leading performance on benchmarks spanning software engineering, terminal execution, and more.
Category: Large Model Platform
Visit Link: https://www.minimax.io/models/text/m3
Tags: open-weight model, coding AI, multimodal understanding, large context window, agentic tasks