MiniMax M3 launched today, June 1, 2026, and it does something no open-weight model has pulled off before. It scores 59.0% on SWE-Bench Pro, a hair above GPT-5.5’s 58.6%, while costing roughly 12 times less to run. It also ships with a 1-million-token context window and reads images and video out of the box.
So a Chinese startup just shipped a model, with downloadable weights promised within days, that beats OpenAI’s flagship on a real coding benchmark. That’s the headline. The bigger story is why that keeps happening.
What is MiniMax M3?
MiniMax M3 is an open-weight AI model built for coding, agent workflows, and very long inputs. MiniMax, the Shanghai company behind it, is positioning it as one model that does three jobs that usually take three separate models.
The quick facts:
- Released: June 1, 2026. The API is live now. The actual weights drop within about 10 days, alongside a full technical report.
- Context window: up to 1,000,000 tokens, with a guaranteed floor of 512K.
- Modalities: text, image, and video in. It can also operate a desktop computer.
- Pricing: $0.60 per million input tokens, $2.40 per million output. There’s a 50% launch discount for the first 7 days, and subscription plans start at $20/mo.
- License: open-weight. You’ll be able to download it from Hugging Face and self-host.
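To make the pricing concrete, here is a minimal sketch of a per-request cost estimate using the list prices above. The function name and the sample workload are illustrative, and the sketch assumes the launch discount applies equally to input and output tokens, which the announcement does not spell out.

```python
# Rough API cost estimate from the list prices quoted in the article.
INPUT_USD_PER_MILLION = 0.60   # input price per million tokens
OUTPUT_USD_PER_MILLION = 2.40  # output price per million tokens
LAUNCH_DISCOUNT = 0.50         # 50% off during the first 7 days


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      launch_week: bool = False) -> float:
    """Return the estimated USD cost of one request at list price."""
    cost = (
        input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    )
    if launch_week:
        # Assumption: the discount covers both input and output tokens.
        cost *= 1 - LAUNCH_DISCOUNT
    return round(cost, 4)


# Hypothetical workload: a full 1M-token prompt with a 20K-token reply.
full_price = estimate_cost_usd(1_000_000, 20_000)
discounted = estimate_cost_usd(1_000_000, 20_000, launch_week=True)
print(f"list price: ${full_price}, launch week: ${discounted}")
```

On those assumptions, filling the entire context window once costs well under a dollar, which is why long-running agent jobs are the use case MiniMax is pitching.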
MiniMax tested M3 by handing it an ICLR 2025 paper and telling it to reproduce the research on its own. It ran for nearly 12 hours, made 18 commits, and produced 23 experimental figures with no human touching the keyboard. That’s not a benchmark number. That’s the model doing a junior researcher’s week of work overnight.
How MiniMax M3 compares to GPT-5.5
Here’s where the “MiniMax M3 vs GPT-5.5” question gets interesting, because the answer depends on which task you care about.
| Benchmark (what it tests) | MiniMax M3 | GPT-5.5 | Claude Opus 4.8 |
|---|---|---|---|
| SWE-Bench Pro (agentic coding) | 59.0% | 58.6% | 69.2% |
| Terminal-Bench 2.1 (command line) | 66.0% | 72.1% | 74.2% |
| BrowseComp (web browsing) | 83.5% | — | — |
| SVG-Bench (visual code) | 63.7% | — | — |
M3 edges GPT-5.5 on SWE-Bench Pro, the benchmark everyone quotes for real coding work. On Terminal-Bench, GPT-5.5 pulls back ahead. So calling M3 a flat-out winner over GPT-5.5 isn’t honest. It trades blows, and it does it at a fraction of the price.
Claude Opus 4.8 still sits about 10 points clear on coding (69.2% vs 59.0%). If raw coding quality is all you want and budget is no object, the closed models still win. M3’s pitch is different: roughly 85 to 90% of Opus 4.8’s scores on these benchmarks, at a price about 12 times below GPT-5.5’s, with weights you actually own.
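As a quick check on the ratios behind that pitch, the sketch below divides M3’s scores by each rival’s, using only the vendor-reported figures from the table. The helper name is illustrative, and the cost share is simply what "roughly 12 times less" works out to, not a published price comparison.

```python
# Scores copied from the comparison table (vendor-reported, in percent).
scores = {
    "SWE-Bench Pro": {"MiniMax M3": 59.0, "GPT-5.5": 58.6, "Claude Opus 4.8": 69.2},
    "Terminal-Bench 2.1": {"MiniMax M3": 66.0, "GPT-5.5": 72.1, "Claude Opus 4.8": 74.2},
}


def relative_score(benchmark: str, rival: str) -> float:
    """M3's score as a percentage of the rival's score on one benchmark."""
    row = scores[benchmark]
    return round(100 * row["MiniMax M3"] / row[rival], 1)


for benchmark in scores:
    for rival in ("GPT-5.5", "Claude Opus 4.8"):
        print(f"{benchmark}: M3 reaches {relative_score(benchmark, rival)}% of {rival}")

# "Roughly 12 times less" expressed as a share of the rival's cost.
cost_share_percent = round(100 / 12, 1)
print(f"Implied cost share: {cost_share_percent}%")
```

Against Opus 4.8 the ratio lands in the mid-to-high 80s on both benchmarks, while against GPT-5.5 it ranges from slightly above parity on SWE-Bench Pro to the low 90s on Terminal-Bench.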
What MSA (sparse attention) actually does
The thing that makes the 1M context affordable is an architecture MiniMax built called MiniMax Sparse Attention, or MSA.
Most models choke on long inputs. Feed a standard model an entire codebase and the cost climbs fast, because attention compares every token to every other token. The longer the input, the worse it gets.
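The scaling problem is easy to see with a back-of-the-envelope count. The sketch below models standard dense attention only, where every token is compared with every other token. It ignores causal masking, attention heads, and layer counts, and it says nothing about how MSA itself works, since MiniMax has not yet published those details.

```python
def full_attention_pairs(n_tokens: int) -> int:
    """Token-to-token comparisons when every token attends to every token."""
    return n_tokens * n_tokens


for n in (8_000, 128_000, 1_000_000):
    print(f"{n:>9,} tokens -> {full_attention_pairs(n):,} comparisons")

# Going from a 128K prompt to a 1M prompt is about 7.8x more tokens,
# but the comparison count grows with the square of that factor.
growth = full_attention_pairs(1_000_000) / full_attention_pairs(128_000)
print(f"128K -> 1M multiplies the comparisons by about {growth:.0f}x")
```

An input roughly 8 times longer needs about 61 times the comparisons, which is why long contexts get expensive so quickly under dense attention.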
MSA cuts that down. Against the previous M2 generation, MiniMax reports 15.6× faster decoding and 9.7× faster prefill at a million-token context. And it does this without compressing the data, so you don’t lose precision the way some rival approaches do.
In plain terms: you can throw a whole repo, a long video, or a stack of legal docs at M3 and it stays fast and cheap enough to be worth doing. For agent tasks that run for hours and rack up tokens, that economics shift is the whole ballgame.
Why open-weight models keep closing the gap
A year ago, frontier AI meant three closed names: GPT, Claude, Gemini. You rented access and that was that.
Now there’s a second tier that’s catching up fast, and most of it is Chinese and open-weight. MiniMax. DeepSeek. Qwen. They keep shipping models that land within a few points of the closed flagships, then undercut them on price by 10x or more.
M3 is the clearest example yet, because it’s the first open-weight model to combine frontier coding, a million-token window, and native multimodality in one place. Those three capabilities used to be the moat. Now you can download them.
I don’t think the closed labs are in trouble exactly. They still lead on reliability, tooling, and the top end of reasoning quality. But the gap that used to be a canyon is now a few percentage points, and the open side is cheaper and self-hostable. That changes the math for a lot of people.
What MiniMax M3 means for you
If you build things, learn things, or run a small business, this launch matters in concrete ways.
For developers and indie builders. You get a near-frontier coding model you can self-host once the weights land. No premium API bill, no vendor lock-in, and a 1M context that lets your agents reason over a full codebase instead of a few files. For anyone shipping AI agents or automation on a budget, that’s a real unlock.
For students learning to code. A capable coding assistant used to mean a paid subscription. M3 lowers that wall. You can use a strong model to debug, explain code, and build actual projects without the cost that kept a lot of learners out. If you’re teaching yourself programming this year, the tools just got a lot more generous.
For businesses. More competition among model providers means lower costs and less dependence on a single vendor. If your product runs on AI, an open-weight option that’s 12x cheaper is leverage in every renewal conversation you’ll ever have. If you’re weighing options, see our roundup of the best free AI coding assistants.
A fair warning: M3 is brand new. Community tooling is thin, the weights aren’t public for another week and a half, and every benchmark above is MiniMax’s own number. Wait for independent testing before you bet a production system on it.
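Independent testing can start small. Below is a minimal sketch of a pass-rate harness for your own tasks. Everything in it is hypothetical: `stub_model` stands in for a real API client or self-hosted endpoint so the example runs offline, and the two tasks and their string-matching checkers are placeholders for real tests.

```python
from typing import Callable

# Each task pairs a prompt with a checker that decides whether the output passes.
Task = tuple[str, Callable[[str], bool]]


def pass_rate(model: Callable[[str], str], tasks: list[Task]) -> float:
    """Fraction of tasks whose checker accepts the model's output."""
    if not tasks:
        raise ValueError("need at least one task")
    passed = sum(1 for prompt, check in tasks if check(model(prompt)))
    return passed / len(tasks)


def stub_model(prompt: str) -> str:
    """Stand-in model: it only knows how to write add(); swap in a real client."""
    if "add" in prompt:
        return "def add(a, b):\n    return a + b"
    return ""


tasks: list[Task] = [
    ("Write add(a, b) in Python.", lambda out: "return a + b" in out),
    ("Write sub(a, b) in Python.", lambda out: "return a - b" in out),
]

rate = pass_rate(stub_model, tasks)
print(f"pass rate: {rate:.0%}")
```

In practice the checkers would run your own test suite against the generated code, and the same task list would be fed to each model you are comparing.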
Is OpenAI losing its lead?
Short answer: not yet, but the lead is thinner than it was.
GPT still wins on the things that matter for shipped products: consistency, a mature ecosystem, and the kind of reliability you need when real users depend on it. On Terminal-Bench, GPT-5.5 still beats M3 outright.
What’s changed is that “GPT is clearly ahead” has turned into “GPT is ahead on some things, by a little.” When an open-weight model you can self-host edges your paid flagship on a headline coding benchmark, the story stops being about one company’s dominance and starts being about whether closed AI can stay worth the premium.
MiniMax M3 FAQ
Is MiniMax M3 free? Not exactly. The API is paid ($0.60/M input tokens, with a 50% discount for the first week), and subscriptions start at $20/mo. But MiniMax will release the open weights within about 10 days, so you’ll be able to download and self-host it at the cost of your own hardware.
Is MiniMax M3 better than GPT-5.5? On SWE-Bench Pro, yes, by a narrow margin (59.0% vs 58.6%). On Terminal-Bench, GPT-5.5 wins (72.1% vs 66.0%). M3’s real advantage is cost: it’s roughly 12x cheaper and open-weight.
When can I download the MiniMax M3 weights? MiniMax says within 10 days of the June 1 launch, so expect them around June 10–11 on Hugging Face and GitHub, with a full technical report.
What is MiniMax Sparse Attention? MSA is the architecture behind M3’s long context. It makes million-token inputs affordable, with 15.6× faster decoding than the previous generation, without compressing the data and losing precision.
Can MiniMax M3 handle images and video? Yes. It takes text, images, and video as input, and it can also operate a desktop computer for agent tasks.
The launch numbers are MiniMax’s own, so the smart next move is to wait for the weights and run M3 against your own workload before trusting the leaderboard. If it holds up, you’ve got a frontier-class coding model you can own. New to this? Start with our guide on building with open-weight AI models.