Recent posts
Big fan (and neighbor) of Modal. Seems like a great group to work with as well.

Today we're announcing our Series C funding: $355M at a $4.65B valuation, led by some great investors @generalcatalyst and @Redpoint. We've had insane growth in the last year, but we're still very early. So proud of the team and what we have built so far!
Talk: Training Composer https://www.youtube.com/watch?v=uTgqYeVxy2c Overview of the methods that we use at Cursor to build our model.
Huh, so you’re saying it’s smart, cheap, and fast.

Cursor's new Composer 2.5 takes third on the Artificial Analysis Coding Agent Index and is ~10-60x lower cost than the higher-effort Opus 4.7 and GPT-5.5 variants above it. This release puts Composer among the leading coding agent models, something that wasn’t clear for past releases.
@cursor_ai has released Composer 2.5, the latest model in its Composer line. Composer 2.5 scored 62 on our Coding Agent Index, a 14 point gain over Composer 2 (48). This puts it in third place of our tested agents, behind only Claude Opus 4.7 (max) in Claude Code (66) and GPT-5.5 (xhigh reasoning) in Codex (65). These cost $4.10 and $4.82 per task respectively, ~10x the cost of Composer 2.5 Fast ($0.44) and ~60x the cost of Composer 2.5 standard ($0.07).
Key results for Composer 2.5 in Cursor CLI:
➤ Cost-quality Pareto frontier: At $0.07 (standard) and $0.44 (Fast) per task, Composer 2.5 is cheaper than every other agent scoring above 60 on the Index. Medium-effort peers cost $1.24–$2.21 per task; higher-effort variants land 3-4 points above at $4.10–$4.82
➤ Per-benchmark gains vs Composer 2: +35 points on SWE-Bench-Pro-Hard-AA (12% → 47%), +2 points on Terminal-Bench v2 (64% → 66%), and +3 points on SWE-Atlas-QnA (69% → 72%). At 47%, Composer 2.5's score on SWE-Bench-Pro-Hard-AA is comparable to Claude Opus 4.7 (max) in Claude Code
➤ Among the fastest coding agents: Composer 2.5 Fast runs at an average wall time of 6.7 minutes per task, the third-fastest agent on the Artificial Analysis Coding Agent Index, behind only Claude Opus 4.7 (medium) in Claude Code (5.8m) and GPT-5.5 (medium) in Cursor CLI (6.2m)
➤ Fast mode enables better responsiveness at 6x pricing: Fast runs 30% faster than standard Composer 2.5, but is ~6x the cost per task ($0.44 vs $0.07). Token pricing is 6x higher for Fast: $3.00/$15.00 vs $0.50/$2.50 per million input/output tokens
Model details:
➤ Base model: Continued training on @Kimi_Moonshot's open weights Kimi K2.5 as with Composer 2, with Cursor reporting ~85% of total compute from its own additional training and reinforcement learning
➤ Pricing: $0.50/$2.50 per million input/output tokens for the standard variant; $3.00/$15.00 for the Fast variant (the default in Cursor)
➤ Available exclusively in Cursor: both Cursor IDE and Cursor CLI, an externally accessible API is not available
Congratulations @cursor_ai and @mntruell on the impressive release!
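The "~10x", "~60x", and "6x" figures in the post above can be checked with a few divisions. The snippet below is only a sketch of that arithmetic: the prices are copied from the post, while the dictionary keys and the `ratio` helper are made-up names for illustration, not anything from Artificial Analysis or Cursor.

```python
# Per-task costs in USD, copied from the post above.
# The short keys are illustrative labels, not official identifiers.
COST_PER_TASK = {
    "opus_4_7_max": 4.10,       # Claude Opus 4.7 (max) in Claude Code
    "gpt_5_5_xhigh": 4.82,      # GPT-5.5 (xhigh reasoning) in Codex
    "composer_2_5_fast": 0.44,  # Composer 2.5 Fast
    "composer_2_5_std": 0.07,   # Composer 2.5 standard
}

# Token prices in USD per million (input, output) tokens, from the post.
TOKEN_PRICE = {
    "composer_2_5_fast": (3.00, 15.00),
    "composer_2_5_std": (0.50, 2.50),
}


def ratio(expensive: float, cheap: float) -> float:
    """Return how many times more expensive `expensive` is than `cheap`."""
    return expensive / cheap


if __name__ == "__main__":
    # Higher-effort agents versus each Composer 2.5 variant.
    for big in ("opus_4_7_max", "gpt_5_5_xhigh"):
        for small in ("composer_2_5_fast", "composer_2_5_std"):
            r = ratio(COST_PER_TASK[big], COST_PER_TASK[small])
            print(f"{big} vs {small}: {r:.1f}x per task")

    # Fast versus standard, per task and per token.
    per_task = ratio(COST_PER_TASK["composer_2_5_fast"],
                     COST_PER_TASK["composer_2_5_std"])
    fast_in, fast_out = TOKEN_PRICE["composer_2_5_fast"]
    std_in, std_out = TOKEN_PRICE["composer_2_5_std"]
    print(f"Fast vs standard: {per_task:.1f}x per task")
    print(f"Fast vs standard: {ratio(fast_in, std_in):.1f}x input, "
          f"{ratio(fast_out, std_out):.1f}x output tokens")
```

The per-task ratios come out at roughly 9-11x against Fast and roughly 59-69x against standard, so the post's "~10x" and "~60x" are rounded figures; the token-price ratio is exactly 6x on both input and output.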

Been working on text feedback / OPSD in Composer. Really interesting space, and much more to be explored.


We improved Composer by scaling training, generating more complex RL environments, and introducing new learning methods. For example, we use text feedback during RL to learn faster by assigning credit in rollouts spanning hundreds of thousands of tokens.
I’ll be giving an online talk about Composer 2 and beyond tomorrow. Come say hi. https://luma.com/b62tuosv?utm_source=sasha
but what if we made an LLM from scratch?

Behind the scenes of mni-ml:
January 4th 2026 - my roommate @MankyDankyBanky and I wanted to do a big project together. “maybe we should try to build pytorch from scratch” We found @srush_nlp's minitorch curriculum and committed to grinding through it Jan to April.
February - autodiff and tensor internals done. lots of late night PR reviews, stacked diffs, Kinton ramen runs to Toronto when I'd visit Aadi at Shopify. We started posting on X to keep ourselves accountable.
March - the month of parallelization: Aadi shipped tiled matmul using the same algo @nvidia teaches in their CUDA guide, wrapped by end of month - pooling, conv1d/2d forward+backward, softmax, dropout.
March 22-23 - @socraticainfo symposium & we see the tinytpu team on the stage which filled us with determination 🫡 cc: @evanliin @XanderChin @suryasure05 @kennykgguo
March 24 - chose the mni-ml brand and started the educational blog
March 30 - minitorch is DONE ahead of schedule. now we build on top of the framework.
April 5-6 - cuBLAS matmul via koffi FFI. buffer pooling, strided batched GEMM, kernel optimizations. CUDA backend takes shape.
April 7 - huge day. cross-platform CI pipeline, prebuilt npm binaries, v0.3.0 - CUDA live on @npmjs. flatten the monorepo, add @WebGPU + Windows CUDA build targets by eod.
April 12 - flash attention CUDA kernel ships. we caught a bug where head dim > 32 was truncating.
April 14 (during exam season), we recorded the demo in @Shopify recording studio during Aadi’s lunch break. Everything over the last 4mo finally came together. Cc: @fnthawar @tobi @alspee
April 17: launch post and bought the domain https://t.co/NQaGnIFVAO
and we’re just getting started. We have so much in store for this summer, stay tuned 🫡 cc: @sundeep @GavinSherry

Kind of crazy that models can just instantly produce really useful custom dashboards.

Cursor can now respond by creating interactive canvases to visually represent information. Ask it to generate dashboards and custom interfaces that are richer than plain text.

