Diego Oppenheimer

@doppenhe

Helping build AI companies @aitinkerers. Past: founder @guardrails_ai, Partner @FactoryHQ, EVP @datarobot, CEO Algorithmia, @mspowerbi, @msexcel 🇺🇸🇺🇾

3.3K Followers · 364 Following · 6.9K Posts

Recent posts

Diego Oppenheimer @doppenhe

Agents

Stanford AI Index 2026: agents went from 12% to 66.3% success on real computer tasks in one year. Within 6 points of human performance. OSWorld: actual computer use, not a curated benchmark. Enterprise teams that piloted in 2025 and said "not ready" evaluated a different technology.

2h ago

Diego Oppenheimer @doppenhe

Agents

Anthropic limiting third-party harnesses on subscription plans while OpenAI doesn't is worth paying attention to for one specific reason: it changes the economics of building agents on top of platform pricing. All-you-can-eat subscription plans were never designed for agentic usage volumes. An agent running continuous loops burns tokens at rates that make flat-fee plans economically untenable for the platform. The restriction isn't surprising. The timing is, because practitioners were already building on that assumption. The more important signal: practitioners are now asking whether their agent architecture is portable. Tightly coupled to Claude's APIs, this kind of platform decision hits your cost model and your operational assumptions at the same time. A harness that abstracts the model layer leaves you insulated. Models are commoditizing fast enough that the switch is technically viable if the harness is right. Performance gaps on most production tasks are narrowing. The platform risk isn't model quality. It's whether your architecture assumed a pricing and access model that the platform can change unilaterally. Design the harness so the model is a parameter, not a dependency.
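
One way to read "the model is a parameter, not a dependency" is as plain dependency injection. The sketch below is illustrative only: `ModelClient`, `EchoModel`, and `Harness` are hypothetical names, and `EchoModel` stands in for a real provider client so the example runs without any vendor SDK.

```python
from dataclasses import dataclass
from typing import Protocol


class ModelClient(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Stand-in for a real provider client; it only tags and echoes the prompt."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


@dataclass
class Harness:
    """Agent loop that depends only on the ModelClient interface."""

    model: ModelClient
    max_steps: int = 3

    def run(self, task: str) -> list[str]:
        transcript = []
        prompt = task
        for _ in range(self.max_steps):
            reply = self.model.complete(prompt)
            transcript.append(reply)
            prompt = reply  # feed the last reply back in as the next prompt
        return transcript


# Swapping providers is a constructor argument, not a rewrite of the loop.
harness_a = Harness(model=EchoModel("provider-a"), max_steps=2)
harness_b = Harness(model=EchoModel("provider-b"), max_steps=2)
```

In a real system the adapters behind `ModelClient` would wrap vendor APIs, and that adapter layer is where a pricing or access change would be absorbed.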

2h ago

Diego Oppenheimer @doppenhe

Research

These three show up in every enterprise deployment I've seen. Structured outputs break first. Validation fails silently. The pipeline looks fine until it doesn't. The 8-component framing is useful. More useful signal: this layer is consolidating. Teams that built it internally 18 months ago to ship their own product are now the vendors.

Towards Data Science@TDataScience

"It was always the same three problems: broken structured outputs, silent validation failures, and pipelines that looked fine until they didn’t." @emmimalpa walks us through the components of the control layer she created to address LLM issues that prompt engineering on its own couldn't resolve. https://t.co/MjHK0iEBwS
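
To make "validation fails silently" concrete, here is a minimal sketch of a parser that raises instead of passing malformed output downstream. The schema in `EXPECTED_FIELDS` and the names `parse_model_output` and `StructuredOutputError` are assumptions for illustration; they are not taken from the linked article.

```python
import json


class StructuredOutputError(ValueError):
    """Raised when model output does not match the expected shape."""


# Hypothetical schema: field name -> required Python type.
EXPECTED_FIELDS = {"title": str, "priority": int}


def parse_model_output(raw: str) -> dict:
    """Parse raw model text as JSON and check every expected field."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise StructuredOutputError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise StructuredOutputError("expected a JSON object")
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in data:
            raise StructuredOutputError(f"missing field: {name}")
        # Exact type match, so a JSON boolean is not accepted as an int.
        if type(data[name]) is not expected_type:
            raise StructuredOutputError(f"wrong type for field: {name}")
    return data
```

The point of the sketch is the failure mode: every malformed response becomes an exception the pipeline has to handle, rather than a partially filled record that looks fine until it doesn't.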

18h ago

Diego Oppenheimer @doppenhe

Research

Most retail coverage of tariffs and AI stops at the surface layer: Nvidia costs go up, hyperscalers absorb it, smaller players get squeezed. That's accurate and almost entirely priced in. Here's where I'd actually look for signal.

Power infrastructure is the binding constraint nobody writes about: The bottleneck isn't GPUs. It's power. The US imports more than 80% of large power transformers, lead times are already 18-24 months, and 55% of in-service units are over 33 years old. Tariffs hit foreign competitors here but not domestic manufacturers. Companies that build and install power infrastructure for data centers can't be offshored. That's a structural position, not a cyclical one. This category doesn't show up on AI coverage. It shows up on grid coverage.

The efficiency premium goes up when compute gets expensive: When frontier compute costs more, the relative value of doing more with less increases. Models optimized for specific tasks, faster inference, edge deployment. These gain ground. The companies solving for efficiency aren't the ones with the most GPUs. Worth paying attention to which direction the architecture is moving.

The colocation consolidation play: Larger operators can absorb tariff-driven cost increases. Smaller competitors can't. The sorting mechanism here runs toward consolidation. The players with the most existing capacity and customer relationships benefit from competitors exiting.

The middle squeeze in frontier model training: The companies training at frontier scale but not owning their infrastructure are in an unusual position. Cloud-dependent, which insulates them in the short term. But no custom silicon, no owned data centers, no geographic routing. When the tariff costs eventually pass through to cloud pricing, they face pressure with no offset. That lag is months to a couple of years. It's not fully in the current discussion.

Don't ask who makes AI work. Ask who makes AI work when compute is expensive. And ask who has a hedge against the cost curve, and who's exposed to it with no structural offset. The power infrastructure story, the efficiency architecture story, the frontier lab middle squeeze. Those don't seem to be priced in.

1d ago

Diego Oppenheimer @doppenhe

Research

Noam Brown (OpenAI): single-number AI benchmarks no longer make sense. Right. Performance now depends on how much reasoning time the model gets. A benchmark captures one compute budget. Production runs another. Enterprise buyers are purchasing a number that doesn't describe what they're deploying.

1d ago

Diego Oppenheimer @doppenhe

Research

Enterprise data leaders report AI accuracy anywhere from 30% to 85%. That variance isn't about which model they're using. Most of them don't know their actual number. They've never built the feedback loop to find out. The model sounds equally confident either way.
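
The "feedback loop" here can be as small as logging a reviewed sample of production outputs and computing the hit rate. The sketch below assumes a human or downstream check supplies the reviewed label; `FeedbackLog` and its methods are hypothetical names, not an existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLog:
    """Collects (prediction, reviewed label) pairs from production traffic."""

    records: list = field(default_factory=list)

    def record(self, prediction: str, reviewed_label: str) -> None:
        self.records.append((prediction, reviewed_label))

    def accuracy(self) -> float:
        """Share of reviewed samples where the model matched the reviewer."""
        if not self.records:
            raise ValueError("no reviewed samples yet")
        correct = sum(1 for pred, label in self.records if pred == label)
        return correct / len(self.records)


log = FeedbackLog()
log.record("approve", "approve")
log.record("reject", "approve")  # model was wrong, however confident it sounded
log.record("approve", "approve")
log.record("reject", "reject")
measured = log.accuracy()
```

Exact-match comparison is a simplification; free-text outputs would need a task-specific grading step in place of `pred == label`.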

1d ago

Diego Oppenheimer @doppenhe

Other

All 5 of my direct reports at Algorithmia are now CEOs. Either I was a great manager or a terrible one. They're too polite to say which.

4d ago

Diego Oppenheimer @doppenhe

Agents

The founders who are deepest in AI adoption have shifted to a different operating mode. @garrytan, @rwaliany, @mvanhorn, @jheitzeb aren't doing things sequentially. They're running multiple projects at real depth, simultaneously. Serial entrepreneurship exists because execution was the bottleneck. Capital and ideas scaled; the actual work of pushing things forward didn't. You had fixed bandwidth. Serial was the workaround: do one deeply, then the next. What AI removes isn't time. It's the execution overhead between decisions: the research loops, the drafting, the ops work, the synthesis. Offload that and what remains is judgment. Judgment actually scales across multiple projects at once. I'm running it this way myself: investing, advising multiple companies, teaching, sitting on boards, all simultaneously. The "serial" frame assumes sequencing was a choice. It was a constraint. The founders at the forefront of AI adoption are proving it. Parallel entrepreneur is the operating mode that emerges when the constraint goes away. 2026, the year the AI Parallel Entrepreneur was born.

1w ago

Diego Oppenheimer @doppenhe

Agents

.@ryancarson runs one of the most interesting patterns in agent first company building I have seen out there. Well worth checking out...

Ryan Carson@ryancarson

If you … 1. Practice law (especially in a family law firm) 2. Use agents every day (especially if you’re using Claude Code, Codex, etc) 3. Want to disrupt divorce and join a legaltech startup at the ground level Hop in my DMs. Things are getting super exciting at @HelloUntangle

1w ago

Diego Oppenheimer @doppenhe

Research

In traditional ML, we learned the hard way that error compounds in pipelines. Testing each model in isolation didn't predict what happened when you chained them together. One early paper suggests we're hitting the same wall with LLM agents. Researchers tested 10 frontier models on long-horizon agentic tasks. Memory scaffolds (the standard approach for extending context across steps) degraded performance on all 10. Meltdown rates on complex tasks reached 19%. Model rankings from standard benchmarks inverted as tasks got longer. One study, not a conclusion. But the shape is familiar. If this generalizes, it's the same problem as ML pipelines: evals designed for individual models don't predict system behavior. Worth testing your agents at the actual length and complexity you're deploying against, rather than waiting to find out.
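
The compounding claim has a simple back-of-the-envelope form. Under the simplifying assumption (not made in the post) that each step succeeds independently with the same probability, a chain of n steps succeeds with that probability raised to the n-th power. `chain_success_rate` is a hypothetical helper for that arithmetic.

```python
def chain_success_rate(per_step_success: float, steps: int) -> float:
    """Success rate of a chain, assuming independent, equally reliable steps."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


# A step that looks fine in isolation (95%) degrades quickly when chained.
ten_steps = chain_success_rate(0.95, 10)     # about 0.60
twenty_steps = chain_success_rate(0.95, 20)  # about 0.36
```

Real agent failures are rarely independent, so this is a rough intuition rather than a prediction; it only shows why per-step evals can look healthy while long tasks fail.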

1w ago

Diego Oppenheimer @doppenhe

Agents

Capital allocation is a cleaner signal than product announcements. Three weeks ago Cisco bought Galileo for agent observability. Now in talks to acquire Astrix for $250-350M. Astrix discovers every agent, MCP server, and shadow identity in your network and scans each for vulnerabilities. Two acquisitions, same thesis, three weeks apart.

3w ago

Diego Oppenheimer @doppenhe

Research

Stanford 2026 AI Index: top models are so close on benchmarks nobody's choosing on that anymore. Real differentiators: reliability, consistent instruction-following, cost at scale. We moved from "which model is smarter" to "which model breaks least." Time to pay attention to uptimes.

4w ago

Diego Oppenheimer @doppenhe

Agents

Honored to have played a small part in this. What Jenni, Abby, Celeste, Alfredo and the AI@GSB team built from a single guest lecture is remarkable. Now the largest student org at Stanford's MBA program. The next generation of leaders learning to build with AI, not just about it.

Poets&Quants@PoetsAndQuants

Inside AI@GSB: The student movement transforming Stanford’s MBA experience, part of a tsunami of change triggered by artificial intelligence. Read More: https://hubs.la/Q04dph-30 #mba #stanford #bschool #businesseducation #businessschool #ai #gsb

4w ago

Diego Oppenheimer @doppenhe

Research

PwC 2026: top 20% capture 75% of AI gains. What separates them: they measured revenue growth, not cost reduction. Sounds like 2021. It's not. Cost reduction projects are easy to start and easy to abandon. Revenue growth means you knew what you were building before you started.

4w ago

Diego Oppenheimer @doppenhe

Other

People's expectations are wild to me sometimes. "heavily guarded military base, wasn't allowed to come in and take pictures," one star.

4w ago

Diego Oppenheimer @doppenhe

Agents

OpenAI took the multi-agent design pattern everyone was already building themselves and turned it into a product with a price tag. GPT-5.4 handles planning and judgment. Mini/nano run the parallel work at 30% of the cost. No more framework decisions. You just pick which model thinks and which one executes.

4w ago
