Recent posts
software security is a hot topic right now. timely podcast with @cogent_security on building agents for cyber defense

Great conversation on the Max Agency podcast with @cogent_security Co-Founder + CTO Geng Sng on building agents for autonomous cyber defense. Check out the full episode: ⏯️ YouTube: https://youtu.be/D6XWu54oG4g 🎧 Apple: https://podcasts.apple.com/nz/podcast/how-cogent-builds-ai-agents-that-have-to-be-right-every/id1891551672?i=1000769089112 🎧 Spotify: https://open.spotify.com/episode/4605K9ojyFmq1Dn4QjWeZ0?si=4d972381c1354446
i'm going to be in NYC in ~1 week, and am doing a fireside chat with one of the top agent companies in new york: @_anish_agarwal of @traversal_ai. come join us! https://partiful.com/e/MLzglw6al1KrUSXSrOm6
the langsmith engine that could

langsmith engine...

code interpreter is a lightweight code execution environment
lets you do:
- RLMs
- programmatic tool calling
- more!
without having to spin up a full sandbox
we'll be writing a lot more about the use cases here, but check it out!
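To make "programmatic tool calling" concrete, here is a minimal sketch in plain Python. It is not the LangSmith or deepagents API: the tools (`search_orders`, `send_summary`), the `run_program` helper, and the hard-coded `MODEL_PROGRAM` string are all hypothetical stand-ins. The idea it illustrates is that the model writes one small program that calls several tools, instead of issuing one tool call per model turn.

```python
# Hypothetical tools; in a real agent these would wrap APIs or databases.
def search_orders(customer: str) -> list[dict]:
    orders = {
        "alice": [{"id": 1, "total": 30.0}, {"id": 2, "total": 12.5}],
        "bob": [{"id": 3, "total": 99.0}],
    }
    return orders.get(customer, [])


def send_summary(customer: str, total: float) -> str:
    return f"{customer}: {total:.2f}"


TOOLS = {"search_orders": search_orders, "send_summary": send_summary}

# Stand-in for code a model would generate (assumption: hard-coded here).
MODEL_PROGRAM = """
totals = {}
for name in ["alice", "bob"]:
    totals[name] = sum(o["total"] for o in search_orders(name))
result = [send_summary(name, total) for name, total in totals.items()]
"""


def run_program(source: str, tools: dict) -> object:
    """Execute generated code with only the given tools and a few builtins.

    Note: trimming builtins like this is NOT a security boundary. It only
    shows the shape of the technique; untrusted code needs real isolation.
    """
    allowed_builtins = {"sum": sum, "len": len, "min": min, "max": max}
    namespace = {"__builtins__": allowed_builtins, **tools}
    exec(source, namespace)
    # By convention in this sketch, the program stores its answer in `result`.
    return namespace.get("result")


if __name__ == "__main__":
    print(run_program(MODEL_PROGRAM, TOOLS))
```

The loop over customers and the aggregation happen inside the executed program, so intermediate tool outputs never need to pass back through the model.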

great deep dive into how we built LangSmith Engine. lots of fun learnings and tips and tricks

lots of good things in 0.6 release of deepagents! great write up by sydney

“Dependabot for LLM agent failures”

@hwchase17 Started on this and finding it awesome; LangSmith Engine also sparked an idea: a "Dependabot-like" for LLM agent failures. LangSmith Engine gives you the smoke detector. The natural next layer is a sprinkler system: auto-remediation with a human approval gate. A four-stage pipeline comes to mind: Classify → Patch → Eval → Shadow. Trying it and will share trace results. This is a real gap in the LLMOps ecosystem; glad to see it being closed. 🔥 Will keep you updated on the progress @LangChain_OSS

one way to view langsmith is as a platform for the whole org to collaborate on building agents. helps speed up that feedback loop between different personas

@hwchase17 the tooling side is maturing fast but the feedback loop between engineers and domain experts is still the bottleneck. keep running into this building agents: you can instrument everything and still miss what actually matters without that human layer
viv always says it better than me. lots of talk recently of thinking of agents as systems to measure and iteratively improve, but that's not JUST a technical thing. it's also a human & team thing

my fave point from here: the earlier you think about your agent as a system that can be measured & improved, the faster you can get a robust agent into production
This isn’t just a technical thing, it’s a human & team thing. Teams that succeed here ask questions like:
- “What do I need my agent to do to make our customers happy?”
- “What scenarios will my agent encounter in the wild and how can we recreate that in our testing?”
Evals are the substrate that determines what your agent does in production. They’re training data for agents because we literally fit our agent to pass evals via hill-climbing algorithms and human edits that address failure modes
Once you get your agent into users’ hands, the eval generation loop compounds. Production data uncovers more issues, these issues turn into evals, and the agent is fit to improve over more cases that could not be captured without real user data. In the early stages, teams dogfooding their product becomes the feedback signal
Curating evals and running experiments on different agent variants is a muscle every team develops. Our goal is to create tooling so every team can create the best agents for their tasks using this foundation
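The loop described above (production issues become evals, then agent variants are compared against the growing eval set) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `EvalCase`, `pass_rate`, `add_production_failures`, `best_variant`, and the two toy agents are hypothetical names, exact string match stands in for a real grader, and none of this is a LangSmith API.

```python
from dataclasses import dataclass
from typing import Callable

Agent = Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str


def pass_rate(agent: Agent, evals: list[EvalCase]) -> float:
    """Fraction of eval cases where the agent output matches exactly."""
    if not evals:
        return 0.0
    passed = sum(1 for case in evals if agent(case.prompt) == case.expected)
    return passed / len(evals)


def add_production_failures(
    evals: list[EvalCase], failures: list[EvalCase]
) -> list[EvalCase]:
    """Append failures observed in production as new eval cases, skipping duplicates."""
    known = set(evals)
    merged = list(evals)
    for case in failures:
        if case not in known:
            known.add(case)
            merged.append(case)
    return merged


def best_variant(variants: dict[str, Agent], evals: list[EvalCase]) -> str:
    """Pick the variant name with the highest pass rate on the eval set."""
    return max(variants, key=lambda name: pass_rate(variants[name], evals))


# Toy agent variants (assumption: real agents would call a model).
def agent_v1(prompt: str) -> str:
    return prompt.upper()


def agent_v2(prompt: str) -> str:
    return prompt.strip().upper()


evals = [EvalCase("hi", "HI")]
# A failure seen in production: surrounding whitespace was not handled.
evals = add_production_failures(evals, [EvalCase(" refund ", "REFUND")])
winner = best_variant({"v1": agent_v1, "v2": agent_v2}, evals)
```

Both variants pass the original eval, so only the case harvested from production separates them, which is the compounding effect the post describes.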

the best teams building agents ship early and iterate quickly. you can't just ship an agent and then forget about it. the key to getting the best agents is to build an iterative improvement loop

ai coding is getting expensive. use more open models!

Kimi K2.6 on @baseten is ~5x cheaper than Opus 4.7 For a large majority of tasks, it's roughly the same performance If you want to use open models for coding, try them out in deepagents-cli:

my posture is terrible, but Alex's insights are great, so it evens out


Talked to @ramplabs Head of Applied Research Alex Shevchenko on the Max Agency podcast to learn how @Ramp Sheets was built, their internal agent Inspect, and so much more. YouTube: https://www.youtube.com/watch?v=trEM9OKr5Sc Apple: https://podcasts.apple.com/us/podcast/how-ramp-built-an-ai-agent-that-can-think-outside/id1891551672?i=1000766630498 Spotify: https://open.spotify.com/episode/49NvGQRI2TWlYXziJ2ALiz
this: deepagents deploy https://docs.langchain.com/oss/python/deepagents/deploy (or at least directionally where we want to take it). what's missing? give us feedback!


can someone PLEASE launch OS claude managed agents?? I’ll pay anything. Was building this myself at some point but would prefer someone do it with 100% focus. generational opportunity right here. i can list 30-50 companies that would use it instantly
