# Andrej Karpathy: From Vibe Coding to Agentic Engineering

## Metadata
- Channel: Sequoia Capital
- Duration: 30 min
- YouTube: https://www.youtube.com/watch?v=96jN2OCOfLs

## Transcript

**[00:02] Speaker A:** We're so excited for our very first special guest. He has helped build modern AI, then explain modern AI, and then occasionally rename modern AI. He actually helped co-found OpenAI right inside this office, he was the one who actually got Autopilot working at Tesla back in the day, and he has a rare gift for making the most complex technical shifts feel both accessible and inevitable.  
**[00:30] Speaker A:** You all know him for coining the term vibe coding last year, but just in the last few months he said something even more startling: that he's never felt more behind as a programmer. That's where we're starting today. Thank you, Andrej, for joining us.  
**[00:44] Speaker B:** Yeah. Hello. Excited to be here and to kick us off.  
**[00:47] Speaker A:** Okay. So, just a couple of months ago, you said that you've never felt more behind as a programmer. That's startling to hear from you of all people. Can you help us unpack that? Was that feeling  
**[00:57] Speaker A:** exhilarating or unsettling?  
**[01:00] Speaker B:** Yeah, a mixture of both for sure. Well, first of all, like many of you, I've been using agentic tools like Claude Code and adjacent things for a while, maybe over the last year as they came out. They were very good at chunks of code; sometimes they would mess up and you'd have to edit the result, so they were kind of helpful. And then I would say December was a clear turning point for me. I was on a break, so I had a bit more time.  
**[01:22] Speaker B:** I think many other people were similar. I just started to notice that with the latest models the chunks just came out fine, and then I kept asking for more and it still came out fine, and I can't remember the last time I corrected it. I trusted the system more and more, and then I was vibe coding. [laughter] So I do think it was a very stark transition.  
**[01:43] Speaker B:** I think a lot of people actually... I tried to stress this on Twitter, or X, because I think a lot of people experienced AI last year as a ChatGPT-adjacent thing.  
**[01:52] Speaker B:** But you really had to look again, and you had to look as of December, because things changed fundamentally, especially with this coherent agentic workflow that really started to work.  
**[02:04] Speaker B:** So I would say it was that realization that had me go down this whole rabbit hole of, you know, infinite side projects.  
**[02:16] Speaker B:** My side projects folder is extremely full of random things, and I'm just vibe coding all the time.  
**[02:21] Speaker B:** So yeah, that happened in December, I would say, and I've been looking at the repercussions of it since.  
**[02:28] Speaker A:** You've talked a lot about this idea of LLMs as a new computer: that it isn't just better software, it's a whole  
**[02:35] Speaker A:** new computing paradigm. Software 1.0 was explicit rules, Software 2.0 was learned weights, and Software 3.0 is this.  
**[02:43] Speaker A:** If that's actually true, what does a team build differently the day they actually believe it?  
**[02:50] Speaker B:** Right, exactly. So in Software 1.0, I'm writing code. In Software 2.0, I'm actually programming by creating datasets and training neural networks.  
**[02:59] Speaker B:** So the programming is kind of like arranging datasets, and maybe some objectives and neural network architectures.  
**[03:03] Speaker B:** And then what happened is that if you train one of these GPT models or LLMs on a sufficiently large set of tasks (implicitly, because by training on the internet the model has to multitask all the things that are in the dataset),  
**[03:15] Speaker B:** these models actually become a kind of programmable computer in a certain sense.  
**[03:20] Speaker B:** So Software 3.0 is about how your programming now turns into prompting, and what's in the  
**[03:25] Speaker B:** context window is your lever over the interpreter, which is the LLM: it interprets your context and performs computation in the digital information space.  
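To make the Software 1.0 / 2.0 / 3.0 distinction concrete, here is a toy sketch. The task (classifying short restaurant reviews), the word list, and every function name are our own illustration, not anything from the talk. The Software 2.0 "neural network" is shrunk to a bag-of-words perceptron, and the Software 3.0 version only assembles the prompt, leaving out the actual LLM call.

```python
from collections import defaultdict

# Toy task (our own illustration): is a short restaurant review positive?

# Software 1.0: a human writes the rule explicitly.
POSITIVE_WORDS = {"great", "good", "love", "delicious"}

def classify_v1(text: str) -> bool:
    return any(word in POSITIVE_WORDS for word in text.lower().split())

# Software 2.0: the "program" is a set of weights learned from labeled examples
# (a bag-of-words perceptron here, standing in for a neural network).
def train_v2(examples, epochs=10):
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            words = text.lower().split()
            predicted = bias + sum(weights[w] for w in words) > 0
            if predicted != label:
                step = 1.0 if label else -1.0  # nudge the weights toward the label
                bias += step
                for w in words:
                    weights[w] += step
    return dict(weights), bias

def classify_v2(model, text: str) -> bool:
    weights, bias = model
    return bias + sum(weights.get(w, 0.0) for w in text.lower().split()) > 0

# Software 3.0: the "program" is text in the context window, interpreted by an LLM.
# The model call itself is omitted; this function only assembles the context.
def build_prompt_v3(text: str) -> str:
    return (
        "Is the following restaurant review positive? Answer yes or no.\n\n"
        f"Review: {text}"
    )

TRAIN = [
    ("great food", True),
    ("love this place", True),
    ("terrible service", False),
    ("cold and bland", False),
]
model = train_v2(TRAIN)
```

In the first version a person wrote the logic, in the second the logic lives in the learned `weights`, and in the third it is plain text that a model would interpret.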
**[03:34] Speaker B:** So I guess that's the transition, and there are a few examples that really drove it home for me and might be instructive.  
**[03:42] Speaker B:** For example, when OpenClaw came out: if you want to install OpenClaw, you would normally expect a bash script, like a shell script.  
**[03:52] Speaker B:** So you run the shell script to install OpenClaw.  
**[03:54] Speaker B:** But in order to target lots of different platforms and lots of different types of computers you might run OpenClaw on,  
**[04:01] Speaker B:** these shell scripts usually balloon and become extremely complex.  
**[04:05] Speaker B:** And you're still stuck in a Software 1.0 universe of wanting to write the code.  
**[04:07] Speaker B:** Actually, the OpenClaw installation is a bunch of text that you're  
**[04:13] Speaker B:** supposed to copy-paste and give to your agent. So basically it's a little script of, you know, copy-paste this, give it to your agent, and it will install OpenClaw.  
**[04:20] Speaker B:** And the reason this is a lot more powerful is that you're now working in the Software 3.0 paradigm, where you don't have to precisely spell out all the individual details of the setup.  
**[04:29] Speaker B:** The agent has its own intelligence packaged up: it follows the instructions, looks at your environment and your computer, performs intelligent actions to make things work, and debugs things in the loop. It's just so much more powerful, right?  
**[04:42] Speaker B:** So I think that's a very different way of thinking about it: what is the piece of text to copy-paste to your agent?  
**[04:47] Speaker B:** That's the programming paradigm.  
**[04:48] Speaker B:** Now, one more example that comes to mind, even more extreme than that, is when I was building MenuGen.  
**[04:56] Speaker B:** MenuGen is this idea where you come to a restaurant and they give you a menu. There are usually no pictures, so I don't know what any of these things are. Usually for 30%, maybe 50%, of the items I have no idea what they are.  
**[05:07] Speaker B:** So I wanted to take a photo of the restaurant menu and get pictures of what those things might look like in a generic sense.  
**[05:16] Speaker B:** And so I built, I vibe coded, this app that lets you upload a photo. It runs on Vercel and re-renders the menu: it OCRs all the different item titles, uses an image generator to get pictures of them, and then shows them to you.  
**[05:37] Speaker B:** And then I saw the Software 3.0 version of this, which blew my mind: literally just take your photo, give it to Gemini, and say "use Nano Banana to overlay the things onto the menu." And Nano Banana  
**[05:51] Speaker B:** basically returned an image that is exactly the picture of the menu I took, but it actually rendered the different dishes into the pixels of the menu. This blew my mind, because it means all of my MenuGen is spurious.  
**[06:04] Speaker B:** It's working in the old paradigm. That app shouldn't exist. The Software 3.0 paradigm is a lot more raw.  
**[06:11] Speaker B:** Your neural network is doing more and more of the work: your prompt or context is just the image, the output is an image, and there's no need for any app in between.  
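A rough sketch of the architectural difference described above, assuming three hypothetical model interfaces (`OcrFn`, `ImageGenFn`, `ImageEditFn`) that stand in for real OCR, image-generation, and image-editing models. This is not MenuGen's actual code, which the talk does not show. The point is structural: the 1.0-style app owns a pipeline plus glue code, while the 3.0-style version is a single instruction to one image-in, image-out model.

```python
from typing import Callable, Dict, List

# Hypothetical model interfaces: NOT real APIs, just stand-ins for illustration.
OcrFn = Callable[[bytes], List[str]]         # menu photo -> dish titles
ImageGenFn = Callable[[str], bytes]          # dish title -> generated picture
ImageEditFn = Callable[[bytes, str], bytes]  # (photo, instruction) -> edited photo

def menugen_v1(photo: bytes, ocr: OcrFn, generate_image: ImageGenFn) -> Dict[str, bytes]:
    """1.0-style app: an explicit pipeline of model calls plus glue code."""
    titles = ocr(photo)
    # One image-generation call per dish; re-rendering the results into a
    # menu page would be still more app code (omitted here).
    return {title: generate_image(title) for title in titles}

def menugen_v3(photo: bytes, edit_image: ImageEditFn) -> bytes:
    """3.0-style: the whole app collapses into one instruction to one model."""
    instruction = "Overlay a picture of each dish onto this menu, next to its name."
    return edit_image(photo, instruction)
```

Everything `menugen_v1` orchestrates (OCR, per-dish generation, layout) becomes the model's job in `menugen_v3`.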
**[06:21] Speaker B:** So I think people have to reframe: not work in the existing paradigm of what already exists, and not think about this as just a speedup of what exists.  
**[06:31] Speaker B:** It's actually that new things are available now.  
**[06:33] Speaker B:** And going back to your programming question, I think that's also an example of the old mindset, because it's not just about programming, and programming becoming...  
**[06:42] Speaker B:** faster. This is more general information processing that is automatable now, so it's not even just about code.  
**[06:49] Speaker B:** Previously, code worked over structured data, right? You write code over structured data.  
**[06:53] Speaker B:** But with my LLM knowledge base project, for example, you get LLMs to create wikis for your organization, or for you personally, etc.  
**[07:01] Speaker B:** This is not even a program. It's not something that could exist before, because there was no code that would create a knowledge base from a bunch of facts.  
**[07:09] Speaker B:** But now you can take these documents, basically recompile them in a different way, reorder them, and create something new and interesting as a reframing of the data.  
**[07:19] Speaker B:** So these are new things that weren't possible. This is something I keep trying to get back to: not only what existed that we can now do faster, but the new opportunities of just...  
**[07:33] Speaker B:** things that couldn't be possible before. I almost think that's more exciting.  
**[07:37] Speaker A:** I love the MenuGen progression and dichotomy that you laid out, and I'm sure many folks here followed your own progression in programming from last October to January and February of this year.  
**[07:48] Speaker A:** If you extrapolate that further, what is the 2026 equivalent of building websites in the '90s, building mobile apps in the 2010s, or building SaaS in the last cloud era? What will look completely obvious in hindsight that is still mostly unbuilt today?  
**[08:08] Speaker B:** Well, going with the MenuGen example, a lot of this code shouldn't exist, and it's just neural networks doing most of the work.  
**[08:15] Speaker B:** I do think the extrapolation looks very weird, because you could basically imagine completely neural computers, in a certain sense.  
**[08:25] Speaker B:** Imagine a device that takes raw video or audio into what is basically a neural net and uses diffusion to render a UI that is unique to that moment. I kind of feel like in the early days of computing, people were a little confused as to whether computers would look like calculators or like neural nets, and in the '50s and '60s it was not really obvious which way we'd go. Of course, we went down the calculator path and built classical computing, and neural nets currently run virtualized on existing computers. But I think a lot of this could flip: the neural net becomes the host process and the CPUs become the co-processor. We saw the diagram showing that neural network compute for intelligence is going to take over and become the dominant  
**[09:12] Speaker B:** spend of FLOPs, so you could imagine something really weird and foreign when neural nets are doing most of the heavy lifting.  
**[09:18] Speaker B:** They'd be using tool use as a kind of historical appendage for some deterministic tasks.  
**[09:24] Speaker B:** But what's really running the show, in a certain way, is the neural nets.  
**[09:29] Speaker B:** So you can imagine something extremely foreign as the extrapolation, but I think we're probably going to get there piece by piece.  
**[09:36] Speaker B:** And yeah, that progression is TBD, I would say.  
**[09:40] Speaker A:** [snorts]  
**[09:41] Speaker A:** I'd like to talk a little bit about verifiability: the idea that AI will automate domains where the output can be verified faster and more easily.  
**[09:49] Speaker A:** If that framework is right, what work is about to move much faster than people realize, and which professions do people think are safe that are  
**[10:00] Speaker A:** actually highly verifiable?  
**[10:02] Speaker B:** Yes. So I spent some time writing about verifiability. Traditional computers can easily automate what you can specify in code, and this latest round of LLMs can easily automate what you can verify, in a certain sense. That's because when frontier labs train these LLMs, they do it in giant reinforcement learning environments.  
**[10:24] Speaker B:** So the models are given verification rewards, and because of the way they are trained, they end up as these jagged entities whose capability really peaks in verifiable domains like math, code, and adjacent areas, and that kind of stagnate and are a little rough around the edges outside that space.  
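A minimal sketch of what a "verification reward" can look like for code, assuming the simplest possible setup: the model writes a function, and the reward is the fraction of known test cases it passes. Real lab training environments are far more elaborate and are not described in the talk; `verification_reward` and `TESTS` are illustrative names. The unsandboxed `exec` is acceptable only in a toy like this.

```python
from typing import Any, List, Tuple

def verification_reward(source: str, func_name: str,
                        tests: List[Tuple[tuple, Any]]) -> float:
    """Score model-written code by running it against known test cases.

    Returns the fraction of tests passed (0.0 to 1.0). Code that fails to
    compile, or never defines `func_name`, scores 0.0.
    WARNING: exec() runs untrusted code with no sandbox; a real training
    setup would isolate execution and enforce timeouts.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        func = namespace[func_name]
    except Exception:  # includes SyntaxError and a missing function (KeyError)
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on one test case simply earns no credit
    return passed / len(tests)

# Hypothetical task: "write add(a, b)". The checker needs no human judgment.
TESTS = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
```

A correct `add` earns 1.0, a version that subtracts earns 1/3 (it happens to pass the `(0, 0)` case), and code that does not compile earns 0.0. That cheap, automatic signal is what makes a domain like code easy to throw RL at.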
**[10:44] Speaker B:** So I think the reason I wrote about verifiability is that I'm trying to understand why these things are so  
**[10:49] Speaker B:** jagged. Some of it has to do with how the labs train the models, but I think some of it also has to do with the focus of the labs and what they happen to put into the data distribution.  
**[10:58] Speaker B:** Some things are significantly more valuable in the economy, so more environments end up being created for them, because the labs want the models to work in those settings.  
**[11:05] Speaker B:** I think code is a good example of that.  
**[11:08] Speaker B:** There are probably lots of verifiable environments they could think of that happen not to make it into the mix, because the capability just isn't that useful to have.  
**[11:13] Speaker B:** But to me the big mystery is this. For a while the favorite example was how many r's there are in "strawberry." The models famously got this wrong, and it's an example of jaggedness.  
**[11:27] Speaker B:** The models have patched this now, I think, but the new one is: I want to go to a car wash to wash my car, and it's 50 meters away. Should I  
**[11:34] Speaker B:** drive or should I walk? State-of-the-art models today will tell you to walk because it's so close.  
**[11:40] Speaker B:** How is it possible that state-of-the-art Opus 4.7 can refactor a 100,000-line codebase or find zero-day vulnerabilities, and yet tells me to walk to this car wash?  
**[11:52] Speaker B:** This is insane. To whatever extent these models remain jagged, it's an indication that, number one, maybe something's slightly off, or number two, you actually need to be in the loop a little bit: you need to treat them as tools and stay in touch with what they're doing.  
**[12:11] Speaker B:** So, long story short, all of my writing about verifiability is just trying to understand why these things are jagged. Is there any pattern to it?  
**[12:20] Speaker B:** And I think it's some combination of verifiability plus what the labs care about. One more instructive anecdote: from GPT-3.5 to GPT-4, people noticed that  
**[12:31] Speaker B:** chess improved a lot. A lot of people thought it was just the general progression of capabilities, but actually (I think this is public information, I saw it on the internet) a huge amount of chess data made it into the pre-training set, and just because it was in the data distribution, the model improved a lot more than it would have by default.  
**[12:50] Speaker B:** So someone at OpenAI decided to add this data, and now you have a capability that peaks a lot higher.  
**[12:56] Speaker B:** That's why I'm stressing this dimension: we are slightly at the mercy of whatever the labs are doing and whatever they happen to put into the mix.  
**[13:04] Speaker B:** And you have to actually explore this thing they give you, which has no manual.  
**[13:08] Speaker B:** It works in certain settings, but maybe not in others.  
**[13:11] Speaker B:** So you have to explore it a little bit.  
**[13:13] Speaker B:** If you're in the circuits that were part of the RL, you fly. And if you're in the  
**[13:19] Speaker B:** circuits that are out of the data distribution, you're going to struggle, so you have to figure out which circuits your application is in. If you're not in those circuits, you really have to look at fine-tuning and doing some of your own work, because it's not necessarily going to come out of the LLM out of the box.  
**[13:36] Speaker A:** I'd love to come back to the concept of jagged intelligence in a bit. Say you're a founder today thinking about building a company. You're trying to solve a problem you think is tractable, in a domain that is verifiable, but you look around and think, "Oh my gosh, the labs have really started reaching escape velocity in the most obvious ones: math, coding, and others." What would your advice be to the founders in the audience?  
**[14:08] Speaker B:** So I think maybe that comes back to the  
**[14:10] Speaker B:** previous question. I do think verifiability... let me think.  
**[14:14] Speaker B:** Verifiability makes something tractable in the current paradigm, because you can throw a huge amount of RL at it.  
**[14:20] Speaker B:** So maybe one way to see it is that this remains true even if the labs are not focusing on it directly.  
**[14:26] Speaker B:** If you are in a verifiable setting where you can create these RL environments or examples, that sets you up to potentially do your own fine-tuning, and you might benefit from that.  
**[14:36] Speaker B:** That is fundamentally technology that just works.  
**[14:38] Speaker B:** You can pull a lever: if you have a huge amount of diverse RL environments, datasets, etc.,  
**[14:41] Speaker B:** you can use your favorite fine-tuning framework, pull the lever, and get something that actually works pretty well.  
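If your domain is verifiable, the thing you build for fine-tuning is an environment that hands out prompts and scores answers. The sketch below assumes a single-step design with a pluggable checker per task; `Task`, `VerifiableEnv`, and the multiplication tasks are hypothetical, and the `policy` is a plain function standing in for the model that a fine-tuning framework would actually update.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]  # the verifier for this one task

class VerifiableEnv:
    """Single-step RL environment: one prompt in, one answer out, reward from a checker."""

    def __init__(self, tasks: List[Task], seed: int = 0):
        self.tasks = tasks
        self.rng = random.Random(seed)
        self.current: Optional[Task] = None

    def reset(self) -> str:
        self.current = self.rng.choice(self.tasks)
        return self.current.prompt

    def step(self, answer: str) -> float:
        if self.current is None:
            raise RuntimeError("call reset() before step()")
        reward = 1.0 if self.current.check(answer) else 0.0
        self.current = None  # the episode ends after one answer
        return reward

def multiplication_task(a: int, b: int) -> Task:
    # Hypothetical example domain; any task with an automatic checker works.
    return Task(f"What is {a} * {b}?", lambda ans: ans.strip() == str(a * b))

def average_reward(env: VerifiableEnv, policy: Callable[[str], str],
                   episodes: int = 100) -> float:
    """The quantity an RL fine-tuning run would try to push up."""
    total = 0.0
    for _ in range(episodes):
        total += env.step(policy(env.reset()))
    return total / episodes

env = VerifiableEnv([multiplication_task(a, b) for a in range(2, 10) for b in range(2, 10)])
```

A fine-tuning framework would sample answers from the model, call `step` to collect rewards, and update the model to raise `average_reward`. That part is framework-specific and omitted here.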
**[14:49] Speaker B:** So I don't know what the examples of this might be.  
**[14:51] Speaker B:** But I do think there are some very valuable reinforcement learning environments people could think of that are...  
**[14:59] Speaker B:** not part of the... Yeah, I don't want to give away the answer, but there is one domain that I think is very... Oh, okay. Sorry, I don't mean to vague post on stage, but there are some examples of this.  
**[15:09] Speaker A:** On the flip side, what do you think still feels automatable only from a distance?  
**[15:14] Speaker B:** I do think that ultimately almost everything can be made verifiable to some extent, some things more easily than others. Even for things like writing, you can imagine having a council of LLM judges and probably getting something reasonable out of that kind of approach.  
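One way to read "a council of LLM judges": several graders score the same output, and their scores are aggregated into a single reward. In this sketch each judge is a plain function standing in for a rubric-prompted LLM call, and aggregating with the median (rather than a mean or a majority vote) is our own choice, made because one wildly off judge cannot drag the result far.

```python
from statistics import median
from typing import Callable, Sequence

# A judge maps a piece of text to a score in [0, 1]. In practice each judge
# would be an LLM prompted with a rubric; here they are plain functions.
Judge = Callable[[str], float]

def council_score(text: str, judges: Sequence[Judge]) -> float:
    """Aggregate several judges' scores; the median tolerates one bad outlier."""
    if not judges:
        raise ValueError("need at least one judge")
    return median(judge(text) for judge in judges)

def council_accepts(text: str, judges: Sequence[Judge], threshold: float = 0.6) -> bool:
    """Turn the aggregate into a binary, verifier-style pass/fail signal."""
    return council_score(text, judges) >= threshold
```

With three judges scoring 0.8, 0.7, and an outlier 0.0, the council score is 0.7, so the outlier does not flip the verdict at the default threshold.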
**[15:33] Speaker B:** So it's more about what's easy or hard. So I do think that ultimately... yeah, I think...  
**[15:42] Speaker A:** Everything? [laughter]  
**[15:43] Speaker B:** Everything is automatable.  
**[15:45] Speaker A:** Amazing. Okay. So last year you coined the term vibe coding, and today  
**[15:49] Speaker A:** we're in a world that feels a little more serious: more agentic engineering. What do you think is the difference between the two, and what would you actually call what we're in today?  
**[15:57] Speaker B:** Yeah. So I would say vibe coding is about raising the floor for everyone in terms of what they can do in software.  
**[16:03] Speaker B:** The floor rises, everyone can vibe code anything, and that's amazing, incredible.  
**[16:06] Speaker B:** But agentic engineering, I would say, is about preserving the quality bar that existed before in professional software.  
**[16:11] Speaker B:** You're not allowed to introduce vulnerabilities due to vibe coding. You're still responsible for your software just as before, but can you go faster?  
**[16:22] Speaker B:** And the spoiler is that you can, but how do you do it properly?  
**[16:24] Speaker B:** I call it agentic engineering because I do think it's a kind of engineering discipline.  
**[16:29] Speaker B:** You have these agents, which are these spiky entities. They're a bit fallible, a little  
**[16:33] Speaker B:** bit stochastic, but they are extremely powerful. How you coordinate them to go faster without sacrificing your quality bar, and do that well and correctly, is the realm of agentic engineering. So I see them as different: one is about raising the floor, and the other is about extrapolating. And what I'm seeing is that there is a very high ceiling on agentic engineering capability. People used to talk about the 10x engineer; I think that is magnified a lot now, and 10x is no longer the limit of the speedup you can gain. From my perspective right now, people who are very good at this seem to peak a lot higher than 10x.  
**[17:18] Speaker A:** I really like that framing. When Sam Altman came to AI Ascent last year, one memorable thing he said was that people of different generations use ChatGPT differently. So if you're in your  
**[17:29] Speaker A:** 30s, you use it as a Google search replacement. But if you're in your teens, TikTok is your gateway to the internet.  
**[17:35] Speaker A:** What is the parallel in coding today? If we were to watch two people code using OpenAI, Claude, or Codex, one you'd consider mediocre at it and one you'd consider fully AI native, how would you describe the difference?  
**[17:51] Speaker B:** [clears throat]  
**[17:51] Speaker B:** I mean, I think it's just trying to get the most out of the tools that are available: utilizing all of their features and investing in your own setup.  
**[17:59] Speaker B:** Just like before, engineers are used to getting the most out of the tools they use, whether it's Vim or VS Code, or now Claude Code or Codex and so on.  
**[18:09] Speaker B:** So it's investing in your setup and utilizing a lot of the tools available to you. I think it just looks like that.  
**[18:18] Speaker B:** I do think that maybe  
**[18:23] Speaker B:** a related thought is that a lot of people are maybe hiring for this, right, because they want to hire strong agentic engineers.  
**[18:31] Speaker B:** What I'm seeing is that most people have still not refactored their hiring process for agentic engineering capability, right? If you're giving out puzzles to solve, that's still the old paradigm.  
**[18:46] Speaker B:** I would say hiring has to look like this: give someone a really big project and watch them implement it. Say, write a Twitter clone for agents, make it really good and really secure, and then have some agents simulate activity on it.  
**[19:03] Speaker B:** And then I'm going to use 10 Claude 3.5 Sonnet or X AI instances to try to break the website you deployed, and they should not be able to break it.  
**[19:16] Speaker B:** So maybe it looks like that, right? Watching people in that setting,  
**[19:21] Speaker B:** building bigger projects and utilizing the tooling, is mostly what I would look at.  
**[19:29] Speaker A:** And as agents do more, which human skills do you think become more valuable, not less?  
**[19:34] Speaker A:** So yeah, it's a good question. I think, well, right now the answer is that the agents are kind of like these intern entities, right? So it's remarkable. You basically still have to be in charge of the aesthetics, the judgment, the taste, and a little bit of oversight. Maybe one of my favorite examples of the weirdness of agents is that for MenuGen, you sign up with a Google account, but you purchase credits using a Stripe account, and both of them have email addresses. And when you purchased credits, my agent actually tried to assign them by matching the email address from Stripe to the Google email address, like...  
这是个好问题。我觉得,目前的答案是这些 agent 有点像实习生一样的存在,对吧?很神奇的是,你基本上还是要负责审美、判断、品味,以及一些监督工作。我最喜欢的一个例子,能体现 agent 的怪异之处,就是在 MenuGen 中,你用 Google 账号注册,但用 Stripe 账号购买积分,两者都有邮箱地址。我的 agent 实际上尝试在你购买积分时,用 Stripe 的邮箱地址去匹配 Google 的邮箱地址,就像……  
**[20:15] Speaker A:** There wasn't a persistent user ID for people. It was trying to match up the email addresses, but you could use different email addresses for your Stripe and your Google and basically would not associate the funds.  
用户没有持久化的用户 ID。它试图通过邮箱地址来匹配,但你的 Stripe 和 Google 可以用不同的邮箱,这样就无法关联资金了。  
**[20:26] Speaker A:** And so this is the kind of thing that these agents still will make mistakes about, is like why would you use email addresses to try to cross-correlate the funds? They can be arbitrary. You can use different emails, etc. Like this is such a weird thing to do.  
所以这就是这些 agent 仍然会犯的错误,比如为什么要用邮箱地址来交叉关联资金?邮箱是可以任意设置的,你可以用不同的邮箱等等。这种做法真的很奇怪。  
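To make the failure concrete, here is a minimal Python sketch of the fix being described: credit purchases to a stable internal user ID that travels with the payment, instead of cross-matching emails between providers. Everything here is hypothetical (the `User` record, the `USERS` table, `start_checkout`, and `handle_payment` are illustrative stand-ins, not MenuGen's code or Stripe's API); the only assumption is that the payment provider echoes back a reference you attach at checkout.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical in-memory stand-ins for the app's user table and credit ledger.
@dataclass
class User:
    google_email: str
    # Stable internal ID generated at sign-up; everything is tied to this.
    user_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    credits: int = 0

USERS: dict = {}  # user_id -> User

def sign_up(google_email: str) -> User:
    user = User(google_email=google_email)
    USERS[user.user_id] = user
    return user

def start_checkout(user: User) -> dict:
    # Attach our own user_id to the (hypothetical) checkout payload so the
    # payment event carries it back, whatever email the buyer types.
    return {"metadata": {"user_id": user.user_id}}

def handle_payment(event: dict, credits: int) -> User:
    # Look the user up by the stable ID, never by the payment-side email.
    user = USERS[event["metadata"]["user_id"]]
    user.credits += credits
    return user

alice = sign_up("alice@gmail.com")
event = dict(start_checkout(alice), customer_email="alice.work@example.com")
handle_payment(event, credits=10)
print(alice.credits)  # 10, even though the two emails differ
```

Matching on `customer_email` would have found no user here and silently dropped the credits; keying on `user_id` is what "unique user IDs that we're going to tie everything to" amounts to in practice.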
**[20:36] Speaker A:** So I think people have to be in charge of this spec, this plan.  
所以我认为人类必须负责这个规格说明、这个计划。  
**[20:40] Speaker A:** And I actually don't even like the plan mode. I would—I mean obviously it's very useful, but I think there's something more general here where you have to work with your agent to design a spec that is very detailed and maybe it's basically the docs, and then get the agents to write them and you're in charge of the oversight and the top level categories, but the agents are—  
其实我甚至不太喜欢计划模式。我的意思是,它显然很有用,但我觉得这里有更通用的东西,就是你必须和你的 agent 一起设计一个非常详细的规格说明,可能基本上就是文档,然后让 agent 去编写它们,你负责监督和顶层分类,但 agent 在做——  
**[21:00] Speaker A:** Doing a lot of the under the hood, and so I think you're not caring about some of the details.  
很多底层的工作,所以你不用关心某些细节。  
**[21:04] Speaker A:** So as an example, also with arrays or tensors in neural networks, there's a ton of details between PyTorch and NumPy and all the different like pandas and so on for all the different little API details.  
举个例子,在神经网络中处理数组或张量时,PyTorch 和 NumPy 以及 pandas 等等之间有大量的 API 细节差异。  
**[21:17] Speaker A:** And I already forgot whether it's `keepdims` or `keepdim`, whether it's `dim` or `axis`, or whether it's `reshape` or `permute` or `transpose`.  
我已经忘了是 `keepdims` 还是 `keepdim`,是 `dim` 还是 `axis`,是 `reshape`、`permute` 还是 `transpose`。  
**[21:22] Speaker A:** I don't remember this stuff anymore, right?  
我已经不记得这些东西了,对吧?  
**[21:24] Speaker A:** Because you don't have to. This is the kind of details that are handled by the intern because they have very good recall. But you still have to know, for example, that there's an underlying tensor, there's an underlying view, and then you can manipulate view of the same storage or you can have different storage which would be less efficient. And so you still have to have an understanding of what this stuff is doing and some of the fundamentals so that you're not  
因为你不需要记。这些细节由实习生处理,因为它们的记忆力很好。但你仍然需要知道,比如底层有一个张量,有一个底层视图,然后你可以操作同一存储的视图,或者你可以有不同的存储,那样效率会更低。所以你仍然需要理解这些东西在做什么,以及一些基本原理,这样你就不会——  
**[21:45] Speaker A:** Copying memory around unnecessarily and so on, but the details of the APIs are now handed off, so you're in charge of the taste, the engineering, the design, and that it makes sense and that you're asking for the right things and that you're saying that, okay, these have to be unique user IDs that we're going to tie everything to. And so you're doing some of the design and development and the engineers are doing the fill in the blanks, and that's currently kind of like where we are, and I think that's what everyone of course is seeing, I think, right now.  
不必要地复制内存等等,但 API 的细节现在交给它们了,所以你负责品味、工程设计、整体设计,确保它有意义,确保你要求的东西是对的,确保你说的是,好的,这些必须是唯一的用户 ID,我们要把所有东西都绑定到它上面。所以你在做一些设计和开发工作,而工程师在填空,这就是我们目前的状态,我想这也是大家现在都看到的。  
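The "view of the same storage" versus "different storage" distinction can be shown without NumPy or PyTorch. The sketch below uses Python's built-in `memoryview` as a stand-in (an analogy, not the tensor APIs themselves): a cast view reinterprets the same bytes with a new shape, while `bytes(...)` allocates a separate copy.

```python
# Underlying storage: 6 bytes holding the values 0..5.
storage = bytearray(range(6))

# A reshape-like view: same bytes, seen as a 2x3 grid. Nothing is copied.
view = memoryview(storage).cast("B", shape=[2, 3])

# A copy: new, independent storage with the same contents.
snapshot = bytes(storage)

# Mutate the underlying storage in place.
storage[0] = 99

print(view.tolist())   # [[99, 1, 2], [3, 4, 5]]  the view sees the change
print(list(snapshot))  # [0, 1, 2, 3, 4, 5]       the copy does not
```

In NumPy and PyTorch the same question shows up as whether an operation such as `reshape` or `transpose` returns a view or forces a copy; the exact rules can differ between libraries, which is the kind of detail that can be left to the agent as long as you know the question exists.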
**[22:13] Speaker B:** Do you think there's a chance that this taste and judgment matters less over time, or will the ceiling just keep rising?  
你觉得随着时间推移,这种品味和判断会变得不那么重要吗,还是说天花板会一直上升?  
**[22:21] Speaker A:** Yeah, it's a good question. I would—okay, I mean, I'm hoping that it improves. I think probably the reason it doesn't improve right now is, again, it's not part of the RL. There's probably no  
这是个好问题。我希望它能改进。我觉得它现在没有改进的原因可能是,它不是强化学习的一部分。可能没有——  
**[22:33] Speaker A:** Aesthetics cost or reward, or it's not good enough or something like that.  
审美成本或奖励,或者还不够好,诸如此类。  
**[22:39] Speaker A:** I do think that when you actually look at the code, sometimes I get a little bit of a heart attack because it's not like super amazing code necessarily all the time, and it's very bloated and there's a lot of copy-paste and there's awkward abstractions that are brittle, and like it works but it's just really gross.  
我确实觉得当你真正看代码时,有时候会有点心惊肉跳,因为代码不一定总是特别优秀,而且非常臃肿,有很多复制粘贴,有些抽象很脆弱,虽然能用但真的很糟糕。  
**[22:52] Speaker A:** And I do hope that this can improve in future models.  
我确实希望未来的模型能在这方面有所改进。  
**[22:55] Speaker A:** A good example also is this, you know, MicroGPT project where I was trying to simplify LLM training to be as simple as possible.  
一个很好的例子是这个 MicroGPT 项目,我试图把 LLM 训练简化到尽可能简单。  
**[23:04] Speaker A:** The models hate this. They can't do it.  
模型很讨厌这个。它们做不到。  
**[23:06] Speaker A:** I kept trying to prompt an LLM to simplify more, simplify more, and it just can't—you feel like you're outside of the RL circuits.  
我一直试图提示 LLM 再简化一点,再简化一点,但它就是做不到——你会感觉你在强化学习回路之外。  
**[23:15] Speaker A:** It feels like you're obviously, you know, pulling teeth. It's not like light speed.  
感觉就像在拔牙一样。不像光速那样快。  
**[23:20] Speaker A:** So I think, I do think that people still remain in charge of this.  
所以我确实认为,人类仍然要负责这些事情。  
**[23:25] Speaker A:** But I do think that there's  
但我确实认为  
**[23:26] Speaker A:** Nothing fundamental again that's preventing it, it's just the labs haven't done it yet almost.  
从根本上说,没有什么东西在阻止它实现,只是实验室还没有做到而已。  
**[23:30] Speaker B:** Yeah.  
是的。  
**[23:31] Speaker A:** So I'd love to come back to this idea of jagged forms of intelligence. You wrote a little bit about this with a very thought-provoking piece around animals versus ghosts.  
那我想回到「参差不齐的智能形态」这个概念。你写过一篇很有启发性的文章,讨论动物与幽灵的对比。  
**[23:39] Speaker A:** And the idea is that we're not building animals, we are summoning ghosts.  
核心观点是:我们不是在构建动物,而是在召唤幽灵。  
**[23:46] Speaker A:** And these are jagged forms of intelligence that are shaped by data and reward functions, but not by intrinsic motivation or fun or curiosity or empowerment.  
这些智能形态是参差不齐的,它们由数据和奖励函数塑造,但不具备内在动机、乐趣、好奇心或自主性。  
**[23:54] Speaker A:** Things that kind of came about via evolution.  
这些特质是通过进化产生的。  
**[24:00] Speaker A:** Why does that framing matter and what does it actually change about how you build and deploy and evaluate or even trust them?  
为什么这种框架很重要?它实际上如何改变你构建、部署、评估甚至信任 AI 的方式?  
**[24:08] Speaker B:** Yeah, so I think the reason I wrote about this is because I'm trying to wrap my head around what these things are, right?  
我写这篇文章是因为我在努力理解这些东西到底是什么。  
**[24:15] Speaker B:** Because if you have a good model of what they are or are not, then  
如果你对它们是什么、不是什么有一个清晰的认知模型,那么  
**[24:18] Speaker A:** You're going to be more competent at using them, and I do think that I'm not sure if it actually has like real power. [laughter]  
你使用它们时会更得心应手。不过我不确定这个框架是否真的有实际效力。(笑)  
**[24:28] Speaker A:** I think it's a little bit of philosophizing, but I do think that it's just coming to terms with the fact that these things are not, you know, animal intelligences.  
我觉得这有点哲学化,但确实是在接受一个事实:这些东西不是动物智能。  
**[24:38] Speaker A:** Like if you yell at them, they're not going to work better or worse; it doesn't have any impact. And it's all just kind of like these statistical simulation circuits where the substrate is pre-training, so like statistics, and then there's RL bolted on top.  
比如你对它们大喊大叫,它们不会表现得更好或更差,完全没有影响。它们本质上是统计模拟电路,基础是预训练——也就是统计学,然后在上面加了强化学习。  
**[24:55] Speaker A:** So it kind of like increases the dependencies, and maybe it's just kind of like a mindset of what I'm coming into or what's likely to work or not likely to work or how to modify it.  
所以这增加了依赖关系的复杂度。也许这只是一种思维方式,关于我如何看待它、什么可能有效、什么可能无效、以及如何调整它。  
**[25:05] Speaker A:** But I don't actually—I don't know that I have like here are the five obvious outcomes of how to make your  
但我其实没有——我没有那种「这里有五个明显的方法来改进你的  
**[25:11] Speaker A:** System better, it's more just being suspicious of it and  
系统」的结论,更多是保持怀疑态度,然后  
**[25:14] Speaker B:** Figuring out over time.  
随着时间慢慢摸索。  
**[25:16] Speaker B:** That's where it starts. Okay, so you are so deep in working with agents that don't just chat. They have real permissions, they have local context, they actually take action on your behalf. What does the world look like when we all start to live in that world?  
这就是起点。好的,你深度参与开发的 agent 不只是聊天,它们有真实的权限、本地上下文,能代表你采取实际行动。当我们都开始生活在那个世界里,会是什么样子?  
**[25:31] Speaker A:** Yeah, I think a lot of people here are probably excited about what this agent-native environment looks like, and everything has to be rewritten. Everything is still fundamentally written for humans and has to be moved around. Most of the time when I use different frameworks or libraries or things like that, they still have docs that are fundamentally written for humans. This is my favorite pet peeve. Like, why are people still telling me what to do?  
我想这里很多人都对 agent 原生环境感到兴奋,认为一切都需要重写。现在一切仍然是为人类设计的,需要人来操作。我使用各种框架或库时,它们的文档仍然是为人类写的。这是我最喜欢吐槽的点:为什么人们还在告诉我该做什么?  
**[25:57] Speaker A:** I don't want to do anything. What is the thing I should copy paste to my agent?  
我什么都不想做。我需要的是可以直接复制粘贴给 agent 的东西。  
**[26:00] Speaker A:** [laughter] Like, so it's just every time I'm told, you know, go to this URL or something like that, it's just like ah [laughter] you know. [snorts]  
(笑)每次被告知「去这个网址」之类的,我就会想「啊」(笑)(哼)  
**[26:07] Speaker A:** So everyone is I think excited about how do we decompose the workloads that need to happen into fundamentally sensors over the world, actuators over the world.  
所以大家都在思考如何将需要完成的工作分解为对世界的传感器和执行器。  
**[26:16] Speaker A:** How do we make it agent native? Basically describe it to agents first, and then have a lot of automation around, you know, data structures that are very legible to the LLMs.  
如何让它 agent 原生?基本上就是先为 agent 描述,然后围绕对大语言模型高度可读的数据结构做大量自动化。  
**[26:30] Speaker A:** So I think, yeah, I'm hoping that there's a lot of agent-first infrastructure out there. For MenuGen, famously, when I wrote the... well, I'm not sure how famously, but when I wrote the blog post about MenuGen [laughter]  
所以我希望会有很多 agent 优先的基础设施出现。说到 MenuGen,当我写那篇——不确定有多出名,但当我写关于 MenuGen 的博客文章时(笑)  
**[26:44] Speaker A:** A lot of the work, a lot of the trouble, was not even writing the code for MenuGen; it was deploying it on  
大部分工作、大部分麻烦甚至不是编写 MenuGen 的代码,而是部署它  
**[26:48] Speaker A:** Vercel, because I had to work with all these different services, string them together, go into their settings and menus, and configure my DNS, and it was just so annoying. So that's a good example: I would hope that I could give an LLM the prompt "build MenuGen," not have to touch anything, and have it deployed on the internet in that same way.  
到 Vercel 上,因为我当时需要对接各种不同的服务,把它们串联起来,还得进到各自的设置和菜单里配置 DNS,整个过程非常烦人。所以我希望像 MenuGen 这样的项目,我只需要给 LLM 一个提示词,让它构建出 MenuGen,然后我什么都不用管,它就能以同样的方式部署到互联网上。  
**[27:07] Speaker A:** I think that would be a good kind of a test for whether or not a lot of our infrastructure is becoming more and more agent native.  
我觉得这可以作为一个很好的测试标准,来判断我们的基础设施是否正在变得越来越适合 agent 原生使用。  
**[27:14] Speaker A:** And then ultimately I would say yeah, I do think we're going towards a world where there's agent representation for people and for organizations and you know I'll have my agent talk to your agent to figure out some of the details of our meetings or things like that.  
最终我认为,我们确实在走向这样一个世界:人和组织都会有 agent 代表,比如我的 agent 会和你的 agent 对话,来敲定我们会议的一些细节之类的事情。  
**[27:30] Speaker A:** So [laughter], I do think that that's roughly where things are going, but yeah, I think everyone here is excited about  
所以(笑),我确实认为事情大致在朝这个方向发展,而且我觉得在座的每个人都  
**[27:37] Speaker A:** That.  
对此感到兴奋。  
**[27:38] Speaker B:** I really like the visual analogy of sensors and actuators. I actually hadn't thought of that. That's super interesting.  
我真的很喜欢传感器和执行器这个视觉类比。我之前还真没这么想过,这个角度超级有意思。  
**[27:43] Speaker A:** Right?  
对吧?  
**[27:43] Speaker B:** Okay, I think we have to end on a question about education because you are probably one of the very best in the world at making complex technical concepts simple and deeply thoughtful about how we design education around it.  
好,我想我们得用一个关于教育的问题来结束今天的对话,因为你可能是世界上最擅长把复杂技术概念讲简单的人之一,而且你对如何围绕这些概念设计教育有非常深刻的思考。  
**[27:56] Speaker B:** What still remains worth learning deeply when intelligence gets cheap as we move into the next era of AI?  
当智能变得廉价,当我们进入 AI 的下一个时代,什么东西仍然值得深入学习?  
**[28:05] Speaker A:** Yeah, there was a tweet that blew my mind recently and I keep thinking about it like every other day. It was something along the lines of, you can outsource your thinking but you can't outsource your understanding.  
有条推文最近让我震撼了,我每隔一天就会想起它。大意是:你可以外包你的思考,但你无法外包你的理解。  
**[28:17] Speaker B:** I think that's really nicely put. Yeah, because I'm still part of the system and I still have to  
我觉得这个表述非常精准。因为我仍然是系统的一部分,我仍然需要——  
**[28:25] Speaker A:** Somehow information still has to make it into my brain, and I feel like I'm becoming a bottleneck of just even knowing what are we trying to build, why is it worth doing, how do I direct, you know, how do I direct my agents and so on.  
信息仍然需要以某种方式进入我的大脑,我感觉自己正在成为一个瓶颈,甚至只是知道我们要构建什么、为什么值得做、我该如何指导——如何指导我的 agent 等等,这些都成了瓶颈。  
**[28:34] Speaker A:** So I do still think that ultimately something has to direct the thinking and the processing and so on, and that's still kind of fundamentally constrained somehow by understanding.  
所以我确实认为,最终必须有什么东西来指导思考和处理过程,而这在某种程度上仍然从根本上受到理解能力的制约。  
**[28:46] Speaker A:** And this is one reason I also was very excited about all the LLM knowledge bases, because I feel like that's a way for me to process information, and anytime I see a different projection onto information, I always feel like I gain insight.  
这也是我对所有 LLM 知识库感到非常兴奋的一个原因,因为我觉得那是我处理信息的一种方式,而且每当我看到信息的不同投影角度时,我总觉得自己获得了洞察。  
**[28:56] Speaker A:** So it's really just a lot of prompts for me to do synthetic data generation kind of over some fixed data. So I really enjoy whenever I read an article, I have my wiki that's being built up from these articles, and I love asking questions about things, and I think that  
所以对我来说,这其实就是在某些固定数据上进行合成数据生成的大量提示词。我真的很享受每次读完一篇文章后,我的 wiki 就从这些文章中构建起来,我喜欢提出各种问题,我认为——  
**[29:12] Speaker A:** Ultimately these are tools to enhance understanding in a certain way, and this is still kind of like a bit of a bottleneck because then you can't direct the—you can't be a good director if you still—because the LLMs certainly don't excel at understanding, you still are uniquely in charge of that.  
最终这些都是在某种程度上增强理解的工具,而这仍然是一个瓶颈,因为如果你不能很好地理解,你就无法成为一个好的指导者——因为 LLM 在理解方面显然并不擅长,你仍然是唯一负责这件事的人。  
**[29:28] Speaker A:** So yeah, I think tools to that effect are incredibly interesting and exciting.  
所以我认为朝这个方向发展的工具非常有趣和令人兴奋。  
**[29:33] Speaker B:** I'm excited to be back here in a couple years and to see if we've been fully automated out of the loop and they actually take care of understanding as well.  
我很期待几年后再回到这里,看看我们是否已经被完全自动化排除在外,它们是否真的也能处理理解这件事了。  
**[29:40] Speaker B:** Thank you so much for joining us, Andrej. We really appreciate it.  
非常感谢你加入我们,Andrej。我们真的很感激。  
**[29:42] Speaker A:** [applause]  
(掌声)  

---

## Deep Dive Summary

### Topic 1: Introduction and Andrej's background
Andrej Karpathy 的背景介绍
_[00:02]_

**Q:** Who is the guest and what is his background in AI?
**问：** 嘉宾是谁，他在 AI 领域有什么背景？

**A:** Andrej Karpathy is introduced as a foundational figure in modern AI who co-founded OpenAI and made Tesla's Autopilot operational. The host emphasizes his dual contribution of both building and explaining AI systems, noting his "rare gift of making the most complex technical shifts feel both accessible and inevitable." Most strikingly, despite coining "vibe coding" and his extensive expertise, Andrej recently admitted he's "never felt more behind as a programmer," which frames the conversation's focus on the rapidly accelerating pace of AI development.
**答：** Andrej Karpathy 是现代 AI 领域的核心人物，他联合创立了 OpenAI，并让 Tesla 的 Autopilot 真正运转起来。主持人强调他既能构建 AI 系统又能深入浅出地解释技术，具有让复杂技术变革显得"accessible and inevitable"的罕见能力。最引人注目的是，尽管他创造了"vibe coding"这个术语并拥有深厚的专业背景，Andrej 最近坦言自己"never felt more behind as a programmer"，这为对话设定了主题：AI 发展速度之快已经超出了顶尖专家的跟进能力。

### Topic 2: Feeling behind as a programmer
程序员的落后感
_[00:47]_

**Q:** Why does Andrej feel more behind as a programmer than ever before?
**问：** 为什么 Andrej 作为程序员感到前所未有的落后？

**A:** Andrej experienced a stark transition in December when AI coding tools crossed a threshold where generated code chunks "just came out fine" consistently, eliminating the need for corrections and enabling what he calls "vibe coding." This shift was both exhilarating and unsettling because it fundamentally changed his relationship with programming—he went from editing AI outputs to trusting the system completely and spawning "infinite side projects." He emphasizes this wasn't gradual improvement but a clear inflection point with the latest models, particularly in "agentic coherent workflow," that many people missed if they only experienced AI through ChatGPT in early 2023.
**答：** Andrej 在 12 月经历了一个明显的转折点，AI 编码工具生成的代码 "just came out fine"，不再需要修正，让他进入了 "vibe coding" 的状态。这种转变既令人兴奋又让人不安，因为它从根本上改变了他与编程的关系——从修改 AI 输出到完全信任系统，并开启了 "infinite side projects"。他强调这不是渐进式改进，而是最新模型在 "agentic coherent workflow" 上的明确拐点，许多只在 2023 年初体验过 ChatGPT 的人可能错过了这个变化。

### Topic 3: Software 3.0 paradigm and LLMs as computers
Software 3.0 范式：LLM 作为可编程计算机
_[02:28]_

**Q:** What is Software 3.0 and how are LLMs a new computing paradigm?
**问：** 什么是 Software 3.0，LLM 如何成为新的计算范式？

**A:** Software 3.0 represents a fundamental shift where "prompting" and "what's in the context window" become the programming interface, with LLMs acting as interpreters that perform computation in digital information space. Unlike Software 1.0's explicit rules or Software 2.0's learned weights through dataset curation, Software 3.0 emerges because LLMs trained on sufficiently large task sets become "a programmable computer in a certain sense." The OpenClaw installation example illustrates this paradigm shift: instead of writing complex shell scripts to handle different platforms, the installation is simply "a little script of, you know, copy-paste this and give it to your agent"—the agent uses its own intelligence to examine the environment, debug issues, and adapt to different systems. This approach is "so much more powerful" because developers no longer need to "precisely spell out all the individual details," instead delegating intelligent execution to the LLM while focusing on high-level instructions.
**答：** Software 3.0 是一个根本性转变，编程方式变成了 "prompting" 和操控 "context window"，LLM 充当解释器在数字信息空间执行计算。不同于 Software 1.0 的显式规则或 Software 2.0 通过数据集训练神经网络，Software 3.0 的出现是因为在足够大的任务集上训练的 LLM 本身成为了 "可编程的计算机"。OpenClaw 安装案例很好地说明了这一点：不再需要编写复杂的 shell 脚本来处理不同平台，安装过程就是 "copy-paste 一段文本给 agent"——agent 利用自身智能检查环境、调试问题并适配不同系统。这种方式 "强大得多"，因为开发者不必 "精确地列出所有细节"，而是将智能执行委托给 LLM，自己专注于高层指令。

### Topic 4: MenuGen example and neural network-first approach
MenuGen 案例：神经网络优先的范式转变
_[04:48]_

**Q:** How does the MenuGen project illustrate the shift from traditional apps to neural network-based solutions?
**问：** MenuGen 项目如何体现从传统应用到神经网络方案的转变？

**A:** The speaker built MenuGen as a traditional app with OCR, image generation, and re-rendering logic on Vercel, only to realize the entire architecture was "spurious" when a Software 3.0 approach emerged: simply giving Gemini the photo with a prompt to "overlay the things onto the menu" using image generation directly produced the desired output. This shift represents a fundamental reframing where "your neural network is doing more and more of the work" with just image input and image output, eliminating the need for intermediate app logic. The broader implication extends beyond faster programming to "new things that weren't possible before," like his LLM knowledge base project that recompiles and reframes documents in ways that "no code" could achieve previously, moving from structured data processing to general information processing.
**答：** 讲者最初用传统方式构建了 MenuGen，包括 OCR、图像生成和在 Vercel 上重新渲染菜单，但后来发现整个架构都是"多余的"——Software 3.0 的做法是直接把照片给 Gemini，让它用图像生成"把菜品叠加到菜单上"就能得到想要的结果。这种转变的核心是"神经网络承担了越来越多的工作"，只需要图像输入和图像输出，中间的应用逻辑完全不需要了。更深层的意义不只是让编程变快，而是实现了"以前不可能的新事物"，比如他的 LLM 知识库项目能够重新编译和重构文档，这是"任何代码"以前都做不到的——从结构化数据处理进化到通用信息处理。

### Topic 5: Future of computing: neural nets as primary processors
计算的未来：神经网络作为主处理器
_[07:37]_

**Q:** What will the 2026 equivalent of building websites or mobile apps look like?
**问：** 2026 年相当于构建网站或移动应用的工作会是什么样子？

**A:** Andrej envisions a fundamental architectural inversion where "the neural net becomes kind of like the host process and the CPUs become kind of like the co-processor," reversing today's paradigm where neural nets run virtualized on classical computers. This shift echoes an unresolved debate from the 1950s-60s about whether computers would resemble "calculators or neural nets," with the industry having chosen the calculator path that led to classical computing. The speaker imagines devices that process "raw videos or audio" through neural networks and use "diffusion to render a UI" uniquely for each moment, with deterministic CPUs relegated to "tool use as this historical appendage." While acknowledging this extrapolation "looks very weird" and the progression remains "TBD," the core thesis is that intelligence compute will become the dominant spend of flops, fundamentally restructuring how we build software.
**答：** Andrej 预见了一个架构上的根本性反转：神经网络会成为主处理器，而 CPU 会变成协处理器，这与今天神经网络在传统计算机上虚拟化运行的模式完全相反。这种转变呼应了 1950-60 年代一个未解决的争论——计算机应该像计算器还是像神经网络，而当时行业选择了计算器路径，发展出了经典计算。他设想未来的设备直接处理原始视频或音频，通过神经网络和 diffusion 为每个时刻渲染独特的 UI，而确定性的 CPU 只是作为「历史遗留的附属工具」来处理某些任务。虽然他承认这个推演「看起来很怪异」且发展路径「有待确定」，但核心观点是智能计算会成为算力开销的主导，从根本上重构软件构建方式。

### Topic 6: Verifiability and AI automation
可验证性与 AI 自动化
_[09:41]_

**Q:** Why does AI automate verifiable domains faster and what explains the jaggedness in model capabilities?
**问：** 为什么 AI 更快地自动化可验证领域，模型能力的参差不齐如何解释？

**A:** LLMs automate verifiable domains faster because they're trained in "giant reinforcement learning environments" that hand out rewards when an output can be checked, so their capability peaks where outputs can be verified, like math and code. The jaggedness, where models can "refactor a 100,000 line codebase" yet fail at simple reasoning like whether to walk to a nearby car wash, stems from two factors: which domains labs prioritize in training data and which tasks are economically valuable enough to warrant focused development. This pattern suggests users must "stay in the loop" and treat models as tools rather than autonomous agents. Capabilities don't progress uniformly; they concentrate where verification mechanisms exist and where labs invest resources. Chess illustrates the data side: ability jumped dramatically from GPT-3.5 to GPT-4 primarily because "a huge amount of chess data made it into the pre-training set."
**答：** LLM 之所以能更快地自动化可验证领域，是因为它们是在大规模强化学习环境中训练的，在可验证的任务上会获得奖励反馈，因此在数学、代码等可验证领域表现突出。模型能力的参差不齐（既能处理十万行代码重构，又会在简单的常识推理上出错）主要由两个因素决定：实验室在训练数据中优先考虑哪些领域，以及哪些任务具有足够的经济价值。这种模式表明用户需要保持参与，将模型视为工具而非完全自主的系统。能力提升并非均匀分布，而是集中在有验证机制且实验室投入资源的领域。国际象棋体现了数据这一面：GPT-3.5 到 GPT-4 的象棋能力大幅提升，主要是因为预训练集中加入了大量象棋数据。
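Why verifiability matters for RL can be seen in a toy reward function: when an answer can be checked mechanically, the reward is cheap and unambiguous. The sketch below is a hypothetical example, not any lab's setup: a candidate solution (a plain Python callable standing in for model-generated code, which a real pipeline would run in a sandbox) is scored by the fraction of test cases it passes.

```python
from typing import Any, Callable, Sequence, Tuple

# Hypothetical task: "write a function that returns the largest element of a list".
# Each test case is (positional args, expected result).
TestCase = Tuple[tuple, Any]
TEST_CASES: Sequence[TestCase] = [
    (([3, 1, 2],), 3),
    (([-5, -2],), -2),
    (([7],), 7),
]

def verifiable_reward(candidate: Callable[..., Any],
                      tests: Sequence[TestCase] = TEST_CASES) -> float:
    """Return the fraction of test cases the candidate passes.

    Crashes count as failures, so the reward always lies in [0, 1].
    """
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash is simply a failed test
    return passed / len(tests)

def correct(xs):
    return max(xs)

def buggy(xs):
    return xs[0]  # only right when the first element happens to be largest

print(verifiable_reward(correct))  # 1.0
print(verifiable_reward(buggy))    # 0.6666666666666666
```

The graded signal (2 of 3 tests passed) is what makes code and math tractable for RL; a domain without an `expected` value to compare against has no such signal for free.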

### Topic 7: Dependency on lab decisions and exploring capabilities without manuals
开发者受制于 AI 实验室的决策与无手册探索
_[12:50]_

**Q:** How are developers at the mercy of what AI labs include in their models, and why is exploration necessary?
**问：** 开发者如何受制于 AI 实验室在模型中包含的内容，为什么探索是必要的？

**A:** Developers are fundamentally dependent on AI labs' training decisions, as capabilities emerge based on "whatever they happen to put into the mix" during model development. The speaker emphasizes that LLMs arrive "with no manual," requiring developers to empirically explore which use cases fall within the model's reinforcement learning circuits versus outside its data distribution. Applications that align with the RL-trained circuits will perform well out of the box, but those outside this distribution will struggle and require fine-tuning or custom work to achieve reliable results.
**答：** 开发者本质上依赖于 AI 实验室的训练决策，模型能力取决于实验室在训练中投入的数据和方法。Andrej 强调 LLM 是一个 "no manual" 的黑盒系统，开发者必须通过实验来探索自己的应用场景是否落在模型的 RL 训练回路内。如果应用场景恰好在训练分布内，模型会表现出色；但如果超出数据分布范围，就需要通过 fine-tuning 等方式进行定制开发。

### Topic 8: Advice for founders: verifiability and fine-tuning opportunities
给创始人的建议：可验证性与微调机会
_[13:36]_

**Q:** What should founders focus on when labs are achieving escape velocity in obvious domains?
**问：** 当大型实验室在明显领域快速突破时，创始人应该关注什么？

**A:** Andrej argues that verifiability remains the key strategic advantage for founders even as major labs dominate obvious domains like math and coding, because "verifiability makes something tractable in the current paradigm" by enabling massive reinforcement learning applications. Founders should identify verifiable domains where they can create their own RL environments and datasets, then leverage existing fine-tuning frameworks to "pull a lever" and achieve strong results independently of lab focus. The speaker hints at unexplored "very valuable reinforcement learning environments" outside the mainstream but deliberately avoids revealing specific examples, suggesting untapped opportunities exist for founders who can identify the right verifiable domains.
**答：** Andrej 认为，即使大型实验室在数学、编程等明显领域占据主导，可验证性仍然是创始人的关键战略优势，因为可验证性能够通过大规模 reinforcement learning 让问题在当前范式下变得可解。创始人应该寻找可验证的领域，在那里构建自己的 RL 环境和数据集，然后利用现有的微调框架"拉动杠杆"就能获得不错的效果。Andrej 暗示存在一些主流之外"非常有价值的 reinforcement learning 环境"，但故意不透露具体例子，说明对于能识别正确可验证领域的创始人来说，仍有未开发的机会。

### Topic 9: What remains hard to automate and the path to automating everything
什么仍然难以自动化以及自动化一切的路径
_[15:09]_

**Q:** What tasks still feel automatable only from a distance, and is everything ultimately automatable?
**问：** 哪些任务从远处看似乎可以自动化，最终一切都可以自动化吗？

**A:** Andrej argues that "ultimately almost everything can be made verifiable" and therefore automatable; the distinction lies in what's "easy or hard" rather than possible versus impossible. He proposes that even subjective domains like writing can be automated through mechanisms like "a council of LLM judges" that establish verification frameworks. He ultimately commits to the bold position that "everything is automatable," framing automation as a question of difficulty and verification design rather than fundamental limits.
**答：** 嘉宾认为几乎所有事情最终都可以被验证，因此也就可以自动化，区别只在于难易程度而非可能性本身。即使是写作这类主观任务，也可以通过 "a council of LLM judges" 这样的机制来建立验证框架。嘉宾最终给出了一个大胆的结论：一切都可以自动化，这是验证设计和难度的问题，而不是根本性的限制。
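One way to read "a council of LLM judges" is as a reward built from several independent graders whose scores are aggregated. In this sketch the judges are hypothetical placeholder functions standing in for LLM calls, and the median is an assumed aggregation rule, chosen so that a single outlier judge cannot swing the score; the talk does not specify how the votes are combined.

```python
from statistics import median
from typing import Callable, Sequence

# Hypothetical judge type: takes a piece of writing, returns a score in [0, 1].
Judge = Callable[[str], float]

def council_score(text: str, judges: Sequence[Judge]) -> float:
    """Aggregate several judges' scores into one reward.

    Scores are clamped to [0, 1], then combined with the median so that a
    single miscalibrated judge cannot move the result on its own.
    """
    if not judges:
        raise ValueError("need at least one judge")
    scores = [min(1.0, max(0.0, judge(text))) for judge in judges]
    return median(scores)

# Placeholder judges; in practice each would prompt an LLM with a rubric.
judges = [
    lambda text: 0.8,
    lambda text: 0.7,
    lambda text: 0.0,  # an outlier judge
]
print(council_score("a draft essay", judges))  # 0.7
```

With a mean instead of the median, the outlier in this example would drag the score down to 0.5.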

### Topic 10: Vibe coding vs agentic engineering: raising the floor vs preserving quality
Vibe coding 与 agentic engineering：提升能力底线与保持专业质量
_[15:45]_

**Q:** What's the difference between vibe coding and agentic engineering, and what are the productivity gains?
**问：** Vibe coding 和 agentic engineering 有什么区别，生产力提升有多大？

**A:** Vibe coding democratizes software creation by "raising the floor" so anyone can build things, while agentic engineering focuses on "preserving the quality bar" of professional software while accelerating development. The speaker frames agentic engineering as a discipline of coordinating "spiky" and "fallible" AI agents without introducing vulnerabilities or sacrificing standards. Unlike the traditional 10x engineer concept, skilled practitioners of agentic engineering can achieve productivity gains far exceeding 10x, suggesting a "very high ceiling" for this capability.
**答：** Vibe coding 通过"raising the floor"让所有人都能开发软件，而 agentic engineering 则专注于在加速开发的同时"preserving the quality bar"，保持专业软件的质量标准。讲者将 agentic engineering 定义为一门协调"spiky"且"fallible"的 AI agents 的工程学科，关键是在不引入漏洞、不降低质量的前提下提速。与传统的 10x 工程师概念不同，擅长 agentic engineering 的人能获得远超 10 倍的生产力提升，这项能力的上限非常高（"very high ceiling"）。

### Topic 11: Generational differences and AI-native coding practices
代际差异与 AI 原生编码实践
_[17:18]_

**Q:** How do mediocre versus AI-native developers use coding tools differently?
**问：** 普通开发者和 AI 原生开发者使用编码工具有什么区别？

**A:** The distinction between mediocre and AI-native developers mirrors how different generations use ChatGPT—older users treat it as search replacement while younger users integrate it as their internet gateway. AI-native developers approach coding tools by "getting the most out of the tools that are available" and "investing into your own kind of setup," similar to how engineers historically optimized their Vim or VS Code configurations. The key differentiator is not just using AI tools like Claude Code or Codex, but deeply customizing and "utilizing all of their features" to maximize productivity, treating AI coding assistants as core infrastructure rather than occasional helpers.
**答：** 普通开发者和 AI 原生开发者的区别，类似不同年龄段的人使用 ChatGPT 的方式——年长用户把它当搜索引擎，年轻用户则把它作为上网入口。AI 原生开发者会"充分利用可用工具"并"投入精力优化自己的开发环境"，就像工程师过去优化 Vim 或 VS Code 配置一样。关键区别不在于是否使用 Claude Code 或 Codex 这类 AI 工具，而在于深度定制并"利用它们的所有功能"来最大化生产力，把 AI 编码助手当作核心基础设施而非偶尔用用的辅助工具。

### Topic 12: Refactoring hiring for agentic engineering capability
为 agentic engineering 能力重构招聘流程
_[18:23]_

**Q:** How should companies change their hiring process to evaluate agentic engineering skills?
**问：** 公司应该如何改变招聘流程来评估 agentic engineering 技能？

**A:** Most companies have not yet adapted their hiring processes for agentic engineering, still relying on traditional "puzzles to solve" rather than evaluating real capability with AI tools. The speaker advocates for project-based assessments where candidates build substantial applications like "a Twitter clone for agents" that must be secure, functional, and resilient. The evaluation should include adversarial testing where AI models like "Claude 3.5 Sonnet or X AI" attempt to break the deployed system, with successful candidates demonstrating both the ability to build at scale and to defend against AI-powered attacks.
**答：** 大多数公司的招聘流程还没有为 agentic engineering 做出调整，仍在使用传统的算法题而非评估候选人使用 AI 工具的实际能力。理想的招聘应该是基于大型项目的评估，比如让候选人"构建一个给 agent 用的 Twitter 克隆"，要求功能完善且安全可靠。评估过程应包括对抗性测试，用 Claude 3.5 Sonnet 或 X AI 等模型尝试攻破候选人部署的系统，成功的候选人需要展示大规模构建能力和防御 AI 攻击的能力。

### Topic 13: Human skills that remain valuable: taste, judgment, and oversight
人类技能的持久价值：品味、判断力和监督能力
_[19:29]_

**Q:** What human skills become more valuable as agents do more work?
**问：** 随着 AI agent 承担更多工作，哪些人类技能变得更加重要？

**A:** Agents currently function as "intern entities" that handle implementation details but still require human oversight for "aesthetics, judgment, taste" and high-level design decisions. The speaker illustrates this with a concrete bug where his agent incorrectly used email addresses instead of persistent user IDs to match Stripe payments with Google accounts, demonstrating that humans must still "be in charge of this spec, this plan" and ensure fundamental architectural soundness. While agents now handle API-level details that developers "don't have to" remember anymore—like the differences between "keep dims versus keep dim" across PyTorch, NumPy, and pandas—humans remain responsible for understanding core concepts like tensor views versus storage to avoid "copying memory around unnecessarily." The division of labor places humans in charge of "the taste, the engineering, the design" while agents function as engineers who "fill in the blanks," with humans ensuring requests make sense at a systems level.
**答：** 当前的 AI agent 更像是"实习生"，能处理实现细节，但在"美学、判断、品味"和高层设计决策上仍需人类监督。讲者用一个具体的 bug 说明了这点：他的 agent 错误地用邮箱地址而非持久化用户 ID 来关联 Stripe 支付和 Google 账户，这表明人类必须"负责规格说明和计划"，确保基础架构的合理性。虽然 agent 现在能处理开发者"不必再记住"的 API 细节——比如 PyTorch、NumPy、pandas 中"keep dims 还是 keep dim"的差异——但人类仍需理解核心概念，如 tensor 的 view 和 storage 区别，以避免"不必要的内存拷贝"。这种分工让人类负责"品味、工程设计"，而 agent 作为工程师"填补空白"，人类确保需求在系统层面合理。

### Topic 14: Will taste and judgment matter less over time?
品味和判断力会随时间变得不那么重要吗？
_[22:13]_

**Q:** Will the need for human taste and aesthetic judgment diminish as models improve?
**问：** 随着模型改进，对人类品味和审美判断的需求会减少吗？

**A:** Andrej (Speaker B) believes aesthetic judgment remains weak in current models mainly because "there's probably no aesthetics cost or reward" in the reinforcement learning process, not because of a fundamental limitation. He gives concrete examples. Generated code often has "a lot of copy-paste and awkward abstractions that are brittle." When he tried to simplify his MicroGPT project, the models "can't" do it well, so the process felt like "pulling teeth." He maintains that "people still remain in charge of this" for now, but he is optimistic because "there's nothing fundamental preventing it, it's just the labs haven't done it yet."
**答：** Andrej（Speaker B）认为当前模型的审美判断能力较弱，主要是因为强化学习过程中「可能没有审美成本或奖励」，而非根本性限制。他用具体例子说明：生成的代码常有「大量复制粘贴和脆弱的抽象」，而他的 MicroGPT 简化项目显示模型「做不到」有效简化代码，整个过程像「拔牙」而非自然流畅。虽然他认为目前「人类仍然掌控这方面」，但他对改进持乐观态度，因为「没有根本性的东西阻止它，只是实验室还没做到而已」。

### Topic 15: Jagged intelligence: animals vs ghosts framework
参差不齐的智能：动物与幽灵框架
_[23:30]_

**Q:** Why does the 'animals versus ghosts' framing matter for understanding AI systems?
**问：** 为什么「动物与幽灵」的框架对理解 AI 系统很重要？

**A:** The animals-versus-ghosts framework holds that AI systems are "shaped by data and reward functions" rather than by evolved drives such as "intrinsic motivation or fun or curiosity or empowerment." The framing matters because having "a good model of what they are or are not" changes how we build, deploy, evaluate, and trust these systems. The core point is that these are jagged, non-biological forms of intelligence. They lack the coherent motivational structure that living organisms evolved, so they should not be treated as animal-like agents.
**答：** 动物与幽灵框架的核心区别在于，AI 系统是由「数据和奖励函数塑造」的，而不是像生物那样由「内在动机、乐趣、好奇心或自我赋能」等进化驱动力形成的。这个框架很重要，因为对 AI「是什么或不是什么有一个好的模型」会从根本上改变我们构建、部署、评估和信任这些系统的方式。核心洞察是，我们面对的是参差不齐的、非生物性的智能形式，它们缺乏生物进化出的连贯动机结构，这让它们与类动物智能体有本质区别。

### Topic 16: Understanding AI as statistical systems, not animal intelligence
将AI理解为统计系统而非动物智能
_[24:18]_

**Q:** How should we think about AI systems and what mindset helps when working with them?
**问：** 我们应该如何理解AI系统，什么样的思维方式有助于使用它们？

**A:** The speaker argues that AI systems should be understood as "statistical simulation circuits" rather than animal-like intelligences. Emotional tactics such as yelling at them therefore do not improve performance. The "substrate" is pre-training, which is essentially statistics, with "RL bolted on top." This layered structure shapes which approaches are likely to work. Rather than offering prescriptive rules, the speaker recommends staying "suspicious of it" and working out what succeeds through experimentation over time.
**答：** 说话者认为应该把AI系统理解为"statistical simulation circuits"而不是类似动物的智能，因此对它们大喊大叫这类情绪化互动并不能提升性能。其基底（substrate）是pre-training，本质上是统计学，上面再"bolted on"了RL，这种分层结构决定了哪些方法可能有效。说话者没有给出具体的使用规则，而是强调要保持"suspicious"的心态，通过不断试验来摸索什么方法有效。

### Topic 17: Living in an agent-native world with real permissions
Agent原生世界：从人类优先到Agent优先的基础设施
_[25:16]_

**Q:** What does the future look like when agents have real permissions and take action on our behalf?
**问：** 当Agent拥有真实权限并代表我们行动时，未来会是什么样子？

**A:** The speaker envisions a fundamental shift in which infrastructure becomes "agent native" rather than human-centric, with documentation and interfaces designed for LLMs first. He illustrates today's friction with his MenuGen deployment. The coding itself was trivial, but "deploying it in Vercel" required tedious manual configuration across multiple services. Ideally, he argues, you should be able to "give a prompt to an LLM, build MenuGen, and then I didn't have to touch anything." The future involves decomposing workloads into "sensors over the world, actuators over the world," with data structures optimized for LLM legibility. He predicts that agent-to-agent communication will become standard, as in "I'll have my agent talk to your agent to figure out some of the details of our meetings." These agents would represent both individuals and organizations.
**答：** Speaker认为未来基础设施需要从根本上转向"agent native"设计，优先考虑LLM而非人类。他用MenuGen部署经历说明这个痛点：写代码很简单，但在Vercel上部署却需要在各种服务间手动配置DNS等设置，理想情况应该是"给LLM一个prompt，构建MenuGen，然后我什么都不用管"。未来的工作流会被分解为"传感器和执行器"，数据结构专门为LLM可读性优化。他预测agent间通信会成为常态，"我的agent和你的agent交流"来安排会议细节，代表个人和组织进行互动。

### Topic 18: Sensors and actuators as a framework for agent infrastructure
用传感器和执行器框架理解agent基础设施
_[27:38]_

**Q:** How can we conceptualize agent architecture using physical system analogies?
**问：** 如何用物理系统类比来理解agent架构？

**A:** Speaker B endorses the "sensors and actuators" framing as a useful mental model for agent architecture and calls it a "visual analogy." The framing maps agent capabilities onto familiar parts of a physical control system. Sensors are inputs that perceive and gather information from the environment, and actuators are outputs that take actions and change it. The exchange is brief because it refers back to the analogy introduced at [26:07], where workloads were described as decomposing into "sensors over the world, actuators over the world."
**答：** Speaker B认可用"传感器和执行器"框架作为理解agent架构的有用心智模型，称其为一个"visual analogy"。这个框架将agent能力映射到熟悉的物理控制系统组件：传感器代表感知和收集环境信息的输入机制，执行器代表采取行动、改变环境的输出机制。这段交流很简短，因为它呼应的是 [26:07] 提出的类比，即把工作负载分解为"sensors over the world, actuators over the world"。
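
The sensors-and-actuators framing can be sketched as a loop with three steps. Sensors turn the world into plain text a model can read, a policy picks an action, and an actuator turns that choice into an effect. Everything here is hypothetical: `Sensor`, `Actuator`, `run_step`, the toy `Clock` and `Calendar`, and the `stub_policy` standing in for an LLM call. No real agent framework is implied.

```python
from typing import Callable, Protocol

class Sensor(Protocol):
    name: str
    def read(self) -> str: ...

class Actuator(Protocol):
    name: str
    def act(self, argument: str) -> str: ...

class Clock:
    """Toy sensor: reports a fixed time."""
    name = "clock"

    def read(self) -> str:
        return "09:00"

class Calendar:
    """Toy actuator: records bookings in memory."""
    name = "calendar"

    def __init__(self) -> None:
        self.events: list[str] = []

    def act(self, argument: str) -> str:
        self.events.append(argument)
        return f"booked {argument}"

def run_step(sensors: list[Sensor], actuators: list[Actuator],
             policy: Callable[[str], str]) -> str:
    # Sense: render every observation as plain text the model can read.
    observation = "\n".join(f"{s.name}: {s.read()}" for s in sensors)
    # Decide: the policy (an LLM call in practice) answers "actuator: argument".
    target, _, argument = policy(observation).partition(":")
    # Act: dispatch to the actuator with that name.
    by_name = {a.name: a for a in actuators}
    return by_name[target.strip()].act(argument.strip())

def stub_policy(observation: str) -> str:
    # Hypothetical stand-in for a model: book a meeting at the observed time.
    time = observation.split("clock: ", 1)[1].splitlines()[0]
    return f"calendar: meeting at {time}"

calendar = Calendar()
print(run_step([Clock()], [calendar], stub_policy))   # booked meeting at 09:00
```

Keeping observations and actions as short, named text lines is one way to make the state "LLM-legible" in the sense the talk describes. It is a design choice for illustration, not a prescribed format.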

### Topic 19: What remains worth learning when intelligence gets cheap
当智能变得廉价时什么仍值得深入学习
_[27:43]_

**Q:** What should we still learn deeply as AI becomes more capable?
**问：** 随着AI能力增强，我们仍应深入学习什么？

**A:** Andrej argues that "you can outsource your thinking but you can't outsource your understanding," which makes human understanding the irreplaceable bottleneck in an AI-augmented world. Humans remain essential as directors who must know "what are we trying to build, why is it worth doing" and how to direct AI agents effectively. He uses LLM knowledge bases to enhance his understanding by generating different projections of the same information. He also notes that "LLMs certainly don't excel at understanding" and that humans are "uniquely in charge of that." The core point is that understanding, not thinking or processing, remains the fundamental constraint that cannot be automated away.
**答：** Andrej认为虽然"可以外包思考但无法外包理解"，这使得人类的理解力成为AI时代不可替代的瓶颈。他强调人类仍然是必不可少的指挥者，必须清楚"我们要构建什么、为什么值得做"以及如何有效指导AI agents。他将LLM知识库作为增强理解的工具，通过生成信息的不同投影来获得洞察，并指出"LLM在理解方面并不擅长"，人类"uniquely in charge of that"。核心观点是：理解力——而非思考或处理能力——仍然是无法被自动化取代的根本约束。

---

## Vocabulary (CEFR B2+)

### agentic  /eɪˈdʒentɪk/
**CEFR:** C1 | **Part of speech:** adj. | **Occurrences:** 9

**EN:** having the ability to act autonomously and make independent decisions; characterized by agency and self-directed action  
**CN:** 具有自主行动能力的；能够独立决策的；具有主体性的

**Original examples:**
- [01:00] Well, first of all, I guess like as many of you, I've been using **agentic** tools like LLM code, adjacent things, for a while, maybe over the last year as it came out and it was very good at, you know, chunks of code and sometimes it would mess up and you have to edit them and it was kind of helpful.  
  首先，我想和你们很多人一样，我在过去一年左右一直在使用像 LLM 代码这样的**自主智能**工具，它在处理代码块方面非常出色，虽然有时会出错需要你去编辑，但总体来说还是很有帮助的。
- [01:52] But you really had to look again, and you had to look as of December, because things have changed fundamentally, and especially on this **agentic** coherent workflow that really started to actually work.  
  但你真的需要重新审视，特别是从12月开始，因为情况发生了根本性的变化，尤其是这种**自主智能**的连贯工作流真正开始发挥作用了。
- [15:49] We're in a world that feels a little bit more serious, more **agentic** engineering.  
  我们现在处于一个感觉更加严肃的世界，更多的是**自主智能**工程。
- [16:06] But then I would say **agentic** engineering is about preserving the quality bar of what existed before in professional software.  
  但我想说的是，**自主智能**工程是关于保持专业软件之前存在的质量标准。
- [16:24] And so to me **agentic** engineering, when I call it that, because I do think it's kind of like an engineering discipline.  
  所以对我来说，**自主智能**工程，我之所以这样称呼它，是因为我确实认为它是一种工程学科。
- [16:33] A bit stochastic, but they are extremely powerful. How do you coordinate them to go faster without sacrificing your quality bar and doing that well and correctly is the realm of **agentic** engineering. So I kind of see them as different, like one is about maybe raising the floor and the other is about extrapolating. And what I'm seeing, I think, is there is a very high ceiling on agentic engineer capability. And you know, people used to talk about the 10x engineer previously. I think that this is magnified a lot more. 10x is not the speed up you gain. And I think it does seem to me like people who are very good at this peak a lot more than 10x from my perspective right now.  
  有点随机性,但它们极其强大。如何协调它们来加快速度而不牺牲质量标准,并且做得好、做得正确,这就是 agentic engineering 的领域。所以我认为它们是不同的,一个是关于提升下限,另一个是关于向上延伸。而我看到的是,agentic engineer 的能力上限非常高。你知道,人们以前常说 10 倍工程师。我认为这个倍数被大大放大了。你获得的加速不止 10 倍。从我现在的角度来看,那些非常擅长这个的人的峰值能力远超 10 倍。
- [18:23] A related thought is a lot of people are maybe hiring for this right, because they want to hire strong **agentic** engineers.  
  一个相关的想法是，很多人可能正在为此招聘，因为他们想雇用强大的**自主智能**工程师。
- [18:31] I do think that what I'm seeing is that most people have still not refactored their hiring process for **agentic** engineer capability, right?  
  我确实认为我所看到的是，大多数人仍然没有为**自主智能**工程师能力重构他们的招聘流程，对吧？
- [25:31] Yeah, I think a lot of people probably here are excited about what this agent native **agentic** environment looks like and everything has to be rewritten. Everything is still fundamentally written for humans and has to be moved around. I still use most of the time when I use different frameworks or libraries or things like that, they still have docs that are fundamentally written for humans. This is my favorite pet peeve. Like, why are people still telling me what to do?  
  我想这里很多人都对 agent 原生环境感到兴奋,认为一切都需要重写。现在一切仍然是为人类设计的,需要人来操作。我使用各种框架或库时,它们的文档仍然是为人类写的。这是我最喜欢吐槽的点:为什么人们还在告诉我该做什么?

**Extra example:**
- The new AI system demonstrates **agentic** behavior by planning and executing tasks without human intervention.  
  这个新的 AI 系统展示了**自主智能**行为，能够在没有人工干预的情况下规划和执行任务。

### coherent  /koʊˈhɪrənt/
**CEFR:** C1 | **Part of speech:** adj. | **Occurrences:** 1

**EN:** logical, consistent, and forming a unified whole  
**CN:** 连贯的，一致的，形成统一整体的

**Original examples:**
- [01:52] But you really had to look again, and you had to look as of December, because things have changed fundamentally, and especially on this agentic **coherent** workflow that really started to actually work.  
  但你真的需要重新审视，特别是从去年12月开始，因为情况发生了根本性变化，尤其是这种 agentic **连贯**工作流真正开始发挥作用了。

**Extra example:**
- The team developed a **coherent** strategy for product development.  
  团队制定了一个**连贯**的产品开发策略。

### paradigm  /ˈpærədaɪm/
**CEFR:** C1 | **Part of speech:** n. | **Occurrences:** 6

**EN:** a typical example or pattern of something; a model or framework of concepts and practices that defines a discipline  
**CN:** 范式；典范；模式；（某学科的）概念框架

**Original examples:**
- [02:35] New computing **paradigm**. And software 1.0 was explicit rules, software 2.0 was learned weights, software 3.0 is this.  
  新的计算**范式**。软件1.0是显式规则，软件2.0是学习权重，软件3.0就是这个。
- [04:20] And the reason this is a lot more powerful is you're working now in the Software 3.0 **paradigm** where you don't have to precisely spell out all the individual details of that setup.  
  这种方式强大得多的原因是,你现在是在 Software 3.0 范式下工作,不需要精确地写出设置的每个细节。
- [04:47] That's the programming **paradigm**.  
  这就是编程**范式**。
- [06:04] It's working in the old **paradigm** that app shouldn't exist, and yeah, the software 3.0 paradigm is a lot more kind of raw.  
  它还在用旧范式工作,那个应用根本不应该存在,而 Software 3.0 范式要更原始直接得多。
- [06:21] So I think that people have to kind of like reframe—you know, not to work in existing **paradigm** of what things existed—and just think about it as a speed up of what exists.  
  所以我觉得人们需要重新思考——不要用现有事物的既有范式去思考——不要只把它当作现有事物的加速。
- [18:31] I do think that what I'm seeing is that most people have still not refactored their hiring process for agentic engineer capability, right? Like if you're giving out puzzles to solve, this is still the old **paradigm**.  
  我确实认为,我看到的是,大多数人仍然没有针对 agentic engineer 能力重构他们的招聘流程,对吧?如果你还在出谜题让人解决,这仍然是旧范式。

**Extra example:**
- The scientific community underwent a **paradigm** shift when quantum mechanics challenged classical physics.  
  当量子力学挑战经典物理学时，科学界经历了一次**范式**转变。

### lever  /ˈlevər/
**CEFR:** B2 | **Part of speech:** n. | **Occurrences:** 2

**EN:** a means of exerting control or influence; a tool or mechanism for achieving something  
**CN:** 杠杆；控制手段；影响力工具

**Original examples:**
- [03:25] Context window is your **lever** over the interpreter that is the LLM that is kind of like interpreting your context and performing computation in the digital information space.  
  上下文窗口是你对 LLM 这个解释器的**控制杠杆**，它会解释你的上下文并在数字信息空间中执行计算。
- [14:41] Uh, you can use your favorite fine-tuning framework and, um, and, uh, pull the **lever** and get something that actually, uh, works pretty well.  
  呃，你可以使用你最喜欢的微调框架，然后，嗯，拉动**杠杆**，得到一些实际上运行得相当好的东西。

**Extra example:**
- Education is often seen as the most powerful **lever** for social mobility.  
  教育通常被视为实现社会流动性的最强大**杠杆**。

### spurious  /ˈspjʊriəs/
**CEFR:** C2 | **Part of speech:** adj. | **Occurrences:** 1

**EN:** false or fake; not genuine, authentic, or valid; based on false reasoning  
**CN:** 虚假的；伪造的；不真实的；基于错误推理的

**Original examples:**
- [05:51] Basically returned an image that is exactly the picture of the menu that I took, but it actually put into the pixels—it rendered the different things in the menu—and this blew my mind because actually all of my menu gen is **spurious**.  
  NanoBanana 基本上返回的图像就是我拍的那张菜单照片,但它实际上在像素层面渲染了菜单里不同菜品的样子——这让我震惊,因为我的整个 menu gen 其实是多余的。

**Extra example:**
- The study was criticized for drawing conclusions based on **spurious** correlations in the data.  
  这项研究因基于数据中的**虚假**相关性得出结论而受到批评。

### verifiability  /ˌverɪfaɪəˈbɪləti/
**CEFR:** C1 | **Part of speech:** n. | **Occurrences:** 6

**EN:** the quality of being able to be checked, confirmed, or proven to be true or accurate  
**CN:** 可验证性；可证实性；能够被检验或证明为真实准确的特性

**Original examples:**
- [09:41] I'd like to talk a little bit about this concept of **verifiability**, the fact that AI will automate faster and more easily domains where the output can be verified.  
  我想谈谈这个**可验证性**的概念，即 AI 会更快、更容易地自动化那些输出可以被验证的领域。
- [10:02] Yes. So I spent some time writing about **verifiability** and basically traditional computers can easily automate what you can specify in code, and this latest round of LLMs can easily automate what you can verify in a certain sense, because the way this works is that when frontier labs are training these LLMs, these are giant reinforcement learning environments.  
  是的。所以我花了一些时间写关于**可验证性**的内容，基本上传统计算机可以轻松自动化你可以在代码中指定的内容，而最新一轮的 LLM 可以在某种意义上轻松自动化你可以验证的内容，因为这种工作方式是，当前沿实验室训练这些 LLM 时，这些都是巨大的强化学习环境。
- [10:44] So I think the reason I wrote about **verifiability** is I'm trying to understand why these things are so...  
  所以我认为我写关于**可验证性**的原因是我试图理解为什么这些东西如此...
- [14:10] Previous question of, I do think that **verifiability**, because it, um, let me think.  
  之前的问题，我确实认为**可验证性**，因为它，嗯，让我想想。
- [14:14] So **verifiability** makes something tractable in the current paradigm because you can throw a huge amount of RL at it.  
  可验证性使得某件事在当前范式下变得可行,因为你可以对它投入大量强化学习。

**Extra example:**
- The **verifiability** of scientific claims is essential to the peer review process.  
  科学主张的**可验证性**对同行评审过程至关重要。
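
The [10:02] example says LLMs automate what can be verified because frontier labs train them in large reinforcement learning environments. The key ingredient is a reward computed by a checker rather than a person. Below is a toy sketch under stated assumptions. It assumes a math task whose final answer follows a hypothetical `Answer:` marker, and a coding task scored by hidden test cases. The source does not describe any lab's actual pipeline.

```python
import re
from typing import Callable

def math_reward(completion: str, ground_truth: str) -> float:
    # Binary reward: 1.0 only if the final answer exactly matches the known solution.
    match = re.search(r"Answer:\s*(-?\d+(?:\.\d+)?)\s*$", completion.strip())
    return 1.0 if match and match.group(1) == ground_truth else 0.0

def code_reward(candidate: Callable[[int], int], tests: list[tuple[int, int]]) -> float:
    # Fraction of hidden test cases passed; an exception counts as a failure.
    passed = 0
    for arg, expected in tests:
        try:
            passed += candidate(arg) == expected
        except Exception:
            pass
    return passed / len(tests)

print(math_reward("12 * 7 = 84. Answer: 84", "84"))            # 1.0
print(math_reward("I think it's about 80. Answer: 80", "84"))   # 0.0
print(code_reward(lambda n: n * n, [(2, 4), (3, 9), (-1, 1)]))  # 1.0
print(code_reward(lambda n: n + n, [(2, 4), (3, 9), (-1, 1)]))  # 0.3333333333333333
```

Because both checks are automatic, they can be run millions of times during training. That is the sense in which verifiable domains are easier to automate.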

### jagged  /ˈdʒæɡɪd/
**CEFR:** B2 | **Part of speech:** adj. | **Occurrences:** 4

**EN:** having rough, sharp, or uneven edges or surfaces; irregular in pattern or distribution  
**CN:** 参差不齐的；锯齿状的；不规则的；（能力或表现）不均衡的

**Original examples:**
- [10:49] **Jagged**, and some of it has to do with how the labs train the models, but I think some of it also has to do with the focus of the labs and what they happen to put into the data distribution.  
  参差不齐，部分原因与实验室如何训练模型有关，但我认为部分原因也与实验室的关注点以及它们恰好放入数据分布中的内容有关。
- [11:52] This is insane. And to whatever extent these models remain **jagged**, it's an indication that number one, maybe something's slightly off, or number two, you need to actually be in the loop a little bit and you need to treat them as tools and you do have to kind of stay in touch with what they're doing.  
  这太离谱了。这些模型在多大程度上仍然表现得参差不齐,就说明:第一,可能有些地方不太对劲;第二,你确实需要参与进来一点,需要把它们当作工具来对待,而且你必须对它们正在做的事情保持关注。
- [12:11] And so I think all of my writing, long story short, about verifiability is just trying to understand why these things are **jagged**. Is there any pattern to it?  
  所以我认为我所有关于可验证性的写作，长话短说，只是试图理解为什么这些东西是**参差不齐的**。这其中有什么规律吗？
- [13:36] I'd love to come back to the concept of **jagged** intelligence in a little bit. If you are a founder today and thinking about building a company, you are trying to solve a problem that you think is tractable, something that is a domain that is verifiable, but you look around and you think, "Oh my gosh, well, the labs have really started getting to escape velocity in the ones that seem most obvious, math, coding, and others." What would your advice be to the founders in the audience?  
  我很想稍后再回到参差不齐的智能这个概念。如果你今天是一位创始人,正在考虑创建一家公司,你试图解决一个你认为可行的问题,一个可验证的领域,但你环顾四周会想:「天哪,实验室在那些看起来最明显的领域——数学、编程等等——真的开始达到逃逸速度了。」你会给在座的创始人什么建议?

**Extra example:**
- AI capabilities remain **jagged**, excelling at some tasks while struggling with seemingly simpler ones.  
  AI 的能力仍然是**参差不齐的**，在某些任务上表现出色，而在看似更简单的任务上却很吃力。

### circuit  /ˈsɜːrkɪt/
**CEFR:** B2 | **Part of speech:** n. | **Occurrences:** 1

**EN:** a complete path or system through which something flows or operates; in neural networks, a pattern of connections  
**CN:** 回路，电路；（神经网络中的）连接模式

**Original examples:**
- [13:13] And if you're in the **circuits** that were part of the RL, you fly.  
  如果你处在那些经过强化学习训练的**回路**中，你就能表现出色。

**Extra example:**
- The neural **circuit** responsible for language processing showed high activation.  
  负责语言处理的神经**回路**显示出高度激活。

### verifiable  /ˈverɪfaɪəbl/
**CEFR:** C1 | **Part of speech:** adj. | **Occurrences:** 1

**EN:** able to be checked, confirmed, or proven to be true or accurate  
**CN:** 可验证的；可证实的；能够被检验或证明的

**Original examples:**
- [15:14] I do think that ultimately almost everything can be made **verifiable** to some extent, some things easier than others. Because even for things like writing or so on, you can imagine having a council of LLM judges and probably get something reasonable out of this kind of an approach.  
  我确实认为最终几乎所有东西都可以在某种程度上变得**可验证**，有些事情比其他事情更容易。因为即使是像写作之类的事情，你也可以想象有一个 LLM 评委会，并且可能从这种方法中得到一些合理的结果。

**Extra example:**
- The company's financial claims must be **verifiable** through independent audits.  
  公司的财务声明必须通过独立审计来**验证**。
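
The "council of LLM judges" idea at [15:14] can be sketched as aggregating several independent scores into one reward. This sketch takes the median so that one erratic judge cannot swing the result. The judges are hypothetical stub functions, and the median rule is an assumption chosen for illustration. In practice each judge would prompt a model with a grading rubric.

```python
from statistics import median
from typing import Callable

Judge = Callable[[str], float]  # maps a piece of writing to a score on a 0-10 rubric

def council_score(text: str, judges: list[Judge]) -> float:
    # Clamp each score to the rubric range, then take the median across judges.
    scores = [min(10.0, max(0.0, judge(text))) for judge in judges]
    return median(scores)

# Hypothetical stand-ins for three model-based judges.
def length_judge(text: str) -> float:
    return min(10.0, len(text.split()) / 2)

def clarity_judge(text: str) -> float:
    return 8.0 if text.endswith(".") else 4.0

def erratic_judge(text: str) -> float:
    return 100.0  # out of range; clamping and the median limit its influence

essay = "Short, clear sentences help readers follow an argument."
print(council_score(essay, [length_judge, clarity_judge, erratic_judge]))  # 8.0
```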

### automatable  /ɔːˈtɒmətəbl/
**CEFR:** C1 | **Part of speech:** adj. | **Occurrences:** 1

**EN:** capable of being automated; suitable for being performed by machines or computer systems without human intervention  
**CN:** 可自动化的；能够被机器或计算机系统自动执行的

**Original examples:**
- [15:43] Everything is **automatable**.  
  一切都是**可自动化的**。

**Extra example:**
- Repetitive data entry tasks are highly **automatable** and should be prioritized for AI implementation.  
  重复性的数据录入任务是高度**可自动化的**，应该优先考虑用 AI 来实现。

### vulnerability  /ˌvʌlnərəˈbɪləti/
**CEFR:** B2 | **Part of speech:** n. | **Occurrences:** 1

**EN:** a weakness or flaw that can be exploited; the state of being exposed to harm or attack  
**CN:** 漏洞，弱点；易受攻击的状态

**Original examples:**
- [16:11] So you're not allowed to introduce **vulnerabilities** due to vibe coding.  
  所以你不能因为凭感觉编程而引入漏洞。

**Extra example:**
- Security experts discovered a critical **vulnerability** in the system.  
  安全专家在系统中发现了一个严重的漏洞。

### fallible  /ˈfæləbl/
**CEFR:** C1 | **Part of speech:** adj. | **Occurrences:** 1

**EN:** capable of making mistakes or being wrong  
**CN:** 易犯错的，会出错的

**Original examples:**
- [16:29] You have these agents which are these like spiky entities. They're a bit **fallible**, a little  
  你有这些agent，它们就像是有棱角的实体。它们有点容易出错，有点

**Extra example:**
- Even the most experienced professionals are **fallible** under pressure.  
  即使是最有经验的专业人士在压力下也会犯错。

### spiky  /ˈspaɪki/
**CEFR:** B2 | **Part of speech:** adj. | **Occurrences:** 1

**EN:** having sharp, uneven variations; inconsistent in performance or capability  
**CN:** 尖锐不平的；（性能或能力）不均衡的，参差不齐的

**Original examples:**
- [16:29] You have these agents which are these like **spiky** entities.  
  你有这些 agent，它们就像是**能力不均衡**的实体。

**Extra example:**
- The model's performance was **spiky**, with dramatic differences across tasks.  
  这个模型的性能很**不均衡**，在不同任务上表现差异巨大。

### aesthetics  /esˈθetɪks/
**CEFR:** B2 | **Part of speech:** n. | **Occurrences:** 2

**EN:** principles concerned with beauty, art, and taste; the visual appearance or style of something  
**CN:** 美学；审美观；外观风格

**Original examples:**
- [19:34] So yeah, it's a good question. I think, well, right now the answer is that the agents are kind of like these intern entities, right? So it's remarkable. You basically still have to be in charge of the **aesthetics**, the judgment, the taste, and a little bit of oversight. Maybe one of my favorite examples of like the weirdness of agents is, for menu gen, you sign up with a Google account, but you purchase credits using a Stripe account, and both of them have email addresses. And my agent actually tried to basically, like when you purchase credits, it assigned it using the email address from Stripe to the Google email address, like...  
  这是个好问题。我觉得,目前的答案是这些 agent 有点像实习生一样的存在,对吧?很神奇的是,你基本上还是要负责审美、判断、品味,以及一些监督工作。我最喜欢的一个例子,能体现 agent 的怪异之处,就是在 menu gen 中,你用 Google 账号注册,但用 Stripe 账号购买积分,两者都有邮箱地址。我的 agent 实际上尝试在你购买积分时,用 Stripe 的邮箱地址去匹配 Google 的邮箱地址,就像……
- [22:33] **Aesthetics** cost or reward, or it's not good enough or something like that.  
  美学上的代价或回报，或者说不够好之类的。

**Extra example:**
- The designer focused on both functionality and **aesthetics** in the new interface.  
  设计师在新界面中同时关注了功能性和美学。

### oversight  /ˈoʊvərsaɪt/
**CEFR:** C1 | **Part of speech:** n. | **Occurrences:** 2

**EN:** supervision or watchful care; the act of overseeing or monitoring  
**CN:** 监督，监管；疏忽

**Original examples:**
- [19:34] So yeah, it's a good question. I think, well, right now the answer is that the agents are kind of like these intern entities, right? So it's remarkable. You basically still have to be in charge of the aesthetics, the judgment, the taste, and a little bit of **oversight**. Maybe one of my favorite examples of like the weirdness of agents is, for menu gen, you sign up with a Google account, but you purchase credits using a Stripe account, and both of them have email addresses. And my agent actually tried to basically, like when you purchase credits, it assigned it using the email address from Stripe to the Google email address, like...  
  这是个好问题。我觉得,目前的答案是这些 agent 有点像实习生一样的存在,对吧?很神奇的是,你基本上还是要负责审美、判断、品味,以及一些监督工作。我最喜欢的一个例子,能体现 agent 的怪异之处,就是在 menu gen 中,你用 Google 账号注册,但用 Stripe 账号购买积分,两者都有邮箱地址。我的 agent 实际上尝试在你购买积分时,用 Stripe 的邮箱地址去匹配 Google 的邮箱地址,就像……
- [20:40] And I actually don't even like the plan mode. I would—I mean obviously it's very useful, but I think there's something more general here where you have to work with your agent to design a spec that is very detailed and maybe it's basically the docs, and then get the agents to write them and you're in charge of the **oversight** and the top level categories, but the agents are—  
  其实我甚至不太喜欢计划模式。我的意思是,它显然很有用,但我觉得这里有更通用的东西,就是你必须和你的 agent 一起设计一个非常详细的规格说明,可能基本上就是文档,然后让 agent 去编写它们,你负责监督和顶层分类,但 agent 在做——

**Extra example:**
- The project requires careful **oversight** to ensure quality standards are met.  
  这个项目需要仔细监督以确保达到质量标准。

### bloated  /ˈbloʊtɪd/
**CEFR:** C1 | **Part of speech:** adj. | **Occurrences:** 1

**EN:** excessively large, complex, or full of unnecessary elements  
**CN:** 臃肿的，过度复杂的，充斥不必要元素的

**Original examples:**
- [22:52] The code agents produce can be **bloated** and inefficient.  
  AI agents 生成的代码可能会很臃肿且低效。

**Extra example:**
- The software became **bloated** with features that users rarely needed.  
  这个软件变得臃肿不堪，充斥着用户很少需要的功能。

### brittle  /ˈbrɪtl/
**CEFR:** C1 | **Part of speech:** adj. | **Occurrences:** 1

**EN:** fragile and easily broken; lacking flexibility or resilience  
**CN:** 脆弱的，易碎的；缺乏灵活性或韧性的

**Original examples:**
- [22:52] The abstractions can be **brittle** and break under edge cases.  
  这些抽象可能很脆弱，在边缘情况下容易崩溃。

**Extra example:**
- The system's **brittle** architecture made it difficult to maintain.  
  系统脆弱的架构使其难以维护。

### intrinsic  /ɪnˈtrɪnsɪk/
**CEFR:** C1 | **Part of speech:** adj. | **Occurrences:** 1

**EN:** belonging naturally; essential; inherent to the nature of something  
**CN:** 内在的，固有的，本质的

**Original examples:**
- [23:46] And these are jagged forms of intelligence that are shaped by data and reward functions, but not by **intrinsic** motivation or fun or curiosity or empowerment.  
  这些是参差不齐的智能形式，由数据和奖励函数塑造，而不是由内在动机、乐趣、好奇心或赋能塑造。

**Extra example:**
- The **intrinsic** value of education goes beyond just getting a job.  
  教育的内在价值远不止找到一份工作。

### bottleneck  /ˈbɑːtlnek/
**CEFR:** B2 | **Part of speech:** n. | **Occurrences:** 2

**EN:** a point of congestion or blockage that slows down progress or flow  
**CN:** 瓶颈，阻碍

**Original examples:**
- [28:25] Somehow information still has to make it into my brain, and I feel like I'm becoming a **bottleneck** of just even knowing what are we trying to build, why is it worth doing, how do I direct, you know, how do I direct my agents and so on.  
  不知怎么的，信息仍然必须进入我的大脑，我觉得我正在成为一个瓶颈，甚至只是知道我们要构建什么，为什么值得做，我如何指导，你知道，我如何指导我的agent等等。
- [29:12] Ultimately these are tools to enhance understanding in a certain way, and this is still kind of like a bit of a **bottleneck** because then you can't direct the—you can't be a good director if you still—because the LLMs certainly don't excel at understanding, you still are uniquely in charge of that.  
  最终这些都是在某种程度上增强理解的工具,而这仍然是一个瓶颈,因为如果你不能很好地理解,你就无法成为一个好的指导者——因为 LLM 在理解方面显然并不擅长,你仍然是唯一负责这件事的人。

**Extra example:**
- The slow server became a major **bottleneck** in the production process.  
  缓慢的服务器成为生产过程中的主要瓶颈。

### projection  /prəˈdʒekʃn/
**CEFR:** C1 | **Part of speech:** n. | **Occurrences:** 1

**EN:** a representation or view of something from a particular perspective; a way of presenting information  
**CN:** 投影，映射；（信息的）呈现方式或视角

**Original examples:**
- [28:56] Different **projections** of the same information can enhance understanding and reveal new insights.  
  对同一信息的不同**投影**可以增强理解并揭示新的洞见。

**Extra example:**
- The data **projection** made it easier to identify patterns in the dataset.  
  数据**投影**使得识别数据集中的模式变得更容易。

### stark  /stɑːrk/
**CEFR:** C1 | **Part of speech:** adj. | **Occurrences:** 1

**EN:** severe or bare in appearance; sharply contrasting; harsh or unpleasant  
**CN:** 鲜明的，明显的；严酷的

**Original examples:**
- [01:22] I think many other people were similar and I just started to notice that with the latest models the chunks just came out fine and then I kept asking for more and it just came out fine and then I can't remember the last time I corrected it and then I just, you know, trusted the system more and more and then I was vibe coding. [laughter] And so it was kind of a—I do think that it was a very **stark** transition.  
  我想很多人也有类似经历,我开始注意到用最新的模型时,代码片段就这么完美地生成出来了,然后我不断要求更多,它还是完美生成,我已经记不清上次纠正它是什么时候了,然后我就越来越信任这个系统,然后我就在 vibe coding 了。(笑声)所以这确实是一个非常明显的转变。

**Extra example:**
- There's a **stark** difference between the two approaches to problem-solving.  
  这两种解决问题的方法之间存在鲜明的差异。

### repercussion  /ˌriːpərˈkʌʃn/
**CEFR:** C1 | **Part of speech:** n. | **Occurrences:** 1

**EN:** an unintended consequence or indirect effect of an event or action  
**CN:** 反响，影响；（尤指）不良后果

**Original examples:**
- [02:21] So yeah, that kind of happened in December, I would say, and I was looking at the **repercussions** of that since.  
  所以这种情况发生在 12 月,我从那以后一直在观察它带来的影响。

**Extra example:**
- The policy change had serious **repercussions** for small businesses.  
  这项政策变化对小企业产生了严重的影响。

### extrapolate  /ɪkˈstræpəleɪt/
**CEFR:** C1 | **Part of speech:** v. | **Occurrences:** 3

**EN:** to estimate or conclude something by extending known information or trends into an unknown area  
**CN:** 推断，外推（根据已知信息推测未知情况）

**Original examples:**
- [07:48] If you **extrapolate** that further, what is the 2026 equivalent for building websites in the '90s, building mobile apps in the 2010s, building SaaS in the last cloud era? What will look completely obvious in hindsight that is still mostly unbuilt today?  
  如果把这个趋势继续推演下去，什么会成为2026年的等价物——就像90年代建网站、2010年代做移动应用、上一个云时代构建SaaS那样？什么东西事后看来会显得理所当然，但现在还基本没被开发出来？
- [17:03] So I kind of see them as different, like one is about maybe raising the floor and the other is about **extrapolating**.  
  所以我认为它们是不同的，一个可能是关于提高底线，另一个是关于**推断**。

**Extra example:**
- Scientists **extrapolate** future climate patterns from current data trends.  
  科学家们根据当前数据趋势**推断**未来的气候模式。

### deterministic  /dɪˌtɜːrmɪˈnɪstɪk/
**CEFR:** C1 | **Part of speech:** adj. | **Occurrences:** 1

**EN:** relating to a system where outcomes are precisely determined by initial conditions, with no randomness involved  
**CN:** 确定性的（结果完全由初始条件决定，不涉及随机性）

**Original examples:**
- [09:24] But what's really running the show is these neural nets that are in a certain way not **deterministic**.  
  但真正主导的是这些在某种程度上不是**确定性的**神经网络。

**Extra example:**
- Traditional algorithms are **deterministic**, always producing the same output for the same input.  
  传统算法是**确定性的**，对于相同的输入总是产生相同的输出。
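
One concrete source of the non-determinism mentioned at [09:24] is how text is decoded from a network's output scores. Greedy decoding always picks the top-scoring token. Temperature sampling draws from a probability distribution instead. The toy below uses made-up logits over a three-word vocabulary and Python's `random` module. It illustrates the idea and does not reproduce any particular model's decoder.

```python
import math
import random

VOCAB = ["cat", "dog", "bird"]
LOGITS = [2.0, 1.5, 0.1]  # made-up scores for illustration

def softmax(logits: list[float], temperature: float) -> list[float]:
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits: list[float]) -> str:
    # Deterministic: the same input always yields the same token.
    return VOCAB[max(range(len(logits)), key=logits.__getitem__)]

def sample(logits: list[float], temperature: float, rng: random.Random) -> str:
    # Stochastic: the token depends on the random state; temperature must be > 0.
    return rng.choices(VOCAB, weights=softmax(logits, temperature), k=1)[0]

print({greedy(LOGITS) for _ in range(5)})   # {'cat'}
rng = random.Random(0)
print(sorted({sample(LOGITS, 1.0, rng) for _ in range(50)}))   # the sampled words vary with the seed
```

Lowering the temperature concentrates probability on the top token, so sampling behaves more and more like greedy decoding.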

### frontier  /frʌnˈtɪr/
**CEFR:** B2 | **Part of speech:** n. | **Occurrences:** 1

**EN:** the extreme limit of understanding or achievement in a particular area; the leading edge of development  
**CN:** 前沿，尖端（某领域理解或成就的极限；发展的最前端）

**Original examples:**
- [10:24] So the **frontier** labs are given verification rewards and then because of the way that these models are trained, they end up progressing and creating these jagged entities.  
  所以这些**前沿** lab获得了验证奖励，然后由于这些模型的训练方式，它们最终不断进步并创造出这些参差不齐的实体。

**Extra example:**
- The company is pushing the **frontier** of quantum computing research.  
  这家公司正在推动量子计算研究的**前沿**。

### stagnate  /ˈstæɡneɪt/
**CEFR:** C1 | **Part of speech:** v. | **Occurrences:** 1

**EN:** to stop developing, progressing, or advancing; to become inactive or dull  
**CN:** 停滞，不再发展（停止进步或发展；变得不活跃）

**Original examples:**
- [10:44] They really peak in capability in verifiable domains like math and code and adjacent areas, and kind of **stagnate** and are a little bit rough around the edges when things are not in that space.  
  它们在数学和代码等可验证领域以及相邻领域的能力确实达到了顶峰，但在不属于这些领域的事情上有点**停滞不前**，而且有些粗糙。

**Extra example:**
- Without innovation, the industry will **stagnate** and lose competitiveness.  
  没有创新，这个行业就会**停滞不前**并失去竞争力。

### refactor  /ˌriːˈfæktər/
**CEFR:** B2 | **Part of speech:** v. | **Occurrences:** 1

**EN:** to restructure existing code without changing its external behavior, typically to improve readability, maintainability, or performance  
**CN:** 重构（在不改变外部行为的情况下重组现有代码，通常是为了提高可读性、可维护性或性能）

**Original examples:**
- [11:52] They can **refactor** complex codebases but struggle with simple reasoning tasks—this is insane.  
  它们可以**重构**复杂的代码库，但在简单的推理任务上却很吃力——这太疯狂了。

**Extra example:**
- The team spent a week to **refactor** the legacy code to make it more maintainable.  
  团队花了一周时间**重构**遗留代码，使其更易于维护。

### tractable  /ˈtræktəbl/
**CEFR:** C1 | **Part of speech:** adj. | **Occurrences:** 2

**EN:** easy to deal with, manage, or solve; manageable  
**CN:** 易处理的，易解决的（容易应对、管理或解决的）

**Original examples:**
- [14:08] So I think maybe that comes to the question of which problems are **tractable** for AI systems.  
  所以我认为这可能涉及到哪些问题对AI系统来说是**易处理的**这个问题。
- [14:20] So maybe one way to see it is that making problems **tractable** remains true even if the labs are not focusing on it directly.  
  所以也许一种看法是，让问题变得**易处理**这一点仍然成立，即使实验室没有直接关注它。

**Extra example:**
- Breaking down the complex project into smaller tasks made it more **tractable**.  
  将复杂项目分解成更小的任务使其更**易处理**。

### stochastic  /stəˈkæstɪk/
**CEFR:** C2 | **Part of speech:** adj. | **Occurrences:** 1

**EN:** involving or containing a random variable; having a random probability distribution that may be analyzed statistically but not predicted precisely  
**CN:** 随机的，概率性的（涉及随机变量；具有可以统计分析但无法精确预测的随机概率分布）

**Original examples:**
- [16:33] A bit **stochastic**, but they are extremely powerful.  
  有点**随机性**，但它们非常强大。

**Extra example:**
- The **stochastic** nature of sampling makes a language model's outputs somewhat unpredictable.  
  语言模型采样过程的**随机性**使得它的输出在某种程度上不可预测。
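To make the contrast between deterministic and stochastic concrete, here is a small Python sketch of picking the next word from a probability distribution, loosely in the spirit of how language models generate text. The words and probabilities in `PROBS` are made up for illustration, and `greedy_pick` and `sample_pick` are hypothetical names. Greedy picking is deterministic (same input, same output every time); weighted sampling is stochastic (single draws vary, but their frequencies can be analyzed statistically).

```python
import random

# Made-up next-word probabilities, for illustration only
PROBS = {"cat": 0.6, "dog": 0.3, "bird": 0.1}


def greedy_pick(probs):
    # Deterministic: always returns the single most likely word
    return max(probs, key=probs.get)


def sample_pick(probs, rng):
    # Stochastic: draws one word at random, weighted by its probability
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only so this demo is reproducible

greedy_results = {greedy_pick(PROBS) for _ in range(5)}
samples = [sample_pick(PROBS, rng) for _ in range(10_000)]
cat_share = samples.count("cat") / len(samples)

print(greedy_results)  # {'cat'}
print(cat_share)       # roughly 0.6; any single draw is unpredictable
```

The greedy version gives the same word no matter how often it runs. The sampled sequence changes with the seed, yet the share of "cat" stays near 0.6, which matches the "analyzed statistically but not predicted precisely" part of the definition.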

### substrate  /ˈsʌbstreɪt/
**CEFR:** C2 | **Part of speech:** n. | **Occurrences:** 1

**EN:** an underlying substance or layer; the foundation or base on which something is built or operates  
**CN:** 基底，基质（底层物质或层次；某物构建或运作的基础）

**Original examples:**
- [24:38] It's all just kind of like these statistical simulation circuits where the **substrate** is pre-training, so like statistics, and then but then there's RL bolted on top.  
  这一切就像是这些统计模拟电路，其中**基底**是预训练，也就是统计学，然后在上面加上了RL。

**Extra example:**
- Silicon serves as the physical **substrate** for most modern computer chips.  
  硅是大多数现代计算机芯片的物理**基底**。

### actuator  /ˈæktʃueɪtər/
**CEFR:** C1 | **Part of speech:** n. | **Occurrences:** 1

**EN:** a device that causes a machine or system to operate or move  
**CN:** 执行器，驱动装置（使机器或系统运行或移动的设备）

**Original examples:**
- [26:07] So everyone is I think excited about how do we decompose the workloads that need to happen into fundamentally sensors over the world, **actuators** over the world.  
  所以我认为大家都在兴奋地思考：如何把需要完成的工作从根本上分解为感知世界的传感器和作用于世界的**执行器**。

**Extra example:**
- The robotic arm uses hydraulic **actuators** to control precise movements.  
  这个机械臂使用液压**执行器**来控制精确的动作。

### legible  /ˈledʒəbl/
**CEFR:** C1 | **Part of speech:** adj. | **Occurrences:** 1

**EN:** clear enough to be read or understood easily  
**CN:** 清晰易读的，易于理解的

**Original examples:**
- [26:30] Basically describe it to agents first, and then have a lot of automation around, you know, data structures that are very **legible** to the LLMs.  
  基本上就是先向 agent 描述它，然后围绕那些对 LLM 来说非常**易读**的数据结构进行大量自动化。

**Extra example:**
- The handwriting was barely **legible** after years of fading.  
  经过多年褪色后，这些手写字迹几乎**无法辨认**。

### virtualize  /ˈvɜːrtʃuəlaɪz/
**CEFR:** C1 | **Part of speech:** v. | **Occurrences:** 1

**EN:** to create a virtual version of something, such as a computer system or network resource  
**CN:** 虚拟化（创建某物的虚拟版本，如计算机系统或网络资源）

**Original examples:**
- [08:25] Sense you feed raw videos like imagine a device that takes raw videos or audio into basically what's a neural net and uses diffusion to render a UI that is kind of like, you know, unique for that moment in a certain sense. And I kind of feel like in the early days of computing actually people were a little bit confused as to whether computers would look like calculators or computers would look like neural nets, and in the 50s and 60s it was not really obvious which way we'd go. And of course we went down the calculator path and ended up building classical computing, and then neural nets are currently running **virtualized** on existing computers. But you could imagine, I think that a lot of this will flip and that the neural net becomes kind of like the host process and the CPUs become kind of like the co-processor. So we saw the diagram of, you know, intelligence compute of neural networks is going to take over and become the dominant  
  也就是说，你输入原始视频，想象一个设备接收原始视频或音频，输入到一个神经网络中，然后用扩散模型渲染出一个UI界面，这个界面在某种意义上是为那个特定时刻定制的。我有点觉得，在计算机发展早期，人们其实对计算机应该长得像计算器还是像神经网络是有些困惑的，在五六十年代这个方向并不明确。当然我们最后走了计算器那条路，建立了经典计算体系，然后神经网络目前是在现有计算机上**虚拟化**运行的。但你可以想象，我认为很多东西会翻转过来，神经网络会成为主进程，而CPU会变成协处理器。我们看到过那个图表，神经网络的智能计算将会接管并成为主导。

**Extra example:**
- Companies **virtualize** their servers to reduce hardware costs and improve efficiency.  
  公司将服务器**虚拟化**以降低硬件成本并提高效率。

---

## Useful Phrases

### kick off
**Type:** phrasal_verb

**EN:** to start or begin something  
**CN:** 开始，启动

**Original examples:**
- [00:44] Hello. Excited to be here and to **kick us off**.  
  你好。很高兴来到这里，为我们**开个头**。

**Extra example:**
- Let's **kick off** the meeting with a quick overview of the project.  
  让我们先快速概述一下项目来开始这次会议。

### go down a rabbit hole
**Type:** idiom

**EN:** to become deeply involved in something complex or time-consuming  
**CN:** 深入钻研某事，陷入复杂的探索中

**Literal:** 掉进兔子洞  
**Figurative EN:** to get deeply absorbed in exploring something, often losing track of time or the original purpose  
**Figurative CN:** 深陷其中，沉迷于探索某事而忘记时间或初衷

**Original examples:**
- [02:04] And so I would say that, yeah, it was just that realization that really had me **go down this whole rabbit hole** of just, you know, infinite side projects.  
  所以我想说，正是这种认识让我深陷其中，开始了无数的副项目。

**Extra example:**
- I started researching one topic and **went down a rabbit hole** for three hours.  
  我开始研究一个话题，结果深陷其中研究了三个小时。

### drive something home
**Type:** phrasal_verb

**EN:** to make something clearly understood or emphasized  
**CN:** 使某事被清楚理解，强调某事

**Original examples:**
- [03:42] So I guess yeah that's kind of the transition and I think there's a few examples of that really **drove it home** for me and maybe that might be instructive.  
  所以我想这就是这种转变，我认为有几个例子真正让我深刻理解了这一点，也许这会有启发性。

**Extra example:**
- The statistics really **drove home** the severity of the problem.  
  这些统计数据真正让人深刻理解了问题的严重性。

### balloon up
**Type:** phrasal_verb

**EN:** to increase rapidly in size or complexity  
**CN:** 迅速膨胀，急剧增加

**Original examples:**
- [04:01] These shell scripts usually **balloon up** and become extremely complex.  
  这些shell脚本通常会迅速膨胀并变得极其复杂。

**Extra example:**
- The project costs **ballooned up** from $10,000 to $50,000.  
  项目成本从1万美元迅速膨胀到5万美元。

### spell out
**Type:** phrasal_verb

**EN:** to explain something in detail or very clearly  
**CN:** 详细说明，明确阐述

**Original examples:**
- [04:29] The reason this is a lot more powerful is you're working now in the Software 3.0 paradigm where you don't have to precisely **spell out** all the individual details of that setup.  
  这之所以强大得多，是因为你现在在Software 3.0范式中工作，不必精确地详细说明设置的所有细节。

**Extra example:**
- Let me **spell out** the requirements so there's no confusion.  
  让我详细说明一下要求，这样就不会有混淆了。

### come to mind
**Type:** collocation

**EN:** to be remembered or thought of  
**CN:** 想起，浮现在脑海中

**Original examples:**
- [04:48] Now I think one more example that **comes to mind** that is even more extreme than that is when I was building MenuGen.  
  现在我想到另一个更极端的例子，就是我在构建 MenuGen 的时候。

**Extra example:**
- When I think of innovation, Apple **comes to mind** immediately.  
  当我想到创新时，苹果公司立即浮现在脑海中。

### at the mercy of
**Type:** idiom

**EN:** to be in a situation where someone or something has complete control over you  
**CN:** 受...支配，任由...摆布

**Literal:** 在...的仁慈之下  
**Figurative EN:** to be completely dependent on or controlled by someone or something, with no power to change the situation  
**Figurative CN:** 完全依赖或受控于某人或某事，无力改变局面

**Original examples:**
- [13:04] And so that's why I think I'm stressing this dimension of it, as we are slightly **at the mercy of** whatever the labs are doing, whatever they happen to put into the mix.  
  所以这就是为什么我强调这个维度，因为我们在某种程度上**受**实验室所做的事情**支配**，受他们碰巧加入其中的任何东西支配。

**Extra example:**
- Without a backup generator, we're **at the mercy of** the power company.  
  没有备用发电机，我们就只能任由电力公司摆布了。

### pull a lever
**Type:** idiom

**EN:** to take action or use a mechanism to achieve a result  
**CN:** 采取行动，使用手段（来达成结果）

**Literal:** 拉动杠杆  
**Figurative EN:** to use a tool, mechanism, or strategy to produce a desired outcome or effect  
**Figurative CN:** 使用工具、机制或策略来产生预期的结果或效果

**Original examples:**
- [14:38] You can **pull a lever** if you have a huge amount of diverse datasets of RL environments, etc.  
  如果你有大量多样化的强化学习环境数据集等，你就可以采取行动。
- [14:49] Uh, you can use your favorite fine-tuning framework and, um, and, uh, **pull the lever** and get something that actually, uh, works pretty well.  
  你可以使用你最喜欢的微调框架，然后采取行动，得到一些实际上运行得很好的东西。

**Extra example:**
- The government can **pull several levers** to stimulate the economy.  
  政府可以采取多种手段来刺激经济。

### to some extent
**Type:** collocation

**EN:** partly but not completely  
**CN:** 在某种程度上

**Original examples:**
- [15:14] I do think that ultimately almost everything can be made verifiable **to some extent**, some things easier than others.  
  我确实认为最终几乎所有事情都可以**在某种程度上**被验证，有些事情比其他事情更容易。

**Extra example:**
- I agree with you **to some extent**, but I think there are other factors to consider.  
  我**在某种程度上**同意你的观点，但我认为还有其他因素需要考虑。

### spoiler
**Type:** collocation

**EN:** advance revelation of plot or outcome (informal)  
**CN:** 剧透；提前透露结果

**Original examples:**
- [16:22] And **spoiler** is you can, but how do you, how do you do that properly?  
  **剧透一下**，你可以做到，但你要如何正确地做到呢？

**Extra example:**
- **Spoiler** alert: the new feature will be released next month.  
  **剧透警告**：新功能将在下个月发布。

### pull teeth
**Type:** idiom

**EN:** to do something with great difficulty  
**CN:** 非常困难；费尽周折

**Literal:** 拔牙  
**Figurative EN:** to accomplish something with extreme difficulty, requiring great effort  
**Figurative CN:** 做某事极其困难，需要付出巨大努力

**Original examples:**
- [23:15] It feels like you're obviously, you know, **pulling teeth**.  
  感觉就像你在**拔牙**一样费劲。

**Extra example:**
- Getting him to admit he was wrong is like **pulling teeth**.  
  让他承认错误就像**拔牙**一样困难。

### wrap one's head around
**Type:** idiom

**EN:** to understand something difficult or complex  
**CN:** 理解；弄明白（复杂的事情）

**Literal:** 把头缠绕在某物周围  
**Figurative EN:** to comprehend or understand something that is difficult or complex  
**Figurative CN:** 理解或领会困难或复杂的事物

**Original examples:**
- [24:08] Yeah, so I think the reason I wrote about this is because I'm trying to **wrap my head around** what these things are, right?  
  是的，我写这篇文章的原因是我试图**弄明白**这些东西到底是什么，对吧？

**Extra example:**
- I'm still trying to **wrap my head around** how quantum computing works.  
  我还在试图**弄明白**量子计算是如何工作的。

### bolt on
**Type:** phrasal_verb

**EN:** to add something as an extra feature, often hastily  
**CN:** 附加；额外添加（通常指匆忙或事后添加）

**Original examples:**
- [24:38] Like if you yell at them, they're not going to work better or worse or it doesn't have any impact. And it's all just kind of like these statistical simulation circuits where the substrate is pre-training, so like statistics, and then but then there's RL **bolted on** top.  
  比如如果你对它们大喊大叫，它们不会工作得更好或更差，也不会有任何影响。这一切都只是这些统计模拟电路，其中基础是预训练，也就是统计，然后在顶部**附加了**强化学习。

**Extra example:**
- They **bolted on** a new security feature after the data breach.  
  在数据泄露后，他们**附加了**一个新的安全功能。

### pet peeve
**Type:** collocation

**EN:** something that particularly annoys someone  
**CN:** 令人特别恼火的事；个人最讨厌的事

**Original examples:**
- [25:57] This is my favorite **pet peeve**.  
  这是我最**讨厌的事**。

**Extra example:**
- One of my biggest **pet peeves** is when people don't reply to emails.  
  我最**讨厌的事情**之一就是人们不回复邮件。

### string up
**Type:** phrasal_verb

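**EN:** (informal, as used here) to connect or wire things together; the standard phrase for this sense is *string together*, while *string up* usually means to hang something up  
**CN:** 连接；串联（此处为口语化用法，标准说法是 string together；string up 通常指把东西挂起来）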

**Original examples:**
- [26:48] A lot of the work, a lot of the trouble was not even writing the code for MenuGen, it was deploying it in Vercel, because I had to work with all these different services and I had to **string them up** and I had to go to their settings and the menus and you know configure my DNS and it was just so annoying.  
  很多工作，很多麻烦甚至不是为 MenuGen 编写代码，而是在 Vercel 上部署它，因为我必须使用所有这些不同的服务，我必须**把它们串联起来**，我必须进入它们的设置和菜单，配置我的 DNS，这真的很烦人。

**Extra example:**
- We need to **string up** all these microservices to make the application work.  
  我们需要**串联**所有这些微服务才能让应用程序运行。

### blow one's mind
**Type:** idiom

**EN:** to greatly surprise or impress someone  
**CN:** 令人震惊；让人大吃一惊

**Literal:** 炸掉某人的大脑  
**Figurative EN:** to greatly surprise, impress, or amaze someone  
**Figurative CN:** 让某人感到非常惊讶、印象深刻或震撼

**Original examples:**
- [28:05] Yeah, there was a tweet that **blew my mind** recently and I keep thinking about it like every other day.  
  是的，最近有一条推文**让我震惊**，我每隔一天就会想起它。

**Extra example:**
- The performance of the new AI model completely **blew my mind**.  
  新AI模型的性能完全**让我震惊**。

---

## Complex Sentences

### [00:02]
**Original:** He has helped build modern AI, then explain modern AI, and then occasionally rename modern AI.

**Translation:** 他帮助构建了现代人工智能,然后解释现代人工智能,然后偶尔重新命名现代人工智能。

**Core structure:**
- He has helped build, explain, and rename modern AI.  
  他帮助构建、解释和重新命名现代人工智能。

**Structure tree:**
```
main clause: He has helped [three parallel actions]
parallel structure 1: build modern AI
parallel structure 2: explain modern AI
parallel structure 3: occasionally rename modern AI
```

**Grammar points:**
- **并列结构的递进** - 三个动词短语通过 then 连接,表示时间顺序和递进关系
- **help + 动词原形** - help 后接不带 to 的不定式

### [03:03]
**Original:** And then what happened is that basically if you train one of these GPT models or LLMs on a sufficiently large set of tasks implicitly, because by training on the internet you have to multitask all the things that are in the dataset.

**Translation:** 然后发生的事情基本上是,如果你在足够大的任务集上隐式地训练这些 GPT 模型或大语言模型之一,因为通过在互联网上训练,你必须多任务处理数据集中的所有内容。

**Core structure:**
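- What happened is that if you train these models on a large enough set of tasks (implicitly, because internet training forces multitasking)... (the result clause is never spoken).  
  发生的事情是：如果你在足够大的任务集上训练这些模型（这种训练是隐式的，因为在互联网上训练就必须兼顾多种任务）……（结果从句始终没有说出来）。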

**Structure tree:**
```
main clause: What happened is that...
subject clause: What happened
predicative clause: (that) if you train models... (result clause never completed)
condition: if you train one of these models on a sufficiently large set of tasks implicitly
reason clause: because by training on the internet... (explains "implicitly")
```

**Grammar points:**
- **What 引导主语从句** - What happened 作主语,表示'所发生的事情'
- **条件从句缺少主句** - if 从句后始终没有出现结果主句，说话者转而用 because 从句解释 implicitly，句子在口语中没有说完
- **副词 implicitly 后置** - 修饰 train，却放在很长的宾语短语之后，容易被误读为修饰 tasks

### [04:05]
**Original:** But the thing is you're still stuck in a software 1.0 universe of wanting to write the code.

**Translation:** 但问题是你仍然困在软件 1.0 的世界里,想要编写代码。

**Core structure:**
- You're stuck in a universe of wanting to write code.  
  你困在一个想要编写代码的世界里。

**Structure tree:**
```
main clause: the thing is (that)...
predicative clause: you're stuck in a universe
modifier: of wanting to write the code (介词短语修饰 universe)
```

**Grammar points:**
- **the thing is (that)...** - 口语化表达,表示'问题/关键是...'
- **be stuck in** - 固定搭配,表示'困在...中,陷入...'

### [04:20]
**Original:** And the reason this is a lot more powerful is you're working now in the Software 3.0 paradigm where you don't have to precisely spell out all the individual details of that setup.

**Translation:** 这更强大的原因是,你现在在软件 3.0 范式中工作,在这个范式中你不必精确地详细说明该设置的所有单独细节。

**Core structure:**
- The reason this is powerful is you're working in a paradigm where you don't have to spell out details.  
  这强大的原因是你在一个不必详细说明细节的范式中工作。

**Structure tree:**
```
main clause: The reason... is (that)...
subject: The reason this is powerful
predicative clause: you're working in the paradigm
relative clause: where you don't have to spell out...
```

**Grammar points:**
- **The reason... is (that)...** - 表语从句结构,解释原因
- **where 引导定语从句** - 修饰 paradigm,表示抽象地点

### [05:51]
**Original:** And NanoBanana basically returned an image that is exactly the picture of the menu that I took, but it actually put into the pixels—it rendered the different things in the menu—and this blew my mind because actually all of my MenuGen is spurious.

**Translation:** NanoBanana 基本上返回了一张图像，正是我拍摄的那张菜单照片，但它实际上把菜单里的各种菜品直接渲染进了像素中，这让我震惊，因为实际上我的整个 MenuGen 都是多余的。

**Core structure:**
- NanoBanana returned an image, but it rendered things, and this blew my mind because my MenuGen is spurious.  
  NanoBanana 返回了图像，但它渲染了菜品，这让我震惊，因为我的 MenuGen 是多余的。

**Structure tree:**
```
compound sentence with 3 main clauses:
1. NanoBanana returned an image (with nested relative clauses)
2. but it put into pixels / rendered things (parenthetical insertion)
3. and this blew my mind (with reason clause: because...)
```

**Grammar points:**
- **嵌套定语从句** - image that is the picture that I took - 两层定语从句修饰
- **破折号插入语** - it rendered... 作为解释性插入,打断句子流畅性
- **spurious** - 高级词汇,表示'虚假的、多余的、不必要的'

### [06:53]
**Original:** But like for example with my LLM knowledge base project, basically you get LLMs to create wikis for your organization or for you in person, etc.

**Translation:** 但是，比如说我的大语言模型知识库项目，基本上就是让大语言模型为你的组织或者为你个人创建 wiki 等等。

**Core structure:**
- You get LLMs to create wikis for your organization.  
  你让大语言模型为你的组织创建维基。

**Structure tree:**
```
main clause: you get LLMs to create wikis
prepositional phrase: with my LLM knowledge base project
get + object + to-infinitive structure
for-phrase: for your organization or for you in person
```

**Grammar points:**
- **get + 宾语 + to do** - 使役结构,表示'让某人/某物做某事'
- **插入语** - 'like for example' 和 'basically' 作为话语标记打断句子流畅性

### [07:09]
**Original:** But now you can just take these documents and basically recompile them in a different way and reorder them and create something that is new and interesting as a reframing of the data.

**Translation:** 但现在你可以直接拿这些文档，基本上以不同的方式重新编译它们、重新排序它们，并创建一些新颖有趣的东西，作为对数据的一种重新诠释。

**Core structure:**
- You can take documents and recompile them and reorder them and create something new.  
  你可以拿文档,重新编译它们,重新排序它们,并创建新东西。

**Structure tree:**
```
main clause: you can take... and recompile... and reorder... and create...
parallel structure: 4 verbs connected by 'and'
relative clause: that is new and interesting
prepositional phrase: as a reframing of the data
```

**Grammar points:**
- **并列动词结构** - 四个动词并列(take, recompile, reorder, create),共享同一主语
- **定语从句 + 介词短语** - 'that is new and interesting' 修饰 something,'as a reframing' 说明方式

### [08:25]
**Original:** You feed raw videos or audio into basically what's a neural net and uses diffusion to render a UI that is kind of like, you know, unique for that moment in a certain sense.

**Translation:** 你把原始视频或音频输入到基本上是一个神经网络的东西中,它使用扩散来渲染一个在某种意义上对那个时刻来说是独特的用户界面。

**Core structure:**
- You feed videos into a neural net and it uses diffusion to render a UI.  
  你把视频输入神经网络,它使用扩散来渲染用户界面。

**Structure tree:**
```
main clause: You feed videos into what's a neural net
noun clause: what's a neural net (名词性从句，作 into 的宾语)
coordinate clause: and uses diffusion to render a UI (no matching subject for "uses" in this excerpt)
relative clause: that is unique for that moment
hedging phrases: basically, kind of like, you know, in a certain sense
```

**Grammar points:**
- **what 引导名词性从句** - 'what's a neural net' 作介词 into 的宾语
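- **话语标记堆叠** - 多个模糊限定词(basically, kind of, you know, in a certain sense)使句子难以理解
- **主谓不一致** - 截取后的句子中 uses 没有合适的主语；在完整原话里，它的主语是 a device（a device that takes raw videos... and uses diffusion...）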

### [09:06]
**Original:** But you could imagine, I think that a lot of this will flip and that the neural net becomes kind of like the host process and the CPUs become kind of like the co-processor.

**Translation:** 但你可以想象,我认为很多这样的情况会翻转,神经网络会变成类似主进程的东西,而CPU会变成类似协处理器的东西。

**Core structure:**
- I think that this will flip and the neural net becomes the host process and CPUs become the co-processor.  
  我认为这会翻转,神经网络变成主进程,CPU变成协处理器。

**Structure tree:**
```
main clause: you could imagine
inserted clause: I think
that-clause 1: that a lot of this will flip
that-clause 2: that the neural net becomes the host process
coordinate clause: and the CPUs become the co-processor
hedging: kind of like (repeated)
```

**Grammar points:**
- **插入语打断结构** - 'I think' 插在 imagine 和其宾语从句之间,增加理解难度
- **并列 that 从句** - 两个 that 从句并列作 think 的宾语

### [11:27]
**Original:** The models now patch this I think, but the new one is, I want to go to a car wash to wash my car and it's 50 meters away, should I drive or should I walk?

**Translation:** 我认为模型现在修补了这个问题,但新的例子是,我想去洗车店洗车,它离我50米远,我应该开车还是应该走路?

**Core structure:**
- The models patch this, but the new one is [a scenario question].  
  模型修补了这个,但新例子是[一个场景问题]。

**Structure tree:**
```
main clause 1: The models patch this
inserted clause: I think
coordinate clause: but the new one is...
embedded scenario: I want to go... and it's 50 meters away
embedded question: should I drive or should I walk?
```

**Grammar points:**
- **嵌套引述** - 在主句中嵌入完整的场景描述和问题,形成句中句
- **插入语位置** - 'I think' 放在第一个分句末尾（this 之后），是口语中的事后补充

### [12:56]
**Original:** And so that's why I think I'm stressing this dimension of it, as we are slightly at the mercy of whatever the labs are doing, whatever they happen to put into the mix.

**Translation:** 所以这就是为什么我认为我在强调这个维度,因为我们在某种程度上受制于实验室正在做的任何事情,受制于他们碰巧放入其中的任何东西。

**Core structure:**
- That's why I'm stressing this, as we are at the mercy of what the labs are doing.  
  这就是为什么我在强调这一点,因为我们受制于实验室正在做的事情。

**Structure tree:**
```
main clause: that's why I'm stressing this dimension
reason clause: as we are at the mercy of...
noun clause 1: whatever the labs are doing
noun clause 2: whatever they happen to put into the mix
```

**Grammar points:**
- **whatever 引导名词性从句** - whatever 表示'无论什么',引导宾语从句作 of 的宾语
- **be at the mercy of** - 固定搭配,表示'受...支配,任...摆布'

### [13:19]
**Original:** Circuits that are out of the data distribution, you're going to struggle and you have to kind of figure out which circuits you're in in your application.

**Translation:** 对于那些超出数据分布的回路,你会遇到困难,你必须弄清楚在你的应用中你处于哪些回路中。

**Core structure:**
- You're going to struggle and you have to figure out which circuits you're in.  
  你会遇到困难,你必须弄清楚你处于哪些回路中。

**Structure tree:**
```
condition: Circuits that are out of the data distribution
main clause 1: you're going to struggle
main clause 2: you have to figure out...
noun clause: which circuits you're in in your application
```

**Grammar points:**
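- **话题前置** - 名词短语 Circuits that are out of the data distribution 放在句首充当话题，与后面的主句没有语法连接，作用相当于条件状语（书面语可写作 For circuits that are...）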
- **间接疑问句** - which circuits you're in 作 figure out 的宾语,使用陈述语序

### [14:26]
**Original:** So if you are in a verifiable setting where you could create these RL environments or examples, then that actually sets you up to potentially do your own fine-tuning and you might benefit from that.

**Translation:** 所以如果你处于一个可验证的环境中,在那里你可以创建这些强化学习环境或示例,那么这实际上会让你有可能进行自己的微调,你可能会从中受益。

**Core structure:**
- If you are in a verifiable setting, then that sets you up to do fine-tuning.  
  如果你处于可验证的环境中,那么这会让你能够进行微调。

**Structure tree:**
```
condition clause: if you are in a verifiable setting
relative clause: where you could create these RL environments
main clause 1: that sets you up to do your own fine-tuning
main clause 2: you might benefit from that
```

**Grammar points:**
- **定语从句 where** - where 引导定语从句修饰 setting,表示抽象地点
- **set sb up to do sth** - 固定搭配,表示'使某人能够做某事,为某人做某事做好准备'

### [15:14]
**Original:** I do think that ultimately almost everything can be made verifiable to some extent, some things easier than others.

**Translation:** 我确实认为最终几乎所有事情都可以在某种程度上被验证,有些事情比其他事情更容易。

**Core structure:**
- Almost everything can be made verifiable, some things easier than others.  
  几乎所有事情都可以被验证,有些比其他的更容易。

**Structure tree:**
```
main clause: everything can be made verifiable to some extent
independent clause: some things easier than others (ellipsis)
```

**Grammar points:**
- **被动语态 be made + adj** - can be made verifiable 表示'可以被使得可验证'
- **省略结构** - some things (are) easier than others,省略了 are 和 to verify

### [16:33]
**Original:** How do you coordinate them to go faster without sacrificing your quality bar and doing that well and correctly is the realm of agentic engineering.

**Translation:** 你如何协调它们以更快地前进而不牺牲你的质量标准,并且把这件事做好做对,这就是代理工程的领域。

**Core structure:**
- How you coordinate them to go faster, and doing that well and correctly, is the realm of agentic engineering.  
  如何协调它们更快地推进，并把这件事做好做对，这就是代理工程的领域。

**Structure tree:**
```
embedded question (subject 1): How do you coordinate them...
purpose: to go faster
condition: without sacrificing your quality bar
gerund phrase (subject 2): doing that well and correctly
main clause predicate: is the realm of agentic engineering
```

**Grammar points:**
- **复合主语** - 疑问句 How do you coordinate them... 与动名词短语 doing that well and correctly 并列作主语；口语中保留了 do you 的疑问语序，书面语应写作 How you coordinate them...
- **without + 动名词** - without sacrificing 表示否定条件'在不...的情况下'

### [18:31]
**Original:** I do think that what I'm seeing is that most people have still not refactored their hiring process for agentic engineer capability, right?

**Translation:** 我确实认为我所看到的是，大多数人仍然没有为代理工程师能力重构他们的招聘流程，对吧？

**Core structure:**
- I think that most people have not refactored their hiring process.  
  我认为大多数人没有重构他们的招聘流程。

**Structure tree:**
```
main clause: I do think that...
object clause: what I'm seeing is that...
predicative clause (nested): most people have still not refactored...
```

**Grammar points:**
- **双层从句嵌套** - think that 后接宾语从句；该宾语从句以 what I'm seeing 作主语，其后的 that 从句作表语
- **现在完成时的否定** - have still not done 表示到现在仍未完成的动作

### [18:46]
**Original:** I would say that hiring has to look like: give me a really big project and see someone implement that big project.

**Translation:** 我会说招聘应该是这样的：给我一个非常大的项目，然后看某人实现那个大项目。

**Core structure:**
- Hiring has to look like: give me a project and see someone implement it.  
  招聘应该是：给我一个项目，看某人实现它。

**Structure tree:**
```
main clause: hiring has to look like...
imperative clause 1: give me a project
imperative clause 2: see someone implement that project
```

**Grammar points:**
- **look like 后接祈使句** - 用祈使句描述应该是什么样子
- **see + 宾语 + 动词原形** - 感官动词 see 后接宾语补足语结构

### [19:34]
**Original:** I think, well, right now the answer is that the agents are kind of like these intern entities, right?

**Translation:** 我认为，嗯，现在的答案是，这些代理有点像实习生实体，对吧？

**Core structure:**
- The answer is that the agents are like intern entities.  
  答案是代理像实习生实体。

**Structure tree:**
```
main clause: the answer is that...
predicative clause: the agents are kind of like...
insertions: I think, well, right?
time adverbial: right now
```

**Grammar points:**
- **多重插入语** - I think, well, right 等插入语打断句子流畅性
- **kind of like** - 口语化表达，表示某种程度的相似

### [20:40]
**Original:** And I actually don't even like the plan mode. I would—I mean obviously it's very useful, but I think there's something more general here where you have to work with your agent to design a spec that is very detailed and maybe it's basically the docs, and then get the agents to write them and you're in charge of the oversight and the top level categories, but the agents are—

**Translation:** 而且我实际上甚至不喜欢计划模式。我会——我的意思是显然它非常有用，但我认为这里有更普遍的东西，你必须与你的代理一起设计一个非常详细的规范，也许它基本上就是文档，然后让代理编写它们，你负责监督和顶层分类，但代理是——

**Core structure:**
- I think there's something where you have to work with your agent to design a spec and get the agents to write them.  
  我认为有些东西需要你与代理一起设计规范并让代理编写它们。

**Structure tree:**
```
main clause: I think there's something...
relative clause: where you have to work...
parallel infinitives: to design... and get...
contrast: you're in charge... but the agents are...
interruptions: I would—, obviously, maybe
```

**Grammar points:**
- **句子未完成** - but the agents are— 句子被打断，典型口语特征
- **多重并列结构** - design... and get... / you're in charge... but agents are...
- **where 引导定语从句** - 修饰 something，表示抽象情况

### [21:24]
**Original:** But you still have to know, for example, that there's an underlying tensor, there's an underlying view, and then you can manipulate view of the same storage or you can have different storage which would be less efficient.

**Translation:** 但你仍然需要知道，例如，有一个底层张量，有一个底层视图，然后你可以操作同一存储的视图，或者你可以有不同的存储，这会效率较低。

**Core structure:**
- You have to know that there's a tensor and view, and you can manipulate view or have different storage.  
  你需要知道有张量和视图，你可以操作视图或有不同存储。

**Structure tree:**
```
main clause: you have to know that...
parallel that-clauses: there's a tensor, there's a view
parallel options: you can manipulate... or you can have...
relative clause: which would be less efficient
```

**Grammar points:**
- **多重并列宾语从句** - know 后接多个 that 从句和选择关系
- **情态动词 would be** - 表示对这种选择后果的推测（如果使用不同的存储，效率会较低），并非严格意义上的虚拟语气

### [24:38]
**Original:** Like if you yell at them, they're not going to work better or worse or it doesn't have any impact. And it's all just kind of like these statistical simulation circuits where the substrate is pre-training, so like statistics, and then but then there's RL bolted on top.

**Translation:** 就像如果你对它们大喊大叫,它们不会工作得更好或更差,或者说这根本没有任何影响。而且这一切都只是像这些统计模拟电路,其中基底是预训练,也就是统计,然后在上面又加装了强化学习。

**Core structure:**
- It's all just statistical simulation circuits where the substrate is pre-training and RL is bolted on top.  
  这一切都只是统计模拟电路,其中基底是预训练,上面加装了强化学习。

**Structure tree:**
```
main clause: it's all just...circuits
relative clause: where the substrate is pre-training
coordinate clause: but then there's RL bolted on top
parenthetical: so like statistics
```

**Grammar points:**
- **where引导定语从句** - 修饰circuits,说明其内部结构
- **过去分词作后置定语** - there's RL bolted on top 中的 bolted on top 修饰 RL，相当于 RL that is bolted on top，带有被动含义；bolt on 表示'附加、加装'
- **口语化插入语** - so like statistics作为解释性插入,打断句子流畅度

### [25:31]
**Original:** I still use most of the time when I use different frameworks or libraries or things like that, they still have docs that are fundamentally written for humans.

**Translation:** 我大多数时候在使用不同的框架或库或类似的东西时,它们仍然有从根本上为人类编写的文档。

**Core structure:**
- When I use frameworks, they still have docs written for humans.  
  当我使用框架时,它们仍然有为人类编写的文档。

**Structure tree:**
```
main clause: they still have docs
time clause: when I use frameworks or libraries
relative clause: that are written for humans
adverbial: most of the time
```

**Grammar points:**
- **句首的未完成片段** - I still use... 是说话者开了头又放弃的片段，真正的主句是 they still have docs；when 从句位于主句之前，中间缺少衔接，增加理解难度
- **that引导定语从句** - 修饰docs,说明文档的特点

### [26:48]
**Original:** A lot of the work, a lot of the trouble was not even writing the code for MenuGen, it was deploying it in Vercel, because I had to work with all these different services and I had to string them up and I had to go to their settings and the menus and you know configure my DNS and it was just so annoying.

**Translation:** 很多工作，很多麻烦甚至不是为 MenuGen 编写代码，而是在 Vercel 上部署它，因为我必须使用所有这些不同的服务，我必须把它们串联起来，我必须进入它们的设置和菜单，你知道的，配置我的 DNS，这真的太烦人了。

**Core structure:**
- The trouble was not writing code, it was deploying it because I had to work with different services.  
  麻烦不是写代码,而是部署它,因为我必须使用不同的服务。

**Structure tree:**
```
main clause: the trouble was...deploying it
contrast structure: was not writing..., it was deploying
causal clause: because I had to work with services
parallel clauses: and I had to...and I had to...and I had to
```

**Grammar points:**
- **not A, it was B 对比结构** - 口语中用逗号连接两个分句，代替书面的 not A but B，强调真正的问题不是A而是B
- **并列结构重复** - 多个'I had to'并列,表达一系列必须完成的步骤
- **because引导原因状语从句**

### [28:05]
**Original:** It was something along the lines of, you can outsource your thinking but you can't outsource your understanding.

**Translation:** 大概是这样说的:你可以外包你的思考,但你不能外包你的理解。

**Core structure:**
- You can outsource thinking but you can't outsource understanding.  
  你可以外包思考,但不能外包理解。

**Structure tree:**
```
main clause: It was something along the lines of
quotation content: you can...but you can't
contrast structure: can vs. can't
```

**Grammar points:**
- **along the lines of** - 表示'大致是、类似于',引出不精确的引用
- **but连接对比** - 对比can和can't,强调thinking和understanding的区别

### [29:12]
**Original:** Ultimately these are tools to enhance understanding in a certain way, and this is still kind of like a bit of a bottleneck because then you can't direct the—you can't be a good director if you still—because the LLMs certainly don't excel at understanding, you still are uniquely in charge of that.

**Translation:** 最终这些是以某种方式增强理解的工具,而这仍然有点像一个瓶颈,因为你不能指导——如果你仍然——因为大语言模型当然不擅长理解,你仍然是唯一负责这个的人。

**Core structure:**
- These are tools to enhance understanding, but understanding is a bottleneck: you can't be a good director if you don't understand, and since LLMs don't excel at understanding, you are still uniquely in charge of it.  
  这些是增强理解的工具，但理解本身是个瓶颈：如果你自己不理解，就无法成为好的指导者；而既然大语言模型不擅长理解，这件事仍然只能由你负责。

**Structure tree:**
```
main clause 1: these are tools
main clause 2: this is a bottleneck
causal clause: because you can't be a director
conditional clause: if you still...
restarted causal clause: because LLMs don't excel at understanding
```

**Grammar points:**
- **表面嵌套，实为重启** - because 从句中出现 if 从句，随后又开启新的 because 从句；后两处都是说话者放弃前半句后重新表述，并非真正的语法嵌套
- **句子中断和重启** - you can't direct the—表示说话者中断思路重新组织语言,口语特征明显
- **be in charge of** - 表示'负责、掌管'