Podcast

Andrej Karpathy: From Vibe Coding to Agentic Engineering

Sequoia Capital / 30 min

00:02A

We're so excited for our very first special guest. He has helped build modern AI, then explain modern AI, and then occasionally rename modern AI. He actually helped co-found OpenAI right inside of this office. Was the one who actually got Autopilot working at Tesla back in the day, and he has a rare gift of making the most complex technical shifts feel both accessible and inevitable.

我们非常激动地迎来第一位特别嘉宾。他参与构建了现代 AI,然后解释现代 AI,偶尔还会给现代 AI 重新命名。他实际上就在这间办公室里参与联合创立了 OpenAI,也是当年让 Tesla 的 Autopilot 真正运转起来的那个人,而且他有一种罕见的天赋,能把最复杂的技术变革讲得既易懂又显得理所当然。

00:30A

You all know him for having coined the term vibe coding last year, but just in the last few months, he said something even more startling. That he's never felt more behind as a programmer. That's where we're starting today. Thank you, Andrej, for joining us.

大家都知道他在去年创造了「vibe coding」这个词,但就在最近几个月,他说了一句更令人震惊的话:他从未像现在这样觉得自己作为程序员落后了。这就是我们今天的起点。感谢 Andrej 加入我们。

00:44B

Yeah. Hello. Excited to be here and to kick us off.

你好。很高兴来到这里,为我们开个头。

00:47A

Okay. So, just a couple months ago, you said that you've never felt more behind as a programmer. That's startling to hear from you of all people. Can you help us unpack that? Was that feeling

好的。就在几个月前,你说你从未像现在这样觉得自己作为程序员落后了。从你这样的人口中听到这话真是令人震惊。能帮我们理解一下吗?那种感觉是——

00:57A

Exhilarating or unsettling?

令人兴奋还是令人不安?

01:00B

Yeah, a mixture of both for sure. Well, first of all, I guess like many of you, I've been using agentic tools like Claude Code and adjacent things for a while, maybe over the last year as they came out. They were very good at, you know, chunks of code, and sometimes they would mess up and you'd have to edit them, and it was kind of helpful. And then I would say December was this clear point where, for me, I was on a break so I had a bit more time.

两者都有。首先,我想和你们很多人一样,我在过去一年左右一直在使用 agentic 工具,比如 Claude Code 及类似的工具,它们在生成代码片段方面很不错,有时会出错需要你去编辑,算是有帮助。然后我觉得 12 月是一个明确的转折点,因为我当时在休假,有更多时间。

01:22B

I think many other people were similar and I just started to notice that with the latest models the chunks just came out fine and then I kept asking for more and it just came out fine and then I can't remember the last time I corrected it and then I just, you know, trusted the system more and more and then I was vibe coding. [laughter] And so it was kind of a—I do think that it was a very stark transition.

我想很多人也有类似经历,我开始注意到用最新的模型时,代码片段就这么完美地生成出来了,然后我不断要求更多,它还是完美生成,我已经记不清上次纠正它是什么时候了,然后我就越来越信任这个系统,然后我就在 vibe coding 了。(笑声)所以这确实是一个非常明显的转变。

01:43A

I think that a lot of people actually—I tried to stress this on Twitter, or X—because I think a lot of people experienced AI last year as a ChatGPT-adjacent thing.

我觉得很多人实际上——我在 Twitter 或者说 X 上试图强调这一点——因为我觉得很多人去年体验 AI 还是把它当作 ChatGPT 那样的东西。

01:52A

But you really had to look again, and you had to look as of December, because things have changed fundamentally, and especially on this agentic coherent workflow that really started to actually work.

但你真的需要重新审视,而且要从 12 月开始审视,因为事情已经发生了根本性的变化,尤其是在这种 agentic 的连贯工作流上,它真的开始能用了。

02:04A

And so I would say that, yeah, it was just that realization that really had me go down this whole rabbit hole of just, you know, infinite side projects.

所以我想说,正是这个认识让我掉进了这个兔子洞,开始做无数个副项目。

02:16A

My side projects folder is extremely full with lots of random things, and I was just vibe coding all the time.

我的副项目文件夹里塞满了各种随机的东西,一直在 vibe coding。

02:21A

So yeah, that kind of happened in December, I would say, and I was looking at the repercussions of that since.

所以这种情况发生在 12 月,我从那以后一直在观察它带来的影响。

02:28B

You've talked a lot about this idea of LLMs as a new computer, that it isn't just better software, it's a whole—

你经常谈到 LLM 作为一种新型计算机的想法,它不只是更好的软件,而是一个全新的——

02:35A

New computing paradigm. And software 1.0 was explicit rules, software 2.0 was learned weights, software 3.0 is this.

新的计算范式。Software 1.0 是显式规则,Software 2.0 是学习到的权重,Software 3.0 就是这个。

02:43A

If that's actually true, what does a team build differently the day they actually believe this?

如果这确实是真的,那么一个团队在真正相信这一点的那天,会以什么不同的方式来构建?

02:50B

Right? So yeah, exactly. So software 1.0, I'm writing code, software 2.0, I'm actually programming by creating datasets and training neural networks.

对。所以 Software 1.0,我在写代码;Software 2.0,我实际上是通过创建数据集和训练神经网络来编程。

02:59B

So the programming is kind of like arranging datasets and maybe some objectives and neural network architectures.

所以编程就像是在组织数据集,可能还有一些目标函数和神经网络架构。

03:03B

And then what happened is that basically if you train one of these GPT models or LLMs on a sufficiently large set of tasks implicitly, because by training on the internet you have to multitask all the things that are in the dataset.

然后发生的事情是,如果你在足够大的任务集上训练这些 GPT 模型或 LLM,这些任务是隐式的,因为在互联网上训练就必须多任务处理数据集中的所有内容。

03:15B

These actually become kind of like a programmable computer in a certain sense.

这些模型实际上在某种意义上变成了一种可编程的计算机。

03:20B

So software 3.0 is kind of about, you know, your programming now turns to prompting and what's in the

所以 Software 3.0 就是,你的编程现在变成了提示,而上下文窗口中的内容——

03:25A

Context window is your lever over the interpreter that is the LLM that is kind of like interpreting your context and performing computation in the digital information space.

就是你控制解释器的杠杆,这个解释器就是 LLM,它在解释你的上下文并在数字信息空间中执行计算。

03:34A

So I guess, yeah, that's kind of the transition, and I think there are a few examples that really drove it home for me and that might be instructive.

所以我想这就是这种转变,我觉得有几个例子真正让我明白了这一点,也许会有启发。

03:42A

So for example when OpenClaw came out, when you want to install OpenClaw you would expect that normally this is a bash script like a shell script.

比如当 OpenClaw 出来的时候,你想安装 OpenClaw,通常你会期待这是一个 bash 脚本,就是一个 shell 脚本。

03:52A

So run the shell script to install OpenClaw.

所以运行这个 shell 脚本来安装 OpenClaw。

03:54A

But the thing is that in order to target lots of different platforms and lots of different types of computers where you might run OpenClaw,

但问题是,为了支持很多不同的平台和很多不同类型的计算机,你可能会运行 OpenClaw——

04:01A

These shell scripts usually balloon up and become extremely complex.

这些 shell 脚本通常会膨胀得非常复杂。

04:05A

But the thing is you're still stuck in a software 1.0 universe of wanting to write the code.

但问题是你仍然困在 Software 1.0 的思维模式里,还想着自己去写代码。

04:07A

And actually the OpenClaw installation is a copy paste of a bunch of text that you're

而实际上 OpenClaw 的安装就是一段文本,你复制粘贴给你的 agent 就行。

04:13A

supposed to give to your agent. So basically it's a little script of, you know, copy-paste this and give it to your agent and it will install OpenClaw.

基本上就是一个小脚本,你复制粘贴这段文本给你的 agent,它就会安装 OpenClaw。

04:20A

And the reason this is a lot more powerful is you're working now in the Software 3.0 paradigm where you don't have to precisely spell out all the individual details of that setup.

这种方式强大得多的原因是,你现在是在 Software 3.0 范式下工作,不需要精确地写出设置的每个细节。

04:29A

The agent has its own intelligence that it packages up and then it follows the instructions and it looks at your environment, your computer, and it performs intelligent actions to make things work and it debugs things in the loop and it's just so much more powerful, right?

Agent 本身有智能,它会打包这些智能,然后按照指令执行,查看你的环境和电脑,智能地执行操作让事情运转起来,还能在循环中调试问题,这强大太多了。

04:42A

So I think that's a very different way of thinking about it—just what is the piece of text to copy-paste to your agent?

所以我觉得这是一种完全不同的思考方式——就是想清楚要复制粘贴给 agent 的那段文本是什么。

04:47A

That's the programming paradigm.

这就是新的编程范式。

04:48A

Now I think one more example that comes to mind that is even more extreme than that is when I was building MenuGen.

我想到另一个更极端的例子,是我在做 MenuGen 的时候。

04:56A

MenuGen is this idea where you come to a restaurant, they give you a menu. There's no pictures usually. So I don't know what any of these things are. Usually like 30% of the things I have no idea what they are, maybe 50%.

MenuGen 的想法是这样的:你去餐厅,他们给你菜单,通常没有图片。所以我不知道这些菜是什么样的。通常有 30% 的菜我完全不知道是什么,甚至 50%。

05:07A

So I wanted to take a photo of the restaurant menu and to get pictures of what those things might look like in a generic sense.

所以我想拍张餐厅菜单的照片,然后获取这些菜品大概长什么样的通用图片。

05:16A

And so I built, I vibe coded this app that basically lets you upload a photo, and it does all this stuff, and it runs on Vercel. It basically re-renders the menu: it OCRs all the different titles, uses an image generator to get pictures of them, and then shows it all to you.

于是我用 vibe coding 做了这个应用,基本上就是让你上传照片,它会做所有这些处理,运行在 Vercel 上,重新渲染菜单:用 OCR 识别所有菜名,用图像生成器生成对应的图片,然后展示给你。

05:37A

And then I saw the Software 3.0 version of this which blew my mind, which is literally just take your photo, give it to Gemini and say use NanoBanana to overlay the things onto the menu. And NanoBanana

然后我看到了 Software 3.0 版本的做法,让我震惊,就是直接拍照,给 Gemini,让它用 NanoBanana 把图片叠加到菜单上。

05:51A

Basically returned an image that is exactly the picture of the menu that I took, but it actually put it into the pixels, rendering the different things in the menu. This blew my mind because actually all of MenuGen is spurious.

NanoBanana 基本上返回的图像就是我拍的那张菜单照片,但它实际上在像素层面渲染了菜单里不同菜品的样子。这让我震惊,因为我的整个 MenuGen 其实是多余的。

06:04A

It's working in the old paradigm; that app shouldn't exist. And yeah, the Software 3.0 paradigm is a lot more kind of raw.

它还在用旧范式工作,那个应用根本不应该存在,而 Software 3.0 范式要更原始直接得多。

06:11A

It just—your neural network is doing more and more of the work, and your prompt or context is just the image, and the output is an image, and there's no need to have any of the app in between.

就是你的神经网络在做越来越多的工作,你的 prompt 或上下文就是那张图片,输出也是图片,中间不需要任何应用。

06:21A

So I think that people have to kind of reframe, you know, not work in the existing paradigm of what already existed and just think about it as a speed-up of what exists.

所以我觉得人们需要重新思考——不要用现有事物的既有范式去思考——不要只把它当作现有事物的加速。

06:31A

It's actually like new things are available now.

实际上是有新的可能性出现了。

06:33A

And going back to your programming question, it's not even—I think that's also an example of working in the old mindset—because it's not just about programming and programming becoming...

回到你关于编程的问题,我觉得这也是用旧思维在思考的例子——因为这不仅仅是关于编程,不仅仅是编程变得……

06:42A

Faster, this is more general information processing that is automatable now, so it's not just even about code.

更快,这是更广泛的信息处理现在可以自动化了,所以甚至不只是关于代码。

06:49A

So previous code worked over kind of like structured data, right, and you write code over structured data.

以前的代码是处理结构化数据的,对吧,你针对结构化数据写代码。

06:53A

But like for example with my LLM knowledge base project, basically you get LLMs to create wikis for your organization or for you in person, etc.

但比如我的 LLM knowledge base 项目,基本上就是让 LLM 为你的组织或个人创建 wiki 等等。

07:01A

This is not even a program, this is not something that could exist before because there was no code that would create a knowledge base based on a bunch of facts.

这甚至不是一个程序,这是以前不可能存在的东西,因为没有代码能基于一堆事实创建知识库。

07:09A

But now you can just take these documents and basically recompile them in a different way and reorder them and create something that is new and interesting as a reframing of the data.

但现在你可以直接拿这些文档,用不同的方式重新编译,重新排序,创造出新的、有趣的数据重构。

07:19A

And so these are new things that weren't possible, and so I think this is something that I keep trying to get back to, as to not only what can we do that existed that is faster now, but I think there's new opportunities of just...

所以这些是以前不可能的新事物,我一直想强调的就是这点,不仅仅是我们能把已有的事情做得更快,而是有新的机会出现了……

07:33A

Things that couldn't be possible before, and I almost think that that's more exciting.

那些以前不可能实现的事情，我几乎觉得这才是更令人兴奋的部分。

07:37B

I love the menu generation progression and dichotomy that you laid out, and I think even I'm sure many folks here followed your own progression of programming from last October to early January, February this year.

我很喜欢你刚才描述的菜单生成的演进过程和对比，而且我相信在座的很多人都关注了你自己从去年10月到今年1、2月份在编程方面的进步历程。

07:48B

If you extrapolate that further, what is the 2026 equivalent for building websites in the '90s, building mobile apps in the 2010s, building SaaS in the last cloud era? What will look completely obvious in hindsight that is still mostly unbuilt today?

如果把这个趋势继续推演下去，什么会成为2026年的等价物——就像90年代建网站、2010年代做移动应用、上一个云时代构建SaaS那样？什么东西事后看来会显得理所当然，但现在还基本没被开发出来？

08:08A

Well, going with the example of menu, I guess, so a lot of this code shouldn't exist and it's just neural networks doing most of the work.

嗯，还是拿菜单这个例子来说，我觉得很多代码其实不应该存在，大部分工作应该由神经网络来完成。

08:15A

I do think that the extrapolation looks very weird because you could basically imagine—I don't—yeah, so you could imagine completely neural computers in a certain sense.

我确实认为这种推演看起来会很奇怪，因为你基本上可以想象——我不知道——对，你可以想象某种意义上完全由神经网络驱动的计算机。

08:25A

You feed in raw video. Like, imagine a device that takes raw video or audio into what's basically a neural net and uses diffusion to render a UI that is kind of, you know, unique for that moment in a certain sense. And I kind of feel like in the early days of computing people were actually a little bit confused as to whether computers would look like calculators or computers would look like neural nets, and in the 50s and 60s it was not really obvious which way we'd go. And of course we went down the calculator path and ended up building classical computing, and neural nets are currently running virtualized on existing computers. But you could imagine, I think, that a lot of this will flip and that the neural net becomes kind of like the host process and the CPUs become kind of like the co-processor. So we saw the diagram of, you know, how the intelligence compute of neural networks is going to take over and become the dominant

也就是说，你输入原始视频，想象一个设备接收原始视频或音频，输入到一个神经网络中，然后用扩散模型渲染出一个UI界面，这个界面在某种意义上是为那个特定时刻定制的。我有点觉得，在计算机发展早期，人们其实对计算机应该长得像计算器还是像神经网络是有些困惑的，在五六十年代这个方向并不明确。当然我们最后走了计算器那条路，建立了经典计算体系，然后神经网络目前是在现有计算机上虚拟化运行的。但你可以想象，我认为很多东西会翻转过来，神经网络会成为主进程，而CPU会变成协处理器。我们看到过那个图表，神经网络的智能计算将会接管并成为主导。

09:12A

Spend of flops, so you could imagine something really weird and foreign when neural nets are doing most of the heavy lifting.

浮点运算的开销，所以你可以想象当神经网络承担大部分繁重工作时，会出现一些非常奇怪和陌生的东西。

09:18A

They're using tool use as this, you know, historical appendage for some kinds of deterministic tasks.

它们会把工具使用当作某种历史遗留的附属功能，用来处理某些确定性任务。

09:24A

But what's really running the show is these neural nets that are in a certain way.

但真正主导一切的是这些神经网络，从某种角度来说。

09:29A

So you can imagine something extremely foreign as the extrapolation, but I think we're going to probably get there sort of piece by piece.

所以你可以想象推演的结果会是极其陌生的东西，但我觉得我们可能会一步步地到达那里。

09:36A

And I don't—yeah, that progression is TBD, I would say.

而且我不——对，这个演进过程还有待观察，我只能这么说。

09:40B

[snorts]

（轻笑）

09:41B

I'd like to talk a little bit about this concept of verifiability, the fact that AI will automate faster and more easily domains where the output can be verified.

我想聊聊可验证性这个概念，也就是AI会更快更容易地自动化那些输出可以被验证的领域。

09:49B

If that framework is right, what work is about to move much faster than people realize, and what professions do we have that people actually think are safe but that are—

如果这个框架是对的，什么工作会比人们预期的发展得快得多，还有哪些职业人们以为是安全的，但实际上——

10:00A

Actually highly verifiable?

其实是高度可验证的？

10:02B

Yes. So I spent some time writing about verifiability and basically traditional computers can easily automate what you can specify in code, and this latest round of LLMs can easily automate what you can verify in a certain sense, because the way this works is that when frontier labs are training these LLMs, these are giant reinforcement learning environments.

是的。我花了些时间研究可验证性，基本上传统计算机可以轻松自动化你能用代码指定的东西，而最新一轮的LLM可以轻松自动化你能验证的东西，从某种意义上说，因为这些前沿实验室训练LLM的方式是，这些是巨大的强化学习环境。

10:24B

So they are given verification rewards and then because of the way that these models are trained, they end up progressing and creating these jagged entities that really peak in capability in verifiable domains like math and code and adjacent areas, and kind of stagnate and are a little bit rough around the edges when things are not in that space.

它们会获得验证奖励，然后由于这些模型的训练方式，它们最终会进化并形成这些参差不齐的实体，在数学和代码等可验证领域的能力达到峰值，在相邻领域也表现不错，但在不属于这个范围的事情上就有点停滞不前，表现得有些粗糙。
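To make the idea of a "verification reward" concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how any lab's training stack works: the function names, the answer-extraction rule (take the last integer in the text), and the sample completions are all assumptions made up for this example. The point is just that when a task has a checkable answer, the reward can be computed by a program rather than judged by a person.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last integer that appears in the text, or None if there is none."""
    matches = re.findall(r"-?\d+", completion)
    return matches[-1] if matches else None


def verifiable_reward(completion: str, expected: int) -> float:
    """Hypothetical reward: 1.0 if the extracted answer equals the known answer, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if int(answer) == expected else 0.0


# Made-up completions for the prompt "What is 17 * 3?"
samples = [
    "17 * 3 = 51, so the answer is 51",
    "I think the answer is 41",
    "It is hard to say.",
]
rewards = [verifiable_reward(s, expected=51) for s in samples]
print(rewards)
```

Math and code fit this shape naturally (compare against a known answer, or run a test suite), which is one way to read the claim that capability peaks in those domains.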

10:44B

So I think the reason I wrote about verifiability is I'm trying to understand why these things are so

所以我写关于可验证性的原因是，我想理解为什么这些东西如此

10:49A

Jagged, and some of it has to do with how the labs train the models, but I think some of it also has to do with the focus of the labs and what they happen to put into the data distribution.

参差不齐，部分原因与实验室如何训练模型有关，但我认为部分原因也与实验室的关注点以及它们恰好放入数据分布中的内容有关。

10:58A

Because some things basically are significantly more valuable in the economy and end up creating more environments because the labs wanted to work in those settings.

因为有些东西在经济中明显更有价值，最终会创造更多环境，因为实验室希望在这些场景中发挥作用。

11:05A

So I think code is a good example of that.

所以我认为代码就是一个很好的例子。

11:08A

There's probably lots of verifiable environments they could think about that happen not to make it into the mix because they're just not that useful to have the capability around.

可能有很多可验证的环境他们可以考虑，但恰好没有被纳入训练组合中，因为拥有这些能力并不是那么有用。

11:13A

But I think to me the big, I guess like the big mystery is, the favorite example for a while was how many R's are in "strawberry", and the models would famously get this wrong, and it's an example of jaggedness.

但我觉得对我来说最大的，我想说最大的谜团是，有一阵子最喜欢举的例子是 strawberry 这个词里有多少个字母 r，模型会出名地答错这个问题，这是参差不齐的一个例子。

11:27A

The models now patch this I think, but the new one is, I want to go to a car wash to wash my car and it's 50 meters away, should I

现在的模型我想已经修补了这个问题，但新的例子是，我想去洗车，洗车店离我50米远，我应该

11:34A

Drive or should I walk? And state-of-the-art models today will tell you to walk because it's so close.

开车去还是走路去？而今天最先进的模型会告诉你走路去，因为太近了。

11:40A

How is it possible that state-of-the-art Opus 4.7 will simultaneously refactor a 100,000 line codebase or find zero-day vulnerabilities and yet tells me to walk to this car wash?

怎么可能最先进的 Opus 4.7 一边能重构 10 万行代码库或者发现零日漏洞,一边却让我走路去这个洗车店?

11:52A

This is insane. And to whatever extent these models remain jagged, it's an indication that number one, maybe something's slightly off, or number two, you need to actually be in the loop a little bit and you need to treat them as tools and you do have to kind of stay in touch with what they're doing.

这太离谱了。这些模型在多大程度上仍然表现得参差不齐,就说明:第一,可能有些地方不太对劲;第二,你确实需要参与进来一点,需要把它们当作工具来对待,而且你必须对它们正在做的事情保持关注。

12:11A

And so I think all of my writing, long story short, about verifiability is just trying to understand why these things are jagged. Is there any pattern to it?

所以长话短说,我关于可验证性的所有写作,其实就是在试图理解为什么这些模型会表现得参差不齐。这里面有什么规律吗?

12:20A

And I think it's some kind of a combination of verifiable plus labs care. Maybe one more anecdote that is instructive is from GPT-3.5 to GPT-4, people noticed that

我认为这是可验证性加上实验室重视程度的某种组合。还有一个很有启发性的例子,就是从 GPT-3.5 到 GPT-4,人们注意到

12:31A

Chess improved a lot and I think a lot of people thought, oh well, it's just a progression of the capabilities, but actually it's more that—I think this is public information, I think I saw it on the internet—a huge amount of chess data made it into the pre-training set, and just because it's in a data distribution, basically the model improved a lot more than it would just by default.

国际象棋能力提升了很多,我想很多人以为,哦这只是能力的自然进步,但实际上更多是因为——我觉得这是公开信息,我在网上看到的——大量国际象棋数据被加入了预训练集,仅仅因为它在数据分布里,模型的提升就比默认情况下大得多。

12:50A

So someone at OpenAI decided to add this data and now you have a capability that just peaked a lot more.

所以是 OpenAI 的某个人决定添加这些数据,然后你就有了一个突然大幅提升的能力。

12:56A

And so that's why I think I'm stressing this dimension of it, as we are slightly at the mercy of whatever the labs are doing, whatever they happen to put into the mix.

所以这就是为什么我强调这个维度,因为我们在某种程度上受制于实验室正在做的事情,受制于他们碰巧放进去的东西。

13:04A

And you have to actually explore this thing that they give you that has no manual.

而你必须真正去探索他们给你的这个没有说明书的东西。

13:08A

And it works in certain settings, but maybe not in some settings.

它在某些场景下有效,但在另一些场景下可能不行。

13:11A

And you have to kind of explore it a little bit.

你必须稍微探索一下。

13:13A

And if you're in the circuits that were part of the RL, you fly. And if you're in the—

如果你在那些属于强化学习一部分的回路里,你就能飞起来。但如果你在——

13:19A

Circuits that are out of the data distribution, you're going to struggle and you have to kind of figure out which circuits you're in in your application. And if you're not in the circuits, then you have to really look at fine-tuning and doing some of your own work because it's not going to necessarily come out of the LLM out of the box.

那些不在数据分布内的回路里,你就会很吃力,你必须搞清楚你的应用在哪些回路里。如果你不在这些回路里,那你就得认真考虑微调和做一些自己的工作,因为它不一定能直接从大语言模型开箱即用地得到。

13:36B

I'd love to come back to the concept of jagged intelligence in a little bit. If you are a founder today and thinking about building a company, you are trying to solve a problem that you think is tractable, something that is a domain that is verifiable, but you look around and you think, "Oh my gosh, well, the labs have really started getting to escape velocity in the ones that seem most obvious, math, coding, and others." What would your advice be to the founders in the audience?

我很想稍后再回到参差不齐的智能这个概念。如果你今天是一位创始人,正在考虑创建一家公司,你试图解决一个你认为可行的问题,一个可验证的领域,但你环顾四周会想:「天哪,实验室在那些看起来最明显的领域——数学、编程等等——真的开始达到逃逸速度了。」你会给在座的创始人什么建议?

14:08A

So I think maybe that comes to the

我想这可能回到了

14:10A

Previous question of, I do think that verifiability, because it, um, let me think.

之前的问题,我确实认为可验证性,因为它,嗯,让我想想。

14:14A

So verifiability makes something tractable in the current paradigm because you can throw a huge amount of RL at it.

可验证性使得某件事在当前范式下变得可行,因为你可以对它投入大量强化学习。

14:20A

Um, so maybe one way to see it is that, uh, that remains true even if the labs are not focusing on it directly.

所以也许可以这样看,即使实验室没有直接关注它,这一点仍然成立。

14:26A

So if you are in a verifiable setting where you could create these RL environments or examples, then that actually sets you up to potentially do your own fine-tuning and you might benefit from that.

所以如果你处在一个可验证的环境中,可以创建这些强化学习环境或示例,那实际上就为你自己做微调做好了准备,你可能会从中受益。

14:36A

But that is fundamentally technology that just works.

但这从根本上说是一种确实有效的技术。

14:38A

You can pull a lever if you have a huge amount of diverse datasets of RL environments, etc.

如果你有大量多样化的强化学习环境数据集等等,你就可以拉动这个杠杆。

14:41A

Uh, you can use your favorite fine-tuning framework and, um, and, uh, pull the lever and get something that actually, uh, works pretty well.

你可以使用你喜欢的微调框架,然后拉动杠杆,得到一个实际上效果相当好的东西。

14:49A

So, um, I don't know what the examples of this might be.

所以,我不知道具体例子可能是什么。

14:51A

Um, but I do think there are some very valuable, uh, reinforcement learning environments that people could think of that I think are...

但我确实认为有一些非常有价值的强化学习环境,人们可以考虑,我认为它们...

14:59A

Not part of the... Yeah, I don't want to give away the answer, but there is one domain that I think is very... Oh, okay. Sorry, I don't mean to vague post on the stage, but there are some examples of this.

不属于...是的,我不想在台上说得太含糊,但确实有一些这样的例子。

15:09B

On the flip side, what do you think still feels automatable only from a distance?

反过来说,你认为什么东西现在看起来可以自动化,但实际上只是远看如此?

15:14A

I do think that ultimately almost everything can be made verifiable to some extent, some things easier than others. Because even for things like writing or so on, you can imagine having a council of LLM judges and probably get something reasonable out of this kind of an approach.

我确实认为,最终几乎所有事情都可以在某种程度上被验证,只是有些事情比其他事情更容易。因为即使是写作这类任务,你也可以想象让一组 LLM 评委来评判,通过这种方法可能就能得到相当合理的结果。
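The "council of LLM judges" idea can be sketched as a simple aggregation over several independent scorers. The sketch below is hypothetical: the judges are plain stand-in functions with made-up scoring rules, where a real setup would presumably call different models or prompts, and taking the median is just one assumed way to combine their scores.

```python
from statistics import median
from typing import Callable, List

# A judge maps a piece of text to a score in [0, 1].
Judge = Callable[[str], float]


def council_score(text: str, judges: List[Judge]) -> float:
    """Clamp each judge's score to [0, 1] and return the median across the council."""
    scores = [min(1.0, max(0.0, judge(text))) for judge in judges]
    return median(scores)


# Stand-in judges (assumptions for illustration, not real model calls).
def brevity_judge(text: str) -> float:
    return 1.0 if len(text.split()) <= 20 else 0.3


def hedging_judge(text: str) -> float:
    return 0.2 if "maybe" in text.lower() else 0.9


def punctuation_judge(text: str) -> float:
    return 1.0 if text.strip().endswith(".") else 0.5


judges = [brevity_judge, hedging_judge, punctuation_judge]
draft = "Agents handle API details; people own the spec."
print(council_score(draft, judges))
```

Using the median rather than the mean means a single outlier judge cannot swing the result, which seems like a reasonable default when the individual judges are themselves fallible.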

15:33A

So it's more about what's easy or hard. So I do think that ultimately... yeah, I think...

所以更多是关于什么容易、什么困难的问题。我确实认为最终……是的,我觉得……

15:42B

Everything? [laughter]

所有事情?(笑)

15:43A

Everything is automatable.

所有事情都可以自动化。

15:45B

Amazing. Okay. So last year you coined the term vibe coding and today...

太厉害了。好的。去年你创造了「vibe coding」这个词,而今天……

15:49A

We're in a world that feels a little bit more serious, more agentic engineering. What do you think is the difference between the two and what would you actually call what we're in today?

我们所处的世界感觉更严肃了一些,更偏向 agentic engineering(智能体工程)。你认为这两者之间有什么区别?你会如何称呼我们今天所处的阶段?

15:57B

Uh, yeah. So I would say vibe coding is about raising the floor for everyone in terms of what they can do in software.

嗯,是的。我会说 vibe coding 是关于提升每个人在软件开发方面的能力下限。

16:03B

So the floor rises, everyone can vibe code anything and that's amazing, incredible.

也就是说,下限提高了,每个人都可以用 vibe coding 做任何事情,这很棒,非常了不起。

16:06B

But then I would say agentic engineering is about preserving the quality bar of what existed before in professional software.

但我会说 agentic engineering 是关于保持专业软件开发中原有的质量标准。

16:11B

So you're not allowed to introduce vulnerabilities due to vibe coding. You are, you're still responsible for your software just as before, but can you go faster?

你不能因为 vibe coding 就引入安全漏洞。你仍然要像以前一样对你的软件负责,但问题是你能不能更快?

16:22B

And spoiler is you can, but how do you, how do you do that properly?

剧透一下,答案是可以,但你要如何正确地做到这一点?

16:24B

And so to me agentic engineering, when I call it that, because I do think it's kind of like an engineering discipline.

所以对我来说,我之所以称之为 agentic engineering,是因为我确实认为它是一种工程学科。

16:29B

You have these agents which are these like spiky entities. They're a bit fallible, a little

你有这些智能体,它们是这种有点棱角分明的实体。它们有点容易出错,有点……

16:33A

A bit stochastic, but they are extremely powerful. How do you coordinate them to go faster without sacrificing your quality bar and doing that well and correctly is the realm of agentic engineering. So I kind of see them as different, like one is about maybe raising the floor and the other is about extrapolating. And what I'm seeing, I think, is there is a very high ceiling on agentic engineer capability. And you know, people used to talk about the 10x engineer previously. I think that this is magnified a lot more. 10x is not the speed up you gain. And I think it does seem to me like people who are very good at this peak a lot more than 10x from my perspective right now.

有点随机性,但它们极其强大。如何协调它们来加快速度而不牺牲质量标准,并且做得好、做得正确,这就是 agentic engineering 的领域。所以我认为它们是不同的,一个是关于提升下限,另一个是关于向上延伸。而我看到的是,agentic engineer 的能力上限非常高。你知道,人们以前常说 10 倍工程师。我认为这个倍数被大大放大了。你获得的加速不止 10 倍。从我现在的角度来看,那些非常擅长这个的人的峰值能力远超 10 倍。

17:18B

I really like that framing. When Sam Altman came to AI Ascent last year, one memorable thing he said was that people of different generations use ChatGPT differently. So if you're in your

我很喜欢这个框架。去年 Sam Altman 来 AI Ascent 时说的一件令人印象深刻的事是,不同年代的人使用 ChatGPT 的方式不同。所以如果你……

17:29A

30s, you use it as a Google search replacement. But if you're in your teens, TikTok is your gateway to the internet.

三十多岁,你把它当作 Google 搜索的替代品。但如果你十几岁,TikTok 才是你通往互联网的入口。

17:35A

What is the parallel here in coding today? If we were to watch two people code using OpenAI, Claude, Codex, one you'd consider mediocre at it and one you would consider fully AI native, how would you describe the difference?

那么在今天的编程中,类似的情况是什么?如果我们观察两个人使用 OpenAI、Claude、Codex 编程,一个你认为水平一般,一个你认为完全是 AI 原生的,你会如何描述这种差异?

17:51B

[clears throat]

(清嗓子)

17:51B

I mean, I think it's just trying to get the most out of the tools that are available, utilizing all of their features, investing into your own kind of setup.

我的意思是,我认为就是尽可能充分利用可用的工具,使用它们的所有功能,投入到你自己的设置中。

17:59B

So just like previously, all the engineers are used to basically getting the most out of the tools you use, whether it's Vim or VS Code, or now it's, you know, Claude Code or Codex or so on.

就像以前一样,所有工程师习惯于充分利用你使用的工具,无论是 Vim 还是 VS Code,或者现在是 Claude Code 或 Codex 等等。

18:09B

So just investing into your setup and utilizing a lot of the tools that are available to you. And I think it just kind of looks like that.

所以就是投入到你的设置中,并充分利用你可用的各种工具。我认为大概就是这样。

18:18B

I do think that maybe

我确实认为也许……

18:23A

A related thought is a lot of people are maybe hiring for this right, because they want to hire strong agentic engineers.

一个相关的想法是,很多人可能正在为此招聘,因为他们想招聘强大的 agentic engineer。

18:31A

I do think that what I'm seeing is that most people have still not refactored their hiring process for agentic engineer capability, right? Like if you're giving out puzzles to solve, this is still the old paradigm.

我确实认为,我看到的是,大多数人仍然没有针对 agentic engineer 能力重构他们的招聘流程,对吧?如果你还在出谜题让人解决,这仍然是旧范式。

18:46A

I would say that hiring has to look like: give me a really big project and see someone implement that big project. Like let's write, say, a Twitter clone for agents, and then make it really good, make it really secure, and then have some agents simulate some activity on this Twitter.

我会说招聘应该是这样的:给我一个真正大的项目,看某人实现那个大项目。比如说,写一个面向智能体的 Twitter 克隆,然后把它做得非常好,非常安全,然后让一些智能体在这个 Twitter 上模拟一些活动。

19:03A

And then I'm going to use 10 Claude 3.5 Sonnet or X AI to try to break this website that you deployed, and they're going to try to basically break it, and they should not be able to break it.

然后我会用 10 个 Claude 3.5 Sonnet 或者 X AI 来尝试攻破你部署的这个网站,让它们尝试破解,但它们应该破解不了才对。

19:16A

And so maybe it looks like that, right? And so yeah, watching people in that setting...

可能就是这样的场景,对吧?所以在那种环境下观察人们的表现……

19:21A

Building bigger projects and utilizing the tooling is maybe what I would look at for the most part.

构建更大的项目并利用这些工具,这可能是我主要会关注的方面。

19:29B

And as agents do more, what human skill do you think becomes more valuable, not less?

随着 agent 能做的事情越来越多,你认为哪些人类技能会变得更有价值,而不是更不重要?

19:34A

So yeah, it's a good question. I think, well, right now the answer is that the agents are kind of like these intern entities, right? So it's remarkable. You basically still have to be in charge of the aesthetics, the judgment, the taste, and a little bit of oversight. Maybe one of my favorite examples of like the weirdness of agents is, for menu gen, you sign up with a Google account, but you purchase credits using a Stripe account, and both of them have email addresses. And my agent actually tried to basically, like when you purchase credits, it assigned it using the email address from Stripe to the Google email address, like...

这是个好问题。我觉得,目前的答案是这些 agent 有点像实习生一样的存在,对吧?很神奇的是,你基本上还是要负责审美、判断、品味,以及一些监督工作。我最喜欢的一个例子,能体现 agent 的怪异之处,就是在 menu gen 中,你用 Google 账号注册,但用 Stripe 账号购买积分,两者都有邮箱地址。我的 agent 实际上尝试在你购买积分时,用 Stripe 的邮箱地址去匹配 Google 的邮箱地址,就像……

20:15A

There wasn't a persistent user ID for people. It was trying to match up the email addresses, but you could use different email addresses for your Stripe and your Google and basically would not associate the funds.

用户没有持久化的用户 ID。它试图通过邮箱地址来匹配,但你的 Stripe 和 Google 可以用不同的邮箱,这样就无法关联资金了。

20:26A

And so this is the kind of thing that these agents still will make mistakes about, is like why would you use email addresses to try to cross-correlate the funds? They can be arbitrary. You can use different emails, etc. Like this is such a weird thing to do.

所以这就是这些 agent 仍然会犯的错误,比如为什么要用邮箱地址来交叉关联资金?邮箱是可以任意设置的,你可以用不同的邮箱等等。这种做法真的很奇怪。
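The fix implied by this story is to key purchases on a persistent user ID instead of an email address. Here is a minimal sketch under stated assumptions: `CreditLedger`, the `user_123` ID, and the shape of `payment_event` are all hypothetical and are not the payment provider's actual payload or the real app's code. The idea is that the app attaches its own user ID to the checkout and reads that ID back when the payment completes, so the payer's email never has to match the login email.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CreditLedger:
    """Credit balances keyed by the app's own persistent user ID."""
    balances: Dict[str, int] = field(default_factory=dict)

    def register(self, user_id: str) -> None:
        # Called once at sign-up; the ID never changes, unlike an email address.
        self.balances.setdefault(user_id, 0)

    def apply_purchase(self, user_id: str, credits: int) -> None:
        if user_id not in self.balances:
            raise KeyError(f"unknown user_id: {user_id}")
        self.balances[user_id] += credits


ledger = CreditLedger()
ledger.register("user_123")  # user signed in with, say, one email address

# Hypothetical payment-completed event: the payer used a different email,
# but the user ID attached at checkout comes back with the event.
payment_event = {
    "payer_email": "billing@example.com",
    "metadata": {"user_id": "user_123"},
    "credits": 50,
}
ledger.apply_purchase(payment_event["metadata"]["user_id"], payment_event["credits"])
print(ledger.balances)
```

Matching on `payer_email` instead would silently drop this purchase, which is the failure mode described above.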

20:36A

So I think people have to be in charge of this spec, this plan.

所以我认为人类必须负责这个规格说明、这个计划。

20:40A

And I actually don't even like the plan mode. I would—I mean obviously it's very useful, but I think there's something more general here where you have to work with your agent to design a spec that is very detailed and maybe it's basically the docs, and then get the agents to write them and you're in charge of the oversight and the top level categories, but the agents are—

其实我甚至不太喜欢计划模式。我的意思是,它显然很有用,但我觉得这里有更通用的东西,就是你必须和你的 agent 一起设计一个非常详细的规格说明,可能基本上就是文档,然后让 agent 去编写它们,你负责监督和顶层分类,但 agent 在做——

21:00A

Doing a lot of the under the hood, and so I think you're not caring about some of the details.

很多底层的工作,所以你不用关心某些细节。

21:04A

So as an example, also with arrays or tensors in neural networks, there's a ton of details between PyTorch and NumPy and all the different like pandas and so on for all the different little API details.

举个例子,在神经网络中处理数组或张量时,PyTorch 和 NumPy 以及 pandas 等等之间有大量的 API 细节差异。

21:17A

And I already forgot about keepdims versus keepdim, or whether it's dim or axis, or reshape or permute or transpose.

我已经忘了是 keepdims 还是 keepdim,或者是 dim 还是 axis,是 reshape 还是 permute 还是 transpose。

21:22A

I don't remember this stuff anymore, right?

我已经不记得这些东西了,对吧?

21:24A

Because you don't have to. This is the kind of details that are handled by the intern because they have very good recall. But you still have to know, for example, that there's an underlying tensor, there's an underlying view, and then you can manipulate view of the same storage or you can have different storage which would be less efficient. And so you still have to have an understanding of what this stuff is doing and some of the fundamentals so that you're not

因为你不需要记。这些细节由实习生处理,因为它们的记忆力很好。但你仍然需要知道,比如底层有一个张量,有一个底层视图,然后你可以操作同一存储的视图,或者你可以有不同的存储,那样效率会更低。所以你仍然需要理解这些东西在做什么,以及一些基本原理,这样你就不会——

21:45A

Copying memory around unnecessarily and so on, but the details of the APIs are now handed off, so you're in charge of the taste, the engineering, the design, and that it makes sense and that you're asking for the right things and that you're saying that, okay, these have to be unique user IDs that we're going to tie everything to. And so you're doing some of the design and development and the engineers are doing the fill in the blanks, and that's currently kind of like where we are, and I think that's what everyone of course is seeing, I think, right now.

不必要地复制内存等等,但 API 的细节现在交给它们了,所以你负责品味、工程设计、整体设计,确保它有意义,确保你要求的东西是对的,确保你说的是,好的,这些必须是唯一的用户 ID,我们要把所有东西都绑定到它上面。所以你在做一些设计和开发工作,而工程师在填空,这就是我们目前的状态,我想这也是大家现在都看到的。
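The view-versus-storage point can be sketched in NumPy, whose arrays follow a similar model to the tensors mentioned here. The variable names are illustrative, and `np.shares_memory` is used only to show which operations copy.

```python
import numpy as np

base = np.arange(6)          # owns its storage
view = base.reshape(2, 3)    # reshaping a contiguous array gives a view
transposed = view.T          # transpose is also a view; only strides change
copied = view.T.copy()       # explicit copy: new storage

# Flattening the transposed view cannot be expressed with strides alone,
# so NumPy silently allocates new storage here.
flattened = transposed.reshape(-1)

view[0, 0] = 99              # writes through to everything sharing storage

shares_view = np.shares_memory(base, view)             # True
shares_transposed = np.shares_memory(base, transposed)  # True
shares_copied = np.shares_memory(base, copied)          # False
shares_flattened = np.shares_memory(base, flattened)    # False
```

Knowing which of these lines allocates is the part that stays with the person; the exact function names can be handed off.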

22:13B

Do you think there's a chance that this taste and judgment matters less over time, or will the ceiling just keep rising?

你觉得随着时间推移,这种品味和判断会变得不那么重要吗,还是说天花板会一直上升?

22:21A

Yeah, it's a good question. I would—okay, I mean, I'm hoping that it improves. I think probably the reason it doesn't improve right now is, again, it's not part of the RL. There's probably no

这是个好问题。我希望它能改进。我觉得它现在没有改进的原因可能是,它不是强化学习的一部分。可能没有——

22:33A

Aesthetics cost or reward, or it's not good enough or something like that.

审美成本或奖励,或者还不够好,诸如此类。

22:39A

I do think that when you actually look at the code, sometimes I get a little bit of a heart attack because it's not like super amazing code necessarily all the time, and it's very bloated and there's a lot of copy-paste and there's awkward abstractions that are brittle, and like it works but it's just really gross.

我确实觉得当你真正看代码时,有时候会有点心惊肉跳,因为代码不一定总是特别优秀,而且非常臃肿,有很多复制粘贴,有些抽象很脆弱,虽然能用但真的很糟糕。

22:52A

And I do hope that this can improve in future models.

我确实希望未来的模型能在这方面有所改进。

22:55A

A good example also is this, you know, MicroGPT project where I was trying to simplify LLM training to be as simple as possible.

一个很好的例子是这个 MicroGPT 项目,我试图把 LLM 训练简化到尽可能简单。

23:04A

The models hate this. They can't do it.

模型很讨厌这个。它们做不到。

23:06A

I kept trying to prompt an LLM to simplify more, simplify more, and it just can't—you feel like you're outside of the RL circuits.

我一直试图提示 LLM 再简化一点,再简化一点,但它就是做不到——你会感觉你在强化学习回路之外。

23:15A

It feels like you're obviously, you know, pulling teeth. It's not like light speed.

感觉就像在拔牙一样。不像光速那样快。

23:20A

So I think, I do think that people still remain in charge of this.

所以我确实认为,人类仍然要负责这些事情。

23:25A

But I do think that there's

但我确实认为

23:26A

Nothing fundamental again that's preventing it, it's just the labs haven't done it yet almost.

从根本上说,没有什么东西在阻止它实现,只是实验室还没有做到而已。

23:30B

Yeah.

是的。

23:31A

So I'd love to come back to this idea of jagged forms of intelligence. You wrote a little bit about this with a very thought-provoking piece around animals versus ghosts.

那我想回到「参差不齐的智能形态」这个概念。你写过一篇很有启发性的文章,讨论动物与幽灵的对比。

23:39A

And the idea is that we're not building animals, we are summoning ghosts.

核心观点是:我们不是在构建动物,而是在召唤幽灵。

23:46A

And these are jagged forms of intelligence that are shaped by data and reward functions, but not by intrinsic motivation or fun or curiosity or empowerment.

这些智能形态是参差不齐的,它们由数据和奖励函数塑造,但不具备内在动机、乐趣、好奇心或自主性。

23:54A

Things that kind of came about via evolution.

这些特质是通过进化产生的。

24:00A

Why does that framing matter and what does it actually change about how you build and deploy and evaluate or even trust them?

为什么这种框架很重要?它实际上如何改变你构建、部署、评估甚至信任 AI 的方式?

24:08B

Yeah, so I think the reason I wrote about this is because I'm trying to wrap my head around what these things are, right?

我写这篇文章是因为我在努力理解这些东西到底是什么。

24:15B

Because if you have a good model of what they are or are not, then

如果你对它们是什么、不是什么有一个清晰的认知模型,那么

24:18A

You're going to be more competent at using them. That said, I'm not sure the framing actually has real power. [laughter]

你使用它们时会更得心应手。不过我不确定这个框架是否真的有实际效力。(笑)

24:28A

I think it's a little bit of philosophizing, but I do think that it's just coming to terms with the fact that these things are not, you know, animal intelligences.

我觉得这有点哲学化,但确实是在接受一个事实:这些东西不是动物智能。

24:38A

Like if you yell at them, they're not going to work better or worse; it doesn't have any impact. And it's all just kind of like these statistical simulation circuits where the substrate is pre-training, so like statistics, and then there's RL bolted on top.

比如你对它们大喊大叫,它们不会表现得更好或更差,完全没有影响。它们本质上是统计模拟电路,基础是预训练——也就是统计学,然后在上面加了强化学习。

24:55A

So it kind of like increases the dependencies, and maybe it's just kind of like a mindset of what I'm coming into or what's likely to work or not likely to work or how to modify it.

所以这增加了依赖关系的复杂度。也许这只是一种思维方式,关于我如何看待它、什么可能有效、什么可能无效、以及如何调整它。

25:05A

But I don't actually—I don't know that I have like here are the five obvious outcomes of how to make your

但我其实没有——我没有那种「这里有五个明显的方法来改进你的

25:11A

System better, it's more just being suspicious of it and

系统」的结论,更多是保持怀疑态度,然后

25:14B

Figuring out over time.

随着时间慢慢摸索。

25:16B

That's where it starts. Okay, so you are so deep in working with agents that don't just chat. They have real permissions, they have local context, they actually take action on your behalf. What does the world look like when we all start to live in that world?

这就是起点。好的,你深度参与开发的 agent 不只是聊天,它们有真实的权限、本地上下文,能代表你采取实际行动。当我们都开始生活在那个世界里,会是什么样子?

25:31A

Yeah, I think a lot of people here are probably excited about what this agent-native environment looks like, where everything has to be rewritten. Everything is still fundamentally written for humans and has to be moved around. Most of the time when I use different frameworks or libraries or things like that, they still have docs that are fundamentally written for humans. This is my favorite pet peeve. Like, why are people still telling me what to do?

我想这里很多人都对 agent 原生环境感到兴奋,认为一切都需要重写。现在一切仍然是为人类设计的,需要人来操作。我使用各种框架或库时,它们的文档仍然是为人类写的。这是我最喜欢吐槽的点:为什么人们还在告诉我该做什么?

25:57A

I don't want to do anything. What is the thing I should copy paste to my agent?

我什么都不想做。我需要的是可以直接复制粘贴给 agent 的东西。

26:00A

[laughter] Like, so it's just every time I'm told, you know, go to this URL or something like that, it's just like ah [laughter] you know. [snorts]

(笑)每次被告知「去这个网址」之类的,我就会想「啊」(笑)(哼)

26:07A

So everyone is I think excited about how do we decompose the workloads that need to happen into fundamentally sensors over the world, actuators over the world.

所以大家都在思考如何将需要完成的工作分解为对世界的传感器和执行器。

26:16A

How do we make it agent native? Basically describe it to agents first, and then have a lot of automation around, you know, data structures that are very legible to the LLMs.

如何让它 agent 原生?基本上就是先为 agent 描述,然后围绕对大语言模型高度可读的数据结构做大量自动化。
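One way to picture "describe it to agents first" is a machine-readable manifest instead of prose instructions. This is only a sketch under stated assumptions: the `Step` class, the `example-cli` commands, and the manifest layout are all made up for illustration and do not correspond to any real tool or standard.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Step:
    """One action an agent can run without a human clicking through a UI."""
    name: str
    command: str
    needs: list[str] = field(default_factory=list)  # env vars the step requires


# Hypothetical setup steps for a made-up service; none of these commands are real.
steps = [
    Step("install", "example-cli install"),
    Step("configure_dns", "example-cli dns set --domain $DOMAIN", needs=["DOMAIN"]),
    Step("deploy", "example-cli deploy", needs=["API_TOKEN"]),
]


def to_agent_manifest(steps):
    """Serialize the steps as JSON an agent can parse instead of reading prose docs."""
    return json.dumps({"steps": [asdict(s) for s in steps]}, indent=2)


manifest = to_agent_manifest(steps)
```

The design choice in this sketch is that every step states its command and its required inputs explicitly, so nothing depends on a person visiting a settings page.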

26:30A

So I think, yeah, I'm hoping that there's a lot of agent-first infrastructure out there. And for MenuGen, famously, well, I'm not sure how famously, but when I wrote the blog post about MenuGen [laughter]

所以我希望会有很多 agent 优先的基础设施出现。说到 MenuGen,当我写那篇(不确定有多出名)关于 MenuGen 的博客文章时(笑)

26:44A

A lot of the work, a lot of the trouble was not even writing the code for MenuGen, it was deploying it in

大部分工作、大部分麻烦甚至不是编写 MenuGen 的代码,而是部署它

26:48A

Vercel, because I had to work with all these different services and I had to string them up and I had to go to their settings and the menus and, you know, configure my DNS, and it was just so annoying. And so that's a good example: I would hope that I could give a prompt to an LLM, "build MenuGen," and then I wouldn't have to touch anything and it would be deployed in that same way on the internet.

到 Vercel 上,因为我当时需要对接各种不同的服务,把它们串联起来,还得进到各自的设置和菜单里配置 DNS,整个过程非常烦人。所以这是个很好的例子:我希望只需要给 LLM 一个提示词「构建 MenuGen」,然后我什么都不用管,它就能以同样的方式部署到互联网上。

27:07A

I think that would be a good kind of a test for whether or not a lot of our infrastructure is becoming more and more agent native.

我觉得这可以作为一个很好的测试标准,来判断我们的基础设施是否正在变得越来越适合 agent 原生使用。

27:14A

And then ultimately I would say yeah, I do think we're going towards a world where there's agent representation for people and for organizations and you know I'll have my agent talk to your agent to figure out some of the details of our meetings or things like that.

最终我认为,我们确实在走向这样一个世界:人和组织都会有 agent 代表,比如我的 agent 会和你的 agent 对话,来敲定我们会议的一些细节之类的事情。

27:30A

So [laughter], I do think that that's roughly where things are going, but yeah, I think everyone here is excited about

所以(笑),我确实认为事情大致在朝这个方向发展,而且我觉得在座的每个人都对此感到兴奋。

27:37A

That.

对此感到兴奋。

27:38B

I really like the visual analogy of sensors and actuators. I actually hadn't thought of that. That's super interesting.

我真的很喜欢传感器和执行器这个视觉类比。我之前还真没这么想过,这个角度超级有意思。

27:43A

Right?

对吧?

27:43B

Okay, I think we have to end on a question about education because you are probably one of the very best in the world at making complex technical concepts simple and deeply thoughtful about how we design education around it.

好,我想我们得用一个关于教育的问题来结束今天的对话,因为你可能是世界上最擅长把复杂技术概念讲简单的人之一,而且你对如何围绕这些概念设计教育有非常深刻的思考。

27:56B

What still remains worth learning deeply when intelligence gets cheap as we move into the next era of AI?

当智能变得廉价,当我们进入 AI 的下一个时代,什么东西仍然值得深入学习?

28:05A

Yeah, there was a tweet that blew my mind recently and I keep thinking about it like every other day. It was something along the lines of, you can outsource your thinking but you can't outsource your understanding.

有条推文最近让我震撼了,我每隔一天就会想起它。大意是:你可以外包你的思考,但你无法外包你的理解。

28:17B

I think that's really nicely put. Yeah, because I'm still part of the system and I still have to

我觉得这个表述非常精准。因为我仍然是系统的一部分,我仍然需要——

28:25A

Somehow information still has to make it into my brain, and I feel like I'm becoming a bottleneck of just even knowing what are we trying to build, why is it worth doing, how do I direct, you know, how do I direct my agents and so on.

信息仍然需要以某种方式进入我的大脑,我感觉自己正在成为一个瓶颈,甚至只是知道我们要构建什么、为什么值得做、我该如何指导——如何指导我的 agent 等等,这些都成了瓶颈。

28:34A

So I do still think that ultimately something has to direct the thinking and the processing and so on, and that's still kind of fundamentally constrained somehow by understanding.

所以我确实认为,最终必须有什么东西来指导思考和处理过程,而这在某种程度上仍然从根本上受到理解能力的制约。

28:46A

And this is one reason I also was very excited about all the LLM knowledge bases, because I feel like that's a way for me to process information, and anytime I see a different projection onto information, I always feel like I gain insight.

这也是我对所有 LLM 知识库感到非常兴奋的一个原因,因为我觉得那是我处理信息的一种方式,而且每当我看到信息的不同投影角度时,我总觉得自己获得了洞察。

28:56A

So it's really just a lot of prompts for me to do synthetic data generation kind of over some fixed data. So I really enjoy whenever I read an article, I have my wiki that's being built up from these articles, and I love asking questions about things, and I think that

所以对我来说,这其实就是在某些固定数据上进行合成数据生成的大量提示词。我真的很享受每次读完一篇文章后,我的 wiki 就从这些文章中构建起来,我喜欢提出各种问题,我认为——

29:12A

Ultimately these are tools to enhance understanding in a certain way, and this is still a bit of a bottleneck, because you can't be a good director if you don't understand. The LLMs certainly don't excel at understanding, so you are still uniquely in charge of that.

最终这些都是在某种程度上增强理解的工具,而这仍然是一个瓶颈,因为如果你不能很好地理解,你就无法成为一个好的指导者——因为 LLM 在理解方面显然并不擅长,你仍然是唯一负责这件事的人。

29:28A

So yeah, I think tools to that effect are incredibly interesting and exciting.

所以我认为朝这个方向发展的工具非常有趣和令人兴奋。

29:33B

I'm excited to be back here in a couple years and to see if we've been fully automated out of the loop and they actually take care of understanding as well.

我很期待几年后再回到这里,看看我们是否已经被完全自动化排除在外,它们是否真的也能处理理解这件事了。

29:40B

Thank you so much for joining us, Andrej. We really appreciate it.

非常感谢你加入我们,Andrej。我们真的很感激。

29:42A

[applause]

(掌声)