OpenAI Codex Team: From Coding Autocomplete to Asynchronous Autonomous Agents
Hanson Wang and Alexander Embiricos from OpenAI's Codex team discuss their latest AI coding agent that works independently in its own environment for up to 30 minutes, generating full pull requests from simple task descriptions. They explain how they trained the model beyond competitive programming to match real-world software engineering needs, the shift from pairing with AI to delegating to autonomous agents, and their vision for a future where the majority of code is written by agents working on their own computers. The conversation covers the technical challenges of long-running inference, the importance of creating realistic training environments, and how developers are already using Codex to fix bugs and implement features at OpenAI. Hosted by Sonya Huang and Lauren Reeder, Sequoia Capital. Mentioned in this episode: The Culture, a sci-fi series by Iain Banks portraying an optimistic view of AI; The Bitter Lesson, an influential essay by Rich Sutton on the importance of scale as a strategic unlock for AI.
- Published Jun 10, 2025
- Uploaded Jun 11, 2026
Full transcript
AI-generated transcript with timestamped sections.
[00:00] In my opinion, the easier it is to write software, [00:02] the more software we can have. Right now, if we pull up our phones (well, you folks are investors, but if you're not an investor), I bet you that if you pull up your phone, [00:10] most of the apps on it are apps that are built by large teams for millions of users. And there's very few apps that are built just for us and the specific thing that we need. And so I think as it becomes [00:21] more and more practical to build bespoke software for people or teams, we'll end up having higher and higher demand for software. [00:44] Welcome to Training Data. [00:45] Today, we're joined by Hanson Wang and Alexander Embiricos from OpenAI's Codex team for a fascinating look at the future of software development. [00:52] Codex is OpenAI's series of AI coding tools that helps developers delegate tasks to cloud and local coding agents. Unlike the original OpenAI Codex, which was developed in 2021 to autocomplete lines of code, the latest evolution of Codex can complete entire tasks for you autonomously in the background. The key difference between o3 and Codex is that while o3 is great at competitive programming, Codex has been RL-tuned to be great at day-to-day enterprise development tasks. [01:20] Alexander and Hanson share more about the backstory for Codex and the broader paradigm shift from snappy autocomplete to longer-running background agents. [01:29] Plus, they share their surprising vision for how developers will interact with AI in the future as sync and async experiences merge.
[01:37] Hint, it might look more like TikTok than your current IDE. [01:40] Thank you guys for joining us. It's wonderful to have you here. Hey, thanks for having us. Great to be here. We'd love to hear a little bit more about what you guys work on. Tell us about the Codex team and your story. Cool, yeah. I'm Hanson. I'm one of the researchers that helped train the Codex 1 model. And I'm Alex, the product lead. I think for me, the name Codex is such a great callback to the original Codex model. That was kind of like an aha moment for me when it first came out. Because I think GPT-3 was really cool. [02:10] do something that is going to change the world. And that's actually kind of like how I got into the whole startup space. One of the first couple of demos I did was using Codex to do data analysis. I think it's actually a funny story. I was here as part of Sequoia's Arc program. That's how I met Lauren. And then in one of the demos we did, we actually used OpenAI Codex to do data analysis. And that's how I started in the startup space. [02:40] later versions of GPT came out, it became super clear that using AI for agentic use cases was going to be the future. And so I joined the company to work on agentic coding efforts. Yeah, and this was like, you know, per standard OpenAI style, where we like the naming to be as easy to follow as possible. This is the Codex of, I think it was 2021. Yeah, this is pre-ChatGPT, right? Exactly, yeah. So it was actually like the model powering GitHub Copilot. [03:06] And then recently, as we were working on this product, which we'll talk about...
[03:10] We thought, you know, this is like a super fun brand, also a very apt name, you know, code, Codex, code execution. So we decided to sort of resuscitate the brand and keep using it. You said resuscitate. So was Codex dormant for a while and then you all resuscitated it for the agent now? We haven't used the brand like recently. Okay. Okay. Really cool. Can you tell us a little bit about Codex, the agent, and what it does? Yeah, I think so. Basically, Codex is a coding agent that has its own container and its own terminal kind of like fully in the cloud. You give it a task and it comes back to you with a PR [03:40] in this sort of like one shot style. And we actually experimented with a lot of different [03:43] form factors kind of along the way but kind of in the end decided to settle on this one. Yeah, so like, you know, we've been working on a bunch of agents and we've been working on a bunch of coding products as well, and [03:54] basically, in our mind, Codex is like this thought experiment for [03:57] how would it work to code with AI, but where [04:00] we sort of put all our effort into thinking about what that would feel like if the AI is working on its own computer, [04:06] you know, independently from you. And so you're delegating to it rather than like pairing with it. And so, you know, some of the things that we're really proud of with this Codex launch are thinking about like the compute environment and like how do we set it up so that the agent can actually work on its own but be productive, and like creating the model, which Hanson can talk more about, that basically isn't just good at like [04:25] writing code that looks good or is functional but also is really good at writing code that is useful for professional software engineers and mergeable, ideally without them even touching their own computer. [04:33] So what is the difference between Codex and Codex CLI? 
Yeah, we've definitely gotten some questions about that. I promise this is all going to make even more sense over time. So basically, Codex for us is like our brand for like agentic coding.
[04:46] And we have this vision of like, you know, [04:49] we're going to have this agent and mostly the agent will work on its own computer, but it should be able to meet you in any of the tools that you use, wherever you work, be that your terminal or, you know, your IDE or your issue management tool. So Codex CLI is basically Codex in your terminal. CLI stands for command line interface, right? So [05:06] in your terminal you can work with Codex, and that's your environment. And then [05:10] Codex, or Codex in ChatGPT, is basically Codex working on its own computer. [05:14] Today, those are just distinct things. As a brief aside, one of my favorite things about working at OpenAI is how willing we are to cut scope and just launch things quickly. [05:22] But over time, we'll actually bring those things closer together. So you can really think of it as just Codex, and [05:28] you know, it can be in ChatGPT or it can be in your CLI. [05:31] Very cool. And so what did you have to do differently for the model to make it useful beyond just writing the next line of code? Yeah, so I think one of the most interesting progressions, so if you go back to, you know, o1, the first reasoning model that we launched, we highlighted how good it is at math and even coding competitions. Like as of now, I used to be a competitive coder and it's better than me at competitive coding. It's better than almost all people at OpenAI at that. [05:56] But I think one of the things that we saw was that, you know, despite being good at these [06:01] programming competitions, it wasn't actually that good at producing mergeable code. And so we even highlighted this in the blog post with models like o3: the code that it generates often, you know, [06:14] isn't quite to the taste or style that a professional software engineer would expect. 
So a lot of the effort that we spent on training this model was aligning the model to basically the taste or the preferences of professional software engineers. And that's something that took a lot of, I guess, specialized training. Yeah, I have this very producty analogy that I like, which is like,
[06:35] If you take our reasoning models, which are great at coding, [06:38] they're great at coding, but it's kind of like this really precocious competitive programmer, a college grad who doesn't have many years of job experience being a professional software engineer on a team. [06:48] Right. And so a lot of the work we did to go from like o3 to like Codex 1 was actually like the equivalent of those first few years of job experience where it's like, hey, what does a good PR description look like? You know, PR titles, how do you read the style of the code base and then make sure your code is in the same style? Yeah. [07:02] How do you test well? [07:03] How do you show that you tested well? Stuff like that. [07:06] Hmm. What's typically the aha moment for when somebody uses Codex? Yeah, I think one of the things we have in the onboarding is like find and fix a bug in the codebase. I think that's one of the areas where Codex really shines, [07:18] specifically bug fixing, just because it can actually independently [07:23] try not just to see if, you know, something looks a bit off, but it can actually go and verify that, okay, I can try and reproduce a particular issue. And so I think even, you know, leading up to the Codex launch there were a couple of bugs where, you know, we were sitting there kind of wondering what's going on, and honestly, sometimes the easiest thing to do is just [07:45] paste a description of the issue into Codex. And we were surprised how frequently that would actually end up with a usable fix. Yeah, fun story here. [07:53] Hopefully this doesn't give away too much, but at 1 a.m. the night before launch or the morning of launch. 
[07:58] At 1 a.m. we were looking at a bug with an animation, a Lottie animation, and, you know, this is the kind of thing where, okay, I guess we could cut it from launch scope; it'd be okay to launch without it.
[08:06] But we really wanted to get it in, we just couldn't figure this out. And so an engineer ended up describing what the bug was and putting it into Codex. And actually a fun pro tip for anyone who's using Codex is that if there's a really hard task, it can be useful to ask Codex to take multiple cracks at it. So they pasted that description in and ran it four times. [08:25] Like, hey, there's this bug, we can't figure out what's going on. And three of those rollouts did not work. And then one of the four was just the fix of the bug that we were stuck on for hours at 1 a.m. before launch. And so we landed the fix, you know, deployed the code, and the animation was in for launch. That's awesome. Maybe tell us more about how you all are using it internally at OpenAI. Like is every engineer, is every researcher using Codex now in their workflows? [08:47] Yeah, and actually can I give you the other kind of magic moment? Oh yeah, please do, definitely. [08:52] One of the interesting things about Codex is that [08:55] it's a very different form factor from maybe what people are used to, right? Like a lot of the AI products that people are used to, especially in software, maybe like GitHub Copilot was like the first really good one, [09:04] they're really things that kind of work with you in flow and you're just kind of seamlessly going back and forth. You're kind of pairing, right, and it's flavors on pairing. [09:11] And we think that's awesome. And the Codex CLI is a tool that you can use in that way. But for Codex, you know, we really wanted to [09:18] push this idea that you're delegating, because in the future we imagine that actually, [09:24] you know, the vast majority of coding is going to be done independently [09:28] from the human working on their computer, who can only do one thing at a time. And so, you know, it'll be done. 
But basically it'll be done by agents working on their own computer. [09:36] Um,
[09:37] And so. [09:38] that is a very different thing to delegate to an agent than it is to pair with sort of an AI model that's like in your tooling. And so you have to kind of use it differently. And so when we actually were working on an alpha before launch, we would just give this agent to people and be like, hey, like, just use this however you want. [09:54] And [09:55] we noticed that many many of the people trying to use our alpha of codex were just like not really finding it super useful [10:01] And then we're like, oh, that's interesting. Let's look at how people like [10:05] at OpenAI are using like internal tooling like Codex. And we realized there was like [10:09] A big difference, which is the mindset of using it. The mindset that works really well for Codex is like kind of this like abundance mindset and like, hey, let's try anything. Let's try anything even multiple times and see what works. It saves me time. And so we've kind of shifted the way that we even onboard people into the product to try to create this aha moment, which is running many tasks in parallel. [10:30] So like for us, if we see someone like trying it out and like they've run like 20 tasks in like a day or an hour, that's amazing. And like we they're probably going to like they've understood basically how to use the tool. Fascinating. How does that change the role of the human when you have to review all of this code? Like if two of the three work, then what do you do? Yeah, I think we put a lot of focus on also making the outputs easy for people to review. So like one of the. [10:53] things that we're proud of is like, we haven't seen this in too many other tools, is like the ability for the model to cite its own work. So not just like the files that have changed, but also even like the terminal outputs. 
So like if it ran a test and, you know, like for some reason test wouldn't work, it actually like tells you that and it tells you like, here's the exact kind of like terminal command I ran, here's the output. It makes it much easier to verify the outputs. But it is like a great point. I think we're shifting to a world where like a lot of the time that
[11:21] we spend [11:23] you know, like normally coding, a lot of that's going to shift to actually reviewing the code. [11:28] Do you need humans to review the code? Because I think of code as one of those things where, you know, it compiles or it doesn't. And once it compiles, you can go and check if it does the thing it was supposed to do. Like, do you even need humans to do the code review? [11:38] I think, yeah, I mean, for the foreseeable future, at least, I do see that to be the case. I mean, I think a lot of it's also just like building trust [11:46] with the early users. I think people really need to have a feeling for, you know, what things are working well, what things are not. And I think there's always just some external context about, you know, what makes this code correct that might be [12:02] beyond what you initially provided as context. [12:05] Yeah, like if you think of what a developer does, and this is obviously oversimplifying, but there's like, [12:10] okay, there's coming up with what things maybe should be done, discussing them with the team, maybe deciding what to do. You call that ideation. You know, maybe then there's design, like, okay, what are we actually doing? And then planning, how are we going to do it? Then there's implementing [12:23] and then validating, you know, testing those changes. And, you know, that's basically a loop. [12:27] And that small loop of implementing and then testing is what Codex is great at right now. [12:31] Although we can talk about how you can use it for planning too. And then there's actually deploying the code and then maybe maintaining the code, writing documentation, et cetera. And so, you know, I forget the exact stat, but I feel like a stat I remember recently is that engineers spend maybe like 35% of their time coding. [12:45] It's not actually the majority of even what engineers do. 
[12:49] The future that we're trying to build towards is one where if you're a software developer or even in any profession,
[12:56] all the work that is like easily automatable, that's usually the grungier type of work, [13:00] you're not doing, you're delegating that. And then the work that is more interesting because maybe it's ambiguous or maybe because it's really hard, that's the work that you're driving. [13:08] So we're trying to build towards that world. [13:12] And I think we have to get there iteratively. So for example, right now, if you're a human and you write code, another human is going to review that code. Right. And so we're not going to come in and just try to change that. We're like, okay, let's plug into that. So, you know, the way the product works right now is like, [13:26] you, the developer, are being accelerated by the tool. You ask for some code to be written, [13:30] you decide if it's good and you want to push it out to your team, and then your team can review it. And then over time, we'll basically kind of expand what we can do. So we'll help more and more with like [13:39] planning, maybe even designing, maybe even thinking about what to do in response to things that are happening in your app or at work, [13:45] and then we'll push to make review easier and easier, as Hanson was describing. Yeah, and I do think I see a future where you have multiple agents collaborating together. So you have the Codex agent write the code, and then maybe the Operator agent's the one that's testing it, and all of the things that-- [14:01] All the different agents that we've been working on at the company can kind of like come together. That's awesome. Have you seen people beyond engineering teams start to use Codex, now that you can delegate writing code? And as we get into the world of vibe coding, you guys are helping bring us further down that hole? Yeah, this is actually super funny. So the answer is yes, but I'll tell you a story. [14:20] We were working on our launch blog post with Lindsay here and
[14:27] we were talking about which quotes to include from customers. And we had a customer that wanted to say, yeah, we on the engineering team love this. And also, it's like a power tool for PMs. And I remember looking at that quote and being like, this is a really cool quote. Because I'm on the product team. And I use it to just avoid having to bug an engineer about things or to answer questions. But I remember looking at that quote and being like, do we want that in the launch blog post? Because the target audience for what we're building is like... [14:51] specifically professional software engineers, not vibe coders. So I think we ended up not including that exact line. But I think over time, like, as... [15:01] As we have agents that can help us code, I would expect more and more people to be able to contribute to code bases. Do you think the number of professional software developers goes up or down over time? [15:10] This is just my opinion, but I think it goes way up. Huh. Not vibe coders, professional software developers? Yeah, yeah, I think so. [15:18] But yeah, in my opinion, the easier it is to write software, [15:22] the more software we can have. Right now, if we pull up our phones (well, you folks are investors, but if you're not an investor), I bet you that if you pull up your phone, [15:29] most of the apps on it are apps that are built by large teams for millions of users. And there's very few apps that are built just for us and the specific thing that we need. [15:38] And so I think as it becomes [15:40] more and more practical to build like bespoke software for people or teams, we'll end up having higher and higher demand for software. Yeah, as I think about how I use it, I think it just really is a multiplicative factor right now rather than any sort of replacement. Especially looking at the patterns of our internal power users, there's a really dramatic difference in
[16:00] like the top users of Codex are doing, you know, [16:04] 10 plus PRs every day. It's just really such a multiplicative factor that I can't see like a world in which like [16:12] It's like lowering the bar to creating software so much. That said, I think this is a really important question. And to be completely honest, we don't know. And so this is something that we as a company pay a lot of attention to. I want to talk a little bit about what's happening under the hood on the technology side. So you mentioned that the model itself, one of the things that makes it different from competitive programming is you've made it more... [16:32] good at the things that a professional software developer would do. Is that the biggest difference on the model side? Or should we think of it as a close cousin of o3? Yeah. So it's definitely the same model as o3 with additional reinforcement fine-tuning. But that said, yeah, I think part of it is kind of like these more qualitative aspects of what makes a good software engineer versus simply a good, let's say, coder, you know, like style, even like how it writes comments. [17:02] that people have noticed with other models. And then on top of that, I also want to highlight one of the big challenges was making good environments for the agent to learn in. And so if you think about real-world software repositories, they're so varied and complicated. Think about how much DevOps has to go into setting up a repository, and that's something we're learning the hard way with our environment setups. Should we talk about the Multi repo I was showing you yesterday?
[17:32] Oh, yeah. Like, yeah, I was showing Hanson the repo for the startup that, you know, OpenAI acquired. And so we joined. And so we were looking at that repo together, thinking about it for use as an environment. And Hanson's like, so... [17:45] where are the unit tests? You know, because the agent uses unit tests to verify. And I was like, uh, [17:52] this is a real startup that has no unit tests. I mean, honestly, same. So I can't complain. So yeah, you have all these really messy environments. So, over the course of training, we had to basically generate these really realistic environments for the agent to learn from. And I think one of the reasons that we're able to make such an end-to-end product work is that we have the same environments that we use during training and the same-- [18:22] basically this containerization infrastructure that we're using to serve in production. So our users are, you know, like we're running our own computer environments. When users use Codex, they're running in the exact same environments that we're using for training. [18:36] So you don't have the agent saying, but it works on my machine? Exactly. OK, OK, OK. I think these are also the longest running agents I've seen out of OpenAI. Deep Research maybe was the previous one that was longest running. And my understanding is Codex can sometimes spend 30 minutes on different tasks. Are there any kind of surprising challenges and things you've encountered just getting inference time to scale up on a query for so long? Maybe I'll start with the product.
[19:03] and then there's many on the modeling side, but on the product side, actually, the thing that I think the most about is user intent. [19:10] It's like, actually... [19:12] You know, if you imagine [19:13] someone using [19:15] like autocomplete in their IDE, it's like, [19:18] not super hard necessarily. I mean, obviously it's difficult, but it's not super hard to predict what they are trying to do right now for the next microsecond. [19:26] But for doing a task that takes 30 minutes, [19:29] it's actually fairly difficult to [19:31] help a user describe the task. They may not even know exactly what they want [19:35] for 30 minutes' worth of work. And so something that we spent [19:39] a while debating, and it's still a thing we debate, is what is the right granularity of a task [19:44] for someone to give to Codex, and how can we make it easy so that Codex can be really flexible, where you can use it for one-line changes. [19:51] You can use it for big refactors where you know exactly what you want, or larger features where you know what you want. [19:57] Or maybe you can use Codex when you don't know exactly what you want, and so maybe you should [20:01] ask Codex for a plan, and then you can have Codex suggest tasks and then do those tasks afterwards. [20:08] So that's still a topic of debate and iteration for us. Yeah, I think that's actually a good pro tip for using it. It's actually really good at coming up with its own plans. And then sometimes it's really tedious to specify everything you want upfront. And that's one of the unique challenges about working this way. If you wanted it to work for an hour at a time, then you do have to specify a lot upfront, which means that you have to spend, I don't know, 10, 20 minutes coming up with that. But if you use actually the Ask mode to first
[20:37] you know, generate like a high level plan of what you want to do, then you can iterate on that with the model before you, you know, send it off for an hour. It really is like working with an intern. Yeah. What about on the model side, anything that's surprising in terms of model behavior as it starts to run for so long? [20:52] Yeah, I think our models have gotten a lot better at kind of like sticking [20:56] on task, especially with these longer rollouts. [21:01] I will say there are cases where, you know, there is a limit to the model's patience, even though it's quite high. So it can be frustrating sometimes, you know, it goes off for like 30 minutes. And then, you know, this is a case that we're working to get better at, where it's kind of just like a human. It comes back to you. It's like, sorry, I don't. This is too much. I don't have enough time to do this, actually. Like that's one of the things it says. It's hilarious. Just like an answer. So very, very human like. Yeah. [21:31] right interaction patterns and how they evolve and how the suite of products around this evolve over time. We have Codex, we have Codex CLI. [21:37] What else do you think is out there in the design space [21:40] for engineering and building products? [21:42] Yeah, so... [21:43] Codex as we launched it is really just like, you know, it's a research preview. It's a thought experiment, a useful one, but it's still very early. And... [21:52] what we're most proud of with Codex is [21:55] the model [21:56] and the beginning of this foundation for computer environments. And the UI we shipped is one that we iterated towards, and there's some fun stories there. [22:05] But it's definitely not the final form factor. And for those listening, basically the UI we shipped is an interface in ChatGPT where you can submit a task and ask Codex to either answer your question or write code. 
And then you have this something that looks a little bit like a to-do list of things that you can go look at merging.
[22:24] Really, I think for... [22:26] So we built that to really lean hard into this idea of like an asynchronous agent that you delegate to. [22:31] But what we want to build towards is [22:35] a setup where you don't have to think about whether you're delegating or whether you're pairing with an agent. And really, it should just feel like working with a teammate, [22:42] where that teammate is ubiquitously present in all the tools you work with. [22:46] So [22:47] you should be able to [22:49] pull up any tool that you're working in, be it your terminal, your IDE, your issue management tool, maybe your alerting tool, you know, the tool that shows you errors, [22:58] and just ask for help. Maybe even Codex has already taken a look before you even got there, and it has an opinion there. And you should be able to ask something, be it a short question or a long question. It'll just appropriately decide how much time to spend before answering you [23:11] and [23:13] just like help you land those changes. [23:15] So [23:16] basically, we want to kind of blend this idea of pairing and delegation. But the first thing we shipped was just like [23:22] the purest thought experiment. [23:24] The other thing I'll add to this is that one of the unique [23:28] things about working at OpenAI is that we are the makers of ChatGPT, [23:31] which is sort of the AI system that most people use. And so [23:36] we don't actually see a future where, as you go about your day, you're deciding whether to use like the Codex agent or, I don't know, your shopping agent or your [23:46] taxi ordering agent (by the way, I'm just naming random things here) or your marketing agent. [23:52] Actually, the way we think this should work is you should just have one assistant that you talk to, and you can ask it anything about anything and it can just do the things you need.
[24:01] And so that's ChatGPT that will become our assistant. And then if you're a power user of a certain type of tool, so let's say you're a software developer, you spend a lot of time in certain functional tools, then you can go into that tool and have like a bespoke interface with buttons. [24:15] with lists that you can use to like efficiently go about your day. Do you think we'll still use IDEs? [24:20] Yeah, for sure. But they'll evolve. [24:22] Right, like right now they're like very focused on [24:24] writing code and like as Hanson was saying, like probably agents will be writing more and more code. And so it's going to become like there'll be a shift in emphasis towards like [24:33] landing code or reviewing code or like validating them or maybe even like a shift in emphasis towards planning like bigger arcs yeah i think we're already seeing a lot of people on the team they kind of like first thing in the morning they come in like they make coffee and then they they like kick off a few tasks just to kind of get a starting point and then you know they come back after their um breakfast and they they look at the tasks that or the prs that got generated then they'll take those and the ide is kind of like the place where you take you know it's it's not it's [24:59] Maybe we'll get you like 80% of the way there, hopefully, or even more. But then there's always this like last mile where you go in and really like fine tune, uh, [25:09] based on kind of like your own vibes. How do you see the broader market evolving? Like within OpenAI, you have so many different strategies here. And as you think about async tasks, as you think about some of the things that you mentioned, moving into ChatGPT, we're seeing an explosion of other tools and specialized models. [25:26] You obviously are biased, but I'm curious what your read is of the broader market. Yeah, it's a crazy time to be a developer right now.
[25:32] Like there are just so many new tools [25:35] um, [25:36] that [25:37] are just so helpful. Like a fun story recently is I was on the airplane and there was no Wi-Fi, and I had thought that I was going to maybe write some code and build a thing, and there was no Wi-Fi, and I was like, you know what, screw it. It's just not worth my time to even try to write code anymore. Whereas, you know, the startup that I was working on many years ago, part of the genesis of that startup was me writing some code without Wi-Fi on an airplane. [25:57] And I just wouldn't even do that anymore because the market has just changed so much. And I think we're going to see an equivalent shift in an equivalent amount of time. So in the next two years, [26:07] coding will look completely different. [26:09] I think [26:10] right now, most of the tools that people spend, [26:13] you know, that people find the most value from are tools that work really closely with you, like in your development environment, [26:19] you know, like basically pairing. And I think the shift that we're going to see, [26:23] but we have to figure out how this will happen, but the shift that we're going to see is that actually the majority of code will be written by agents. [26:29] And those agents won't be working in your environment, where you can do one thing at a time, but they'll be working in their own environments. [26:35] And they won't just be triggered by you thinking of specific tasks, but they'll be connected into the tools you use, doing work there. [26:41] And so [26:42] I think we'll see basically that shift towards agents. I think we're going to have to figure out a lot about code review, as you were asking about. [26:48] Personally, I don't exactly know how that's going to work, but I do know that even already at [26:53] much more code is merged
[26:55] by agents, but actually also [26:57] even more code is generated by agents, as folks are, say, kicking off tasks four times to choose their favorite implementation. And so it's not 100% clear
[27:07] how we should... [27:08] how we should even manage all this code that is being written. [27:13] Some things that I will say, though, in case it's useful to the audience, is that there are definitely things you can do to your code base to make it more addressable [27:19] for agents. [27:20] Um, [27:21] this isn't necessarily particularly novel, but, you know, obviously using typed languages [27:25] is really helpful. [27:26] Another thing that's very helpful is having smaller modules that are better tested. Like we joke about having good tests at all. Yeah. Like we joke about my startup's repo, but I bet you we would have written it differently if we were writing it today. And there's even small things like, uh, [27:42] the code name for this project is WHAM. [27:45] This is the code name for Codex. It's WHAM. And when we named it, we were very intentional in doing so because we knew we would have code in the server, for the website, in various other places. And we wanted it to be really easy for the agent to search for WHAM-related code and find it. And so we named the project, [28:04] um, [28:06] you know, WHAM, and we grepped the code base first to figure out how often it was there. Like if we would have called it something like [28:10] code or codex or agent, you can imagine it would have been really hard for the agent to... Now you called it Codex, and now the agent's going to be confused. Well, so in the code, this is kind of my point, right? Intentional design. In the code, we use the term WHAM a lot, [28:25] because that's actually much easier for the agent to find. Obviously, if we didn't use a word like that, the agent could still find its way, but it would have to spend much more time to find the right files. [28:34] It is cool that a lot of the things that actually make the code base easier for humans also tend to make it easier for the agents.
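[Editor's sketch] The naming check described above ("we grepped the code base first to figure out how often it was there") could look roughly like the following. This is a minimal illustration, not the team's actual tooling: the function names `count_name_hits` and `rank_candidates` are hypothetical, and the case-insensitive substring match is an assumption meant to mimic a plain `grep -i`.

```python
import re
from pathlib import Path


def count_name_hits(root: Path, name: str) -> int:
    """Count case-insensitive substring occurrences of `name` in files under `root`."""
    pattern = re.compile(re.escape(name), re.IGNORECASE)
    hits = 0
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            # Undecodable bytes are ignored so binary files do not abort the scan.
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it
        hits += len(pattern.findall(text))
    return hits


def rank_candidates(root: Path, names: list[str]) -> list[tuple[str, int]]:
    """Return (name, hits) pairs, least-used name first."""
    counts = [(name, count_name_hits(root, name)) for name in names]
    return sorted(counts, key=lambda pair: pair[1])
```

A candidate with few or no existing hits (the role "WHAM" plays in the story) keeps later searches for project code free of unrelated matches, whereas a generic word like "agent" or "code" would also match unrelated identifiers.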
Like good tests, for example. Writing good docs is another great example. Where now I think there's even more of an incentive to do that. Because not only does it make your life easier, it makes the...
[28:50] agent's life easier. Okay, sorry to be the annoying VC, but Claude Code and Jules are also, I think, agentic coding experiences from others. I'm curious how you think your experiences compare today. And then do you think the market is probably going to converge towards the same vision of what sync and async coding look like? And in that version of the future, what do you think OpenAI wins on? [29:11] I think we're going to see a little bit of everything, right? Like even in what you mentioned, there's tools that are working on your computer, right? [29:16] There's tools that are working on their own computer. As I mentioned, I think we're going to see [29:21] the majority of work being written where the agent has its own computer, but it will still be really important for us to invest in accelerating developers who are doing work on their own computer too. [29:30] So ideally we get the best of both worlds there. [29:32] But most work is done in agent compute. [29:35] I think the way I see it as well is, I think one of the hardest parts of software engineering really is [29:41] taking all the context from the world and encoding it in these requirements, these design docs. And then the implementation, I think as we alluded to earlier, is like, [29:48] not actually that much of the life cycle is spent on physical coding. And so I think where ChatGPT shines is, it is this assistant that, you know, has memories now. It has access to a lot of different connectors to all the different tools you use. We have Operator and Deep Research that have all these different capabilities. And so I think the vision where that all comes together is where, you know, a tool like Codex can really shine. Once it has access to all that knowledge and it's able to make use of that,
[30:18] it should be able to do a much more effective job at just the coding part. Yeah, imagine hiring a software engineer, and the only thing that that software engineer can do is take a task from you and produce a PR. [30:31] Right. Or, you know, it has these very well-defined features and it can do exactly those things. [30:36] And then you ask for a random thing like, oh, hey, the team is getting together. Do you mind also getting a meeting room and leading a brainstorming? [30:44] It would just be so frustrating if you hired a teammate and they refused to do that kind of work. Right. And so similarly, I think it's really, [30:51] like, we're building towards a future where the agents that you're working with are a little bit more generalized. You know, to reference what Hanson was talking about, Operator and Deep Research, if you think about it, [31:01] Operator has a web browser. Deep Research has a different flavor of a web browser. Codex has a terminal. Really, [31:07] your teammate has pretty similar tools, like a human teammate, right? And so the goal for us eventually is to [31:14] pick places where we want to really invest in a specific audience to make rapid progress. So obviously we're doing that with coding, with Codex or GPT-4.1, [31:22] where we generated specific evals for that audience and then made a better model for them, [31:26] for developers. [31:28] But then over time, generalize these things into simple things that everyone can use. [31:33] So I think, again, with us, you know, with OpenAI and ChatGPT, I feel like that's a place where the products we build will look very different from something that's only specifically for coding. [31:43] What do you think will be the primary UI that developers use to interact [31:47] with Codex? Do you think it'll be ChatGPT, the CLI, the IDE, all of the above?
[31:52] Yeah, it does. I think a mix of all of the above. I think we just kind of want to meet developers where they are in that moment. So it might not even be in the editor or in the terminal. It might be on Slack. Someone messages you like, hey, there's a bug, and you're just like, hey, go fix it. Yeah. [32:09] I'll give you my fun future UI that is not at all serious. [32:15] The future of working with agents, if you're, you know, a startup founder in the future and you have a team of just you, or you and a couple of founders, and many agents, actually looks like TikTok. [32:27] You know, maybe you have a vertical feed, and it's basically an agent has produced a video that you can watch with an idea like, hey, a customer wrote in with this request. I think we should fix it. And then you swipe right to say, yeah, let's fix this. Let's do this. Swipe left to say no. Tinder or TikTok? Sorry, it's a hybrid. It's a hybrid. I didn't say this was going to make a lot of sense. I like it. And then you press and hold [32:51] to provide feedback. So you'd be like, yes, do it, but make sure the font is in italic. [32:56] And so basically you have all these agents who are subscribed to information at your company or on your team. [33:01] And they're proactively coming up with ideas and, you know, doing them and then giving you updates. And you're kind of just curating the work that is being done. And they show you little previews of what the world could look like. Yeah. Obviously, that's a half joke, though. You know, I think that'll be kind of the arm's-length way of working with agents, and then [33:17] there's like, [33:18] it's definitely going to be really important for people to be able to go do the work themselves and pair with agents in.
[33:24] I get that it's a half joke, but it's a really cool visual, because I think everyone agrees conceptually with this idea of collaborating and reviewing all the different changes an agent makes. It's going to look very different from how we code today, but nobody's actually given me a visual of what that might look like, so that's a really cool idea. I love it. Awesome. Should we wrap up with the lightning round? Let's do it. [33:42] Okay, recommended piece of content or reading for AI fans. [33:46] For me, that's immediate. It's The Culture by Iain Banks. Have you read it? Yes, it's amazing. Yeah. [33:52] It is a science fiction series that... [33:55] started being written in the 80s, [33:58] and it is [33:59] unusually positive [34:02] in its view of how a future spacefaring human and non-human race could kind of look. And there's a lot of questioning about what is the purpose and meaning of life when we have AGI. [34:14] Yeah, I think for me, it's anything by Richard Sutton. I think that was my introduction to reinforcement learning. And it's kind of a joke here that we read The Bitter Lesson every single day. That's kind of the philosophy of OpenAI. I think, you know, even with Codex, we give it a terminal and it literally uses POSIX tools. That's the most Bitter Lesson way of working with the computer. And your favorite AI apps? Gotta be ChatGPT. Not ChatGPT, come on. We're super boring. We're so boring. I mean... [34:44] Okay, it could be either a new feature that you guys have released other than Codex [34:48] or something outside of OpenAI. [34:49] Okay, so I guess... [34:51] It's funny, I don't really think of AI apps.
[34:54] Um, right. But I do like it when my life gets easier. [34:59] So, [34:59] you know, some things that I like are when you're using AI, but it's kind of invisible. So, I'm in product, so I often file bugs, and Linear has a really elegant integration where when you file a bug [35:12] from a Slack conversation, it just generates the bug from the Slack conversation, but they never say AI anywhere. You actually kind of don't even notice that it's using AI. [35:21] Oh wait, I came up with an answer for favorite AI app: Waymo. [35:24] Ah, there we go. Yeah, I think for me, Copilot has definitely been [35:31] the thing that, you know, [35:33] keeps delivering value every single day for me. Okay, robotics: bullish, bearish? [35:38] Bullish. [35:40] Yeah. Which new application or application category do you think will break out in 2025? Other than coding. [35:47] Yeah, I mean, I think... [35:48] I think when you had Yusa and Josh on, it's kind of the same answer, but 2025 is definitely the year of agents. I think we're going to see agents take off in a lot of different categories. Yeah, I have to agree with that. What type of agents are you most excited about? [36:02] Aside from coding agents. [36:04] That's a good question. [36:05] Well, I mean, so my take would be like, [36:08] you know, if we, [36:09] I know this is meant to be rapid fire, right? But kind of the way we think of agents is you have reasoning models, [36:13] and then you give those reasoning models access to tools of the trade, [36:16] and then you figure out how to train that agent to [36:19] do the sort of specific function, right? So it's not just about writing, it's about journalism, or it's not just about coding, it's about software engineering.
[36:26] So that's kind of what we're doing. And in my mind, [36:29] the reason I'm so excited about agents this year is because we now have a few agents shipped from OpenAI, and other companies are shipping agents too. And so we're starting to see what kind of the shape of this is and starting to identify the primitives. [36:41] And so specifically what I've been excited about is, as we bring this together, you come up with an agent that [36:47] you don't have to provision separately for every single function, but it's an agent with a computer that has a browser and has a terminal, and it can do multiple things without you having to exactly specify, [36:56] like, "you are my coding agent" or something. [36:58] Really cool. Thank you so much for joining us. Congratulations on what you've built at Codex. And thank you for giving us a preview of how you think the coding market will evolve and also giving us a peek into how long-running async agentic experiences will play out. [37:10] Really appreciate it. Thank you. Thanks for having us. Thank you. [37:37] Thank you.