I’ve been building a 3D game from “scratch” (I’m using a bunch of libs to support mesh optimization, model loading, debug UIs, platform window management, etc) for about 2 weeks now, on and off in the small hours before work and the rare free evenings where I’m not all coded out. Well, actually, Claude Code and OpenAI Codex have been building the game. I’ve been supervising and directing.
It’s… not a game so far. I know the most about 3D rendering, so I started with the 3D renderer. This is where we are, with the Khronos glTF sample model LightsPunctualLamp:
And this is what it looks like in a competent renderer:
My renderer is a deferred multi-pass PBR (physically-based rendering) renderer currently backed by Vulkan, but architected to be backend-agnostic - I have plans to also implement an OpenGL backend. It’s clearly neither feature complete nor bug free, but we’re making steady forward progress.
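To make “backend-agnostic” a little more concrete, the shape I’m aiming for is roughly the sketch below. The names here are purely illustrative - a hypothetical interface, not the actual classes in my codebase - but the idea is that the deferred passes only ever talk to an abstract backend, and only the implementations know about Vulkan or OpenGL:

```cpp
// Hypothetical sketch of a backend-agnostic renderer interface.
// All names are illustrative, not the real types in my project.
struct DrawList;      // per-frame meshes, materials, transforms
struct FrameTargets;  // G-buffer attachments, depth, final color target

class RenderBackend {
public:
    virtual ~RenderBackend() = default;
    virtual void beginFrame() = 0;
    // Fill the G-buffer with geometry and material data.
    virtual void geometryPass(const DrawList& scene, FrameTargets& targets) = 0;
    // Shade from the G-buffer using the PBR lighting model.
    virtual void lightingPass(FrameTargets& targets) = 0;
    // Composite / blit the final image to the swapchain and present.
    virtual void present() = 0;
};

// class VulkanBackend : public RenderBackend { ... };  // what exists today
// class OpenGLBackend : public RenderBackend { ... };  // planned
```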
I guess the first question to answer would be “but why?”. I’ve been writing about start-ups and AI and agents and now I’m working on a video game?
The first answer is because I want to! After 20+ years of business programming, these AI coding agents have awakened a dumb child-like excitement about all the different things I can make. Up until Claude Code, I was kind of in a box - I knew what I knew well, I knew what I didn’t know, and I knew enough to know how infeasible some of my ideas were, given that knowledge gap and a realistic gauge of my ability to commit to learning. Writing a 3D game in any engine, let alone from scratch, in C++ was so far out of my current skill set it wasn’t even worth considering.
AI agents change that - they’re the ultimate code Rosetta Stone. Once you understand the concepts of programming, they allow you to write code in literally any language that’s been documented on the internet. They may not do it well, depending on the quality of the code in their training sets, but they can do it, and you can guide them.
The second answer is that a 3D game in C++ is hard. It’s hard for a team of people. C++ is a wordy language and games are a unique breed of programming where efficiency is valued over readability and maintenance is often in the back seat - much game code is unique to the game and will never be reused. Same with game engine code, if you’re dumb enough to try to write one of those (I am). A game project has a LOT of code, gigabytes of assets, many scripts and build pipelines - they’re very complicated.
In my last post, I discussed working with AI coding agents on larger projects over longer timeframes, and how to really use these things to be productive. I can’t imagine a better stress test of an AI coding agent’s abilities than to throw a 1M+ LOC C++ codebase at it and try to do productive things in it. I do a lot of reading and I try to keep up with the current AI zeitgeist, and to date I have to agree with the AI skeptics: I mostly see trivial code or applications being written by AI coding agents.
Many people will test an AI coding agent with home grown benchmark tests. Write me a todo app. Write Flappy Bird. Write a CLI application. Others claim to have vibe coded products, but usually they’re private to the vibe coder (which is great! that’s the whole point!) and often very simple CRUD applications that just store data, manipulate it, and display it to the user. You know - business-y apps.
Just this year, I’ve tried to vibe engineer:
- a CLI coding agent in
  - Cosmopolitan C
  - Python
  - Ruby
- a Cosmopolitan C HTTP library wrapping libcurl
- a Cosmopolitan C evented concurrent hybrid vector/NoSQL database on top of FAISS and RocksDB, with a natural language query engine
- an AI-driven research tool using RAG augmented by a knowledge graph containing citation edges, concept edges, and entity relationship edges
All with relatively mixed success - mostly failure in that I ended up either shelving or abandoning those codebases, but success in that each codebase itself did get to a certain level of functionality before I felt I hit the wall.
The Wall
I feel like there’s this complexity wall that AI coding agents hit, where the amount of information needed in the context to perform valuable work is high enough to degrade how well it attends to all tokens in that context - this is known as context rot.
On all of my vibe coding projects, I put a lot of effort into maintaining clean and correct CLAUDE.md files to steer the agent, to load information into the context more efficiently, and to keep the context as slim as possible. They also contain instructions on how the agent should write code - how to build, how to test, how to run, how to debug, what kind of comments to write, whether or not to maintain backward compatibility, and how to refactor code.
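For illustration, a CLAUDE.md in one of these projects looks something like the excerpt below. This is invented for this post rather than copied from any of my actual files, and the build command is just a placeholder:

```markdown
# CLAUDE.md (illustrative excerpt, not my actual file)

## Build, test, run
- Build: cmake -B build && cmake --build build   (placeholder command)
- Run the test suite before declaring any task complete.

## How to write code
- No // TODO hacks just to get the build green; fix the root cause or stop and ask.
- Search the existing code for helpers before writing new ones; do not duplicate.
- Do not maintain backward compatibility unless explicitly asked.
- Comments should explain why, not what.
```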
On every project above, I found that Claude Code specifically would reach a point where any task I gave it would cause it to spend the next few minutes reading files, exploring the codebase, and filling its context window with tokens. The work that it performed frequently violated directives in the CLAUDE.md files, it frequently hallucinated methods or parameters that didn’t exist, and it would frequently duplicate code because it couldn’t find the existing version during its exploratory search.
Some of that is on me - there’s definitely a prompting skill component to this and the more specific and exact you are in your prompts, the less thrashing the agent has to do to learn the things you didn’t tell it. Having said that, I didn’t just give up the minute Claude started doing boneheaded things. I tweaked prompts, tried spec driven development, tweaked CLAUDE.md files, and wrote documentation. When I realized the agent was spending a lot of time pulling in information I could have given it, I’d restart the task with a new prompt.
Still, I kept hitting that wall. Earlier this week, I plugged my OpenAI API key into Codex and told it to fix a weird render crash that I had spent probably 2 hours trying to coerce Claude into fixing. It worked away diligently for about 15 minutes and told me it was fixed. And I’ll be damned if it wasn’t actually fixed. I’m not a huge fan of the Codex UX - I find it very terse and prefer the conversational nature of Claude. I went back to Claude.
A few nights ago, I hit another wall. Claude was trying to implement a skybox, and had designed the render pass system and resource tracking in a way that made the skybox a little gnarly to integrate. I spent about 3 hours with Claude working through a refactor to get the skybox into the renderer. Claude tried several times, and each time I noticed it would try to hack around things, put in //TODO comments - basically do anything it could to just get the code to build, regardless of whether it was correct. No matter how I broke the task down, I couldn’t get it to perform valuable work.
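For context on why this was gnarly: a skybox usually runs as its own late pass that reads the depth buffer the geometry pass wrote, which is exactly the kind of cross-pass resource dependency the pass system has to track. Here’s a rough sketch of that ordering, reusing the hypothetical interface from the earlier sketch rather than my actual pass graph:

```cpp
// Rough, illustrative frame ordering with a skybox pass; not my real code.
void renderFrame(RenderBackend& backend, const DrawList& scene, FrameTargets& targets) {
    backend.beginFrame();
    backend.geometryPass(scene, targets); // opaque geometry -> G-buffer + depth
    backend.lightingPass(targets);        // shade lit pixels from the G-buffer
    // A skybox is typically drawn after lighting, at the far plane, with a
    // LESS_OR_EQUAL depth test and depth writes disabled, so it only fills
    // pixels no geometry covered. That means it needs read access to the
    // depth buffer from the geometry pass - the dependency my renderer's
    // resource tracking wasn't originally designed to express.
    // backend.skyboxPass(targets);       // hypothetical pass
    backend.present();
}
```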
Frustrated, I fired Claude and gave the task to Codex. I told Codex the current state of the task and the codebase and told it to fix it. Then I went to bed.
I woke up in the morning and checked on Codex. It had spent about an hour autonomously working away at the problem and announced that the game now builds and the skybox should show up. It did indeed run but the rendering was totally broken. Another quick 15 minutes with Codex, feeding it Vulkan error logs as I ran the game, and it had fixed the renderer - my skybox was appearing.
I cancelled my Claude Max subscription and switched over to OpenAI. I have made more progress on the renderer since doing that than I have in days.
Building New Things
I think my favorite capability of these AI coding agents is how fluently they translate technical knowledge into code - any code you might reasonably want to write. This unlocks an entire universe of new kinds of things you can make, as long as you understand how to make them.
I believe a lot of the success I’m having while vibe engineering comes down to the fact that 3D rendering and games are not new to me. I’ve at least done a fair bit of reading and implementation on my own through the years to understand the concepts that I’m working with. C++ isn’t even new to me, though I would say my skills there are definitely at the beginner level.
I have a Safari Books Online subscription and I’ve been reading every new edition of every rendering book I can get my hands on. I understand how Vulkan works conceptually. I understand how a deferred renderer works with regard to rendering to GPU RAM, compositing, and blitting. I’m not trying to brag - I’m merely trying to set a baseline.
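To spell out what I mean by that: the geometry pass writes material data into a set of G-buffer attachments in GPU memory, the lighting pass reads them back to shade each pixel, and the result gets composited and blitted to the swapchain. A typical layout looks roughly like the sketch below - illustrative, and not necessarily the attachments my renderer actually uses:

```cpp
// Illustrative G-buffer layout for a deferred PBR renderer; my actual
// attachments may differ.
enum class GBufferAttachment {
    Albedo,           // RGBA8: base color
    NormalRoughness,  // RGBA16F: encoded normal + roughness
    MetallicEmissive, // metallic, ambient occlusion, emissive terms
    Depth,            // D32F: reconstruct view-space position when lighting
};
// Geometry pass: rasterize the scene once, writing these per pixel.
// Lighting pass: read the attachments, evaluate the PBR BRDF per light, and
// accumulate into an HDR color target.
// Final step: tone-map and blit/composite that target to the swapchain image.
```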
Part of the success of using these AI agents to do useful work is to be able to recognize when they’re going off the rails. If you’re working on something entirely new and relying on the agent to conceptualize, design, architect, and implement it for you with no understanding of what it’s building and how it works - you’re gonna have a bad time.
Sure, you can guide the agent by testing what you’re trying to build and telling it to fix the issues. Eventually, though, you will hit this wall. With a complex enough project, the agent won’t be able to keep enough signal relative to noise in its context to maintain high performance. Different AI coding agents seem to have more or less capability before they hit that wall - I haven’t hit it yet with Codex, and Codex is surprisingly good at pushing back when I hand it giant tasks, telling me to do less. I’m confident that there’s a wall there too - I just haven’t found it yet.
Once you’ve hit the wall, it’s up to you to figure out how to guide your coding agent around it.




