The Tokenomics of a Second Brain on Codex
Thinking Machines field note
A visual breakdown of one month of my usage of Codex as a second brain system.
Every May session, stacked by day
Each bar is a day. Each segment is one Codex session. Height marks total tokens.
Meet my expanded second brain on Codex
I use Codex via ChatGPT Pro for my second brain. Since my team is about to run a coding agent budgeting exercise, I wanted to explore my own usage first. In practice, I use Codex less like a chatbot and more like a working layer over my operating system: it reads my local markdown vault, searches notes, runs scripts, cleans up transcripts, drafts strategy, updates trackers, and leaves behind artifacts I can use later.
This visual breaks down one month of my local Codex sessions. It shows what I was working on, how many tokens those sessions used, which sessions called tools or skills, and what came out the other side. Across the month of May, I ran 188 Codex sessions and used 295.3M tokens.
A token is a small chunk of text that models read or write. OpenAI API prices are usually listed per 1M input, cached input, and output tokens. See the current OpenAI pricing page.
I also priced the usage using GPT5.5 pricing, which is directionally accurate since that’s the model I used the most: $5.00 per million input tokens, $0.50 per million cached input tokens, and $30.00 per million output tokens. That would come out to about $314.81 if billed directly by those token categories.
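The pricing arithmetic can be sketched in a few lines. This is a minimal illustration, assuming cached tokens are billed only at the cached rate and everything else counted as input is billed at the full input rate. The function name `estimate_cost` and the example token split are hypothetical; the real May split is not given in this post.

```python
# Prices in USD per 1M tokens, as quoted in the text above.
PRICE_PER_M = {"input": 5.00, "cached_input": 0.50, "output": 30.00}


def estimate_cost(uncached_input: int, cached_input: int, output: int) -> float:
    """Return the estimated USD cost for a given token split.

    Assumption: `uncached_input` excludes cached tokens, so no token
    is billed twice.
    """
    tokens = {
        "input": uncached_input,
        "cached_input": cached_input,
        "output": output,
    }
    return sum(tokens[kind] / 1_000_000 * PRICE_PER_M[kind] for kind in PRICE_PER_M)


# Hypothetical split for illustration only, not the author's real numbers.
example = estimate_cost(
    uncached_input=2_000_000,
    cached_input=20_000_000,
    output=500_000,
)
print(f"${example:.2f}")  # $35.00
```

Note how the 20M cached tokens cost the same as the 2M fresh input tokens in this example, and less than the 0.5M output tokens.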
EOD-Meets is my favorite workflow
eod-meets is my daily automated workflow that scans my calendar and automatically downloads and processes my transcripts from Fathom and Granola.
This is the kind of thing agents are good at when the inputs and definition of done are explicit: find the meetings, fetch the transcripts, clean the notes, update the index.
Across May, I ran this workflow 35 times, accounting for 76.6M observed tokens (about 26% of my total use) and costing about $87.89 under the June 6 pricing table. That is a real chunk of usage, but it is also one of the clearest examples of value creation because the output is not just a response in a chat window. The workflow can create cleaned transcript notes, update my meetings log, surface action items, and make the day searchable after my working memory has exited the building. I should probably test whether a cheaper model can do this job.
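For anyone checking the shares, here is a small sketch of the arithmetic using only figures quoted in this post. It assumes the $314.81 monthly estimate and the $87.89 workflow estimate were computed from the same pricing table, which the post does not state explicitly.

```python
# Figures quoted in the text above.
TOTAL_TOKENS_M = 295.3   # all May sessions, millions of tokens
TOTAL_COST_USD = 314.81  # estimated cost of all May sessions
EOD_TOKENS_M = 76.6      # eod-meets runs, millions of tokens
EOD_COST_USD = 87.89     # estimated cost of eod-meets runs
EOD_RUNS = 35

token_share = EOD_TOKENS_M / TOTAL_TOKENS_M
cost_share = EOD_COST_USD / TOTAL_COST_USD  # assumes one shared pricing table
avg_cost_per_run = EOD_COST_USD / EOD_RUNS
avg_tokens_per_run_m = EOD_TOKENS_M / EOD_RUNS

print(f"share of tokens:    {token_share:.1%}")            # 25.9%
print(f"share of cost:      {cost_share:.1%}")             # 27.9%
print(f"avg cost per run:   ${avg_cost_per_run:.2f}")      # $2.51
print(f"avg tokens per run: {avg_tokens_per_run_m:.2f}M")  # 2.19M
```

The per-run average blends light and heavy days, so it need not match the cost of processing any single meeting.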
A typical one-hour meeting produced about 10K-14K transcript tokens in my May vault. Turning that into a cleaned, searchable meeting note through eod-meets cost roughly $3-4, including the tool calls, vault reads, note creation, and meetingslog updates around the transcript itself. The raw transcript is not the whole cost story: the real workflow also carries skill instructions, working context, tool outputs, and prior state. Useful overhead, but overhead.
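To see how much of that cost is overhead, it helps to price the transcript alone. This sketch assumes the transcript is read exactly once as fresh input at the $5.00 per 1M rate quoted earlier; the helper name `fresh_input_cost` is hypothetical.

```python
INPUT_PRICE_PER_M = 5.00  # USD per 1M fresh input tokens, from the pricing above


def fresh_input_cost(tokens: int) -> float:
    """Cost in USD of sending `tokens` once as uncached input."""
    return tokens / 1_000_000 * INPUT_PRICE_PER_M


# Transcript size range for a typical one-hour meeting, from the text above.
low = fresh_input_cost(10_000)
high = fresh_input_cost(14_000)
print(f"transcript alone: ${low:.2f} to ${high:.2f}")  # $0.05 to $0.07

# Observed workflow cost of roughly $3-4 per meeting, from the text above.
min_ratio = 3.00 / high
max_ratio = 4.00 / low
print(f"workflow cost is roughly {min_ratio:.0f}x to {max_ratio:.0f}x that")
```

Under these assumptions, a single read of the transcript is a few cents, so nearly all of the $3-4 comes from the surrounding context, tool outputs, and generated notes.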
The reason this matters is that meetings are usually where high-context work goes to die. People make decisions, share nuance, expose risks, and then the actual record becomes a calendar title and a recording that no one ever watches again.
With this particular workflow, I cut down the time I need to write emails, build decks, and send next-step packets at the end of the day.
Agent skills are workflow accelerators
A skill is a way to package a workflow: what files to check, what sequence to follow, what output format matters, what safety rule not to forget, and what “done” means. Skills can package software scripts, assets like images and template Excel files, and prompts all together.
In Codex, a skill packages instructions, resources, and optional scripts so the agent can follow a workflow reliably. OpenAI’s Codex skills docs explain the format.
In my May logs, 104 sessions used skills, with 234 total skill calls across 45 unique detected skills. My top five were briefing, eod-meets, secondbrain-decision-support, tech-scoping, and pricing.
The best skills behave less like templates and more like operating procedures. They compress repeated judgment into a callable workflow while still leaving room for the model to reason through the specific case. For non-engineers, this is probably the most important concept: You can get a lot out of AI by writing down the workflow well enough that an agent can execute it repeatedly.
I’ve been getting a lot of practice building skills that combine deterministic pieces (state, permissions, evals, and integrations), which live in scripts, with non-deterministic judgment calls, which live in prompts. This matters a great deal for my pricing skill, where I want Codex to use my sheet formulas directly and NOT have the model come up with our pricing calculations fresh every time.
Creating artifacts like PDFs, Excel files, and HTML sites
This month, Codex produced a total of 3,053 pieces of work for me, each one a file, report, sheet, script, cleaned note, proposal, or slide draft. 110 of my Codex sessions created or modified notes, reports, CSVs, scripts, meeting logs, task lists, and working documents. These artifacts are useful outputs in their own right, and they are also filed back into my personal knowledge hub for future use.
Artifact just means durable output: a note, CSV, deck, script, spreadsheet, page, or report that exists after the chat ends. The boring file is the product. Tragic for vibes; excellent for operations.
Personal Knowledge Hub
The way I use Codex continually builds up my personal knowledge hub/second brain. I use both terms interchangeably here, but it’s arguably more correct to call this body of context the knowledge hub, and to call the superset of the hub plus the skills my “second brain.”
In May, I created 71 new notes in my knowledge hub, excluding raw data and multi-file artifacts that I put into raw/, scratchpad/, outputs/, and local version snapshots.
The index grew too. meetingslog.md is 832 lines longer than the Apr 28 local snapshot (1,805 lines then, 2,637 lines now).
That matters because the hub is doing two jobs at once. The individual notes preserve source material. The log creates a navigable map across time. Without the log, the vault becomes un-navigable.
The other useful signal is topic concentration. Among May-created notes, the most common tags were #tm/strategy, #type/transcript, #tm/sales, #tm/people, and #type/tech.
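A tag tally like this one is easy to reproduce on any markdown vault. The sketch below is one possible approach, not the script behind the numbers above: the regular expression, the `count_tags` name, and the sample notes are all assumptions for illustration.

```python
import re
from collections import Counter

# Matches tags such as #tm/strategy at the start of the text or after
# whitespace. A markdown heading ("# Title") is skipped because a letter
# must follow the "#" directly.
TAG_PATTERN = re.compile(r"(?<!\S)#([A-Za-z][\w/-]*)")


def count_tags(notes: list[str]) -> Counter:
    """Count how many notes carry each tag."""
    counts: Counter = Counter()
    for text in notes:
        # Use a set so a tag repeated inside one note counts once.
        counts.update(set(TAG_PATTERN.findall(text)))
    return counts


# Hypothetical note bodies for illustration only.
sample_notes = [
    "# Weekly sync\n#tm/strategy #type/transcript",
    "#tm/strategy notes on pricing",
    "#tm/sales #type/transcript #tm/sales",
]
tag_counts = count_tags(sample_notes)
for tag, n in tag_counts.most_common(3):
    print(f"#{tag}: {n}")
```

Counting notes per tag, rather than raw tag occurrences, keeps one tag-heavy note from dominating the ranking.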
For readers trying to copy the pattern, I would not start with a perfect taxonomy. I would start with three durable objects: a meeting index, cleaned source notes, and a small set of project hubs. The taxonomy can improve later. The habit of saving the receipts is the foundation. I have separate blog posts and teach classes on how to build your own second brain, so let’s pin that topic for another time.
What is expensive and why?
The most expensive sessions were mostly mixed-use workbenches where I processed meetings, ran analysis that required digging through my personal knowledge hub, and created some type of output: proposal, deck, website, or code.
The top sessions combined file reading, tool calls, drafting, image or document work, workflow design, script execution, and follow-up changes. The largest single session used 37.9M observed tokens and would cost about $28.54 under the June 6 pricing table. The second largest used 34.1M tokens and would cost about $32.39. The smaller session costs more because price depends on the mix of fresh input, cached input, and output tokens, not on the total alone.
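One way to compare sessions of different sizes is a blended rate: estimated dollars per million tokens. This sketch uses only the two sessions quoted above; the dictionary layout and variable names are illustrative assumptions.

```python
# (tokens in millions, estimated cost in USD), as quoted in the text above.
sessions = {
    "largest": (37.9, 28.54),
    "second largest": (34.1, 32.39),
}

# Blended rate = cost divided by total tokens, in USD per 1M tokens.
blended_rate = {name: cost / tokens_m for name, (tokens_m, cost) in sessions.items()}

for name, rate in blended_rate.items():
    print(f"{name}: ${rate:.2f} per 1M tokens")
# largest: $0.75 per 1M tokens
# second largest: $0.95 per 1M tokens
```

Both rates sit far closer to the $0.50 cached price than to the $5.00 fresh input price, which suggests cached input dominates these sessions. A higher blended rate points to a larger share of fresh input or output tokens.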
First cost driver: long context that gets carried forward. Once a session includes transcripts, tool outputs, skill instructions, generated drafts, and file contents, later turns inherit a larger working set. Cached input makes this much cheaper than sending fresh context every time, but “cheaper” is not the same as “free.”
Cached input is repeated context the system can reuse at a discount. It is why long work sessions do not scale linearly with sticker-price input tokens, which is merciful because transcripts are extremely verbose.
Second cost driver: tool use. The month shows 3,481 detected tool calls across 99 tool-using sessions. Tool use includes writing code, making PowerPoints, and making Excel sheets. These calls are valuable because they let Codex inspect and change the real workspace, but every call adds its output to the context that later turns carry forward.
Some thoughts
Agent sessions become workbenches, not conversations.
The chart should make clear that heavier sessions are usually doing multi-step work: reading files, calling tools, producing artifacts, and incorporating feedback. The unit of analysis is not a prompt. It is a work session.
A prompt is one instruction. A session is the whole workbench: context, tool calls, edits, corrections, and the final artifact trail.
Caching changes the cost story.
The raw input number looks alarming because input is counted every time context is sent. Cached input is a discounted subset of input, which means the same long context can be much cheaper on later turns. The cost story is not “all input is equally expensive.” The cost story is “long context still matters, but cache behavior changes the slope.”
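A toy model makes the change in slope concrete. This is a simplified sketch, not a description of how Codex or the OpenAI API actually caches: it assumes every turn resends the full context, each turn adds a fixed number of new tokens, and with caching on, all previously sent context is a perfect cache hit. Output tokens are ignored. The function name and parameters are hypothetical.

```python
INPUT_PRICE = 5.00   # USD per 1M fresh input tokens, from the pricing above
CACHED_PRICE = 0.50  # USD per 1M cached input tokens, from the pricing above


def session_input_cost(turns: int, new_tokens_per_turn: int, cached: bool) -> float:
    """Input-side cost in USD of a session whose context grows every turn."""
    total = 0.0
    context = 0  # tokens already sent on earlier turns
    for _ in range(turns):
        if cached:
            # Earlier context is billed at the cached rate, new tokens at full rate.
            total += (new_tokens_per_turn * INPUT_PRICE + context * CACHED_PRICE) / 1_000_000
        else:
            # Everything is billed as fresh input on every turn.
            total += (new_tokens_per_turn + context) * INPUT_PRICE / 1_000_000
        context += new_tokens_per_turn
    return total


without_cache = session_input_cost(turns=50, new_tokens_per_turn=20_000, cached=False)
with_cache = session_input_cost(turns=50, new_tokens_per_turn=20_000, cached=True)
print(f"no caching:   ${without_cache:.2f}")  # $127.50
print(f"with caching: ${with_cache:.2f}")     # $17.25
```

In both cases the cost still grows faster than the number of turns, because the carried context keeps growing. Caching lowers the price of that carried context; it does not remove it.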
What should teams instrument from day one?
At minimum: session category, token split, cache rate, tool calls, skill calls, artifacts created, human review needed, and whether the output was reused. Without those, teams will argue from vibes.
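As a starting point, those fields fit in one small record per session. The sketch below is a hypothetical schema, not an existing Codex log format: every field name is an assumption, and cache rate is defined here as cached input divided by all input.

```python
from dataclasses import dataclass


@dataclass
class SessionRecord:
    """One row per agent session; all field names are illustrative."""

    category: str               # e.g. "meeting processing", "pricing"
    uncached_input_tokens: int  # token split: fresh input
    cached_input_tokens: int    # token split: cached input
    output_tokens: int          # token split: output
    tool_calls: int
    skill_calls: int
    artifacts_created: int
    needs_human_review: bool
    output_reused: bool

    @property
    def cache_rate(self) -> float:
        """Share of input tokens served from cache (0.0 if no input)."""
        total_input = self.uncached_input_tokens + self.cached_input_tokens
        return self.cached_input_tokens / total_input if total_input else 0.0


# Hypothetical session for illustration only.
record = SessionRecord(
    category="meeting processing",
    uncached_input_tokens=1_000_000,
    cached_input_tokens=9_000_000,
    output_tokens=50_000,
    tool_calls=40,
    skill_calls=2,
    artifacts_created=5,
    needs_human_review=False,
    output_reused=True,
)
print(f"cache rate: {record.cache_rate:.0%}")  # cache rate: 90%
```

Keeping the token split as three separate counts, rather than one total, is what makes later cost estimates possible under any pricing table.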
Is it worth $315?
The ChatGPT Pro Plan at $100/month is feeling like an absolute steal at these price estimates! Honestly, I wasn’t aware of how much I was spending until I ran this analysis. Looking at my spend, I could certainly lower the cost by optimizing the heaviest eod-meets workflow. On the other hand, I’ve ramped up my Codex usage in the last week since I’ve seen the latest models’ performance on spreadsheets and financial data science.
I’d say that these workflows easily get me producing twice as much as I normally would in a week, so from that perspective, I’d be happy to pay for $100/month plans for myself and my consulting team. Before we roll out API pricing across the enterprise, I’d be looking very carefully at target usage and seeing where we can build smart plugins and shared tooling for our non-technical users. I’d want to set organizational guardrails for them so Codex doesn’t jump right into expensive tool use, and perhaps we’ll look at something like OpenRouter later to route tasks to the cheapest model that can do the job.
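The routing idea can be sketched without committing to any vendor. The example below does not use the OpenRouter API: the model names, prices, and the 1-to-3 capability scale are all invented for illustration, and in practice the hard part is deciding what capability a task actually needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    usd_per_m_output: float  # hypothetical price per 1M output tokens
    capability: int          # hypothetical scale: 1 = basic, 3 = strongest


# Hypothetical catalog; none of these are real model names or prices.
CATALOG = [
    ModelOption("small-model", 2.00, 1),
    ModelOption("mid-model", 10.00, 2),
    ModelOption("frontier-model", 30.00, 3),
]


def cheapest_capable(required_capability: int, catalog=CATALOG) -> ModelOption:
    """Pick the lowest-priced model that meets the required capability."""
    candidates = [m for m in catalog if m.capability >= required_capability]
    if not candidates:
        raise ValueError(f"no model meets capability {required_capability}")
    return min(candidates, key=lambda m: m.usd_per_m_output)


# Example: transcript cleanup might only need tier 1, pricing analysis tier 3.
print(cheapest_capable(1).name)  # small-model
print(cheapest_capable(3).name)  # frontier-model
```

A guardrail for non-technical users could live in the same place: cap the tier a given task category is allowed to request.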