What I Learned Building an AI Project Manager from Scratch

At Three Seven Marketing, we work with a lot of retainer clients. That means a lot of recurring tasks, a lot of meeting follow-through, a lot of "what did we tell them we'd do by Friday" conversations. It's the operational layer under the strategy work, and it's genuinely hard to manage well without a system that has memory.

Most of our workflow lives in Monday.com, Toggl, Fireflies, and Google Drive. Four different tools, none of them talking to each other in any particularly intelligent way. So I built something that sits on top of all of them: Pip, a custom AI project manager built in Claude Code with slash commands that give our team one place to run retainer work, meeting follow-through, and weekly capacity planning.

It works well now. Getting there was instructive.

Stack: Claude Code, Monday.com MCP, Toggl, Fireflies, Google Drive

The hardest part wasn't the AI

When people imagine building an AI system, they usually imagine the AI part being hard. Writing the right prompts, getting the model to behave, figuring out what it can and can't do. That's all real work, but it's not where I spent most of my time.

The hard part was sitting down and actually mapping what the system needed to do. Not in the abstract — not "it should help with project management" — but specifically. What does a weekly retainer check-in actually require? What information does someone need to have available before that meeting, and where does that information live? What happens after the meeting that currently falls through the cracks?

You have to understand a workflow before you can automate it. This is so obvious it sounds patronizing to say out loud, and yet almost everyone tries to automate before they've done that work. They pick a tool, start building, and discover three months in that they've automated a broken process and made it faster to do things wrong.

I spent real time documenting our actual workflows before I wrote a single prompt. What we do for each client, in what order, how often, what decisions get made by whom. The AI came after that. It only works because the map came first.

Prompts are the last 10%

There's a version of AI building that treats prompts as the product. You write a really good prompt, you paste it in, the AI does something impressive. This is how most AI tutorials work, and it's not wrong exactly, but it gives you a completely skewed picture of where the value actually comes from.

For Pip, the prompts are maybe the last 10% of what makes it useful. The 90% is everything underneath them: the structured markdown files for every client, the context documents that explain how we work and what our standards are, the files that describe each recurring task pattern and what a good output looks like for that task. The AI is only as useful as what you give it to work with.
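A minimal Python sketch of that layering follows. The directory layout (`context/`, `clients/`, `tasks/`) and the `build_context` function are hypothetical, not Pip's actual structure. The idea is that each run pulls the shared standards, the client file, and the task-pattern file into one block of text before any prompt sits on top of it.

```python
from pathlib import Path

# Hypothetical layout, assumed for illustration only:
#   context/*.md        how we work, our standards (shared by every client)
#   clients/<slug>.md   one file per client
#   tasks/<pattern>.md  one file per recurring task pattern
def build_context(root: Path, client: str, task: str) -> str:
    """Concatenate shared, client, and task markdown files into one context string."""
    sections = []
    # Shared context first, in a stable (sorted) order.
    for path in sorted((root / "context").glob("*.md")):
        sections.append(f"## {path.stem}\n{path.read_text(encoding='utf-8').strip()}")
    # Then the client file and the task-pattern file; both are required.
    for label, path in (("client", root / "clients" / f"{client}.md"),
                        ("task", root / "tasks" / f"{task}.md")):
        if not path.exists():
            raise FileNotFoundError(f"missing {label} file: {path}")
        sections.append(f"## {label}: {path.stem}\n{path.read_text(encoding='utf-8').strip()}")
    return "\n\n".join(sections)
```

Failing loudly on a missing client or task file is a deliberate choice in this sketch. A prompt that silently runs without its context is exactly the generic-output problem the files exist to prevent.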

Building those context files took a long time. Not because they're technically complex — they're just markdown — but because I had to think clearly about things that had previously existed only as accumulated intuition. What does a good monthly client report look like? What information do we always need before a strategy call? What are the things that, when they go wrong, are almost always because someone missed a specific step?

Writing it down is the work. The prompts reference what you wrote. That order matters.

It broke. More than once.

The first version of Pip was genuinely impressive to demo. It could pull data from Monday.com, summarize Fireflies meeting transcripts, generate a status update. I showed it to a few people and they were interested. Nobody used it.

After enough time watching that, I started asking why. The answer was friction. Pip version one required context I hadn't built in yet — you had to know what to ask for, in what format, and it still sometimes returned things that needed significant cleanup before they were useful. Using it added a step instead of removing one. For busy people running client work all day, that math doesn't work.

The rebuild started from a different question: where does work actually get dropped or duplicated right now? Not "what could an AI do" but "what are the specific points in our workflow where things go sideways?" Meeting follow-up tasks that don't make it into Monday. Time logged in Toggl that doesn't match what we told the client. The gap between what was discussed and what got documented.
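The Toggl mismatch is the kind of gap that reduces to simple arithmetic once the data is in one place. This Python sketch assumes time entries have already been exported and converted to hours, with hypothetical `client` and `hours` fields, and that the hours reported to each client are available as a dictionary. It does not call Toggl's API, and the half-hour tolerance is an arbitrary example.

```python
from collections import defaultdict

def reconcile_hours(time_entries, reported_hours, tolerance=0.5):
    """Return {client: (logged, reported)} wherever the totals differ by more than `tolerance` hours.

    time_entries: iterable of dicts with hypothetical "client" and "hours" keys.
    reported_hours: dict mapping client -> hours we told that client.
    """
    logged = defaultdict(float)
    for entry in time_entries:
        logged[entry["client"]] += entry["hours"]

    mismatches = {}
    # Check every client that appears on either side, so missing logs are flagged too.
    for client in sorted(set(logged) | set(reported_hours)):
        got = round(logged.get(client, 0.0), 2)
        told = reported_hours.get(client, 0.0)
        if abs(got - told) > tolerance:
            mismatches[client] = (got, told)
    return mismatches
```

Checking both sides matters: a client with reported hours but nothing logged is as much a gap as the reverse.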

When I rebuilt Pip around those specific friction points instead of around impressive capabilities, it started getting used. That's the test. Not whether it demos well — whether it actually gets used.

The real value is memory, not speed

Here's the thing I got wrong at first: I was thinking about this as an efficiency problem. How do we do things faster? How do we reduce the time it takes to do a monthly report or prep for a client call?

Speed is fine, but it's not where the value landed. The value landed in memory.

A good AI system remembers what you decided last time. It remembers what you told the client in March and can surface that when you're drafting the April recap. It remembers what you said you were going to follow up on and asks about it. That's not efficiency. It's cognitive offload — moving something out of your head and into a system that will surface it at the right moment.
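One way to picture that memory, as a minimal Python sketch, is a dated log of decisions, things told to the client, and follow-ups, with two queries on top. The `MemoryEntry` fields, the kind labels, and the function names are hypothetical illustrations, not how Pip actually stores anything.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class MemoryEntry:
    client: str
    when: date
    kind: str          # hypothetical labels: "decision", "told_client", "follow_up"
    text: str
    done: bool = False

def previous_month(year, month):
    """(year, month) for the month before, wrapping January back to December."""
    return (year - 1, 12) if month == 1 else (year, month - 1)

def recall(log, client, year, month):
    """Everything recorded for a client in one month, e.g. March before drafting April's recap."""
    return [e for e in log
            if e.client == client and (e.when.year, e.when.month) == (year, month)]

def open_follow_ups(log, client, as_of):
    """Follow-ups we said we'd do that are still open as of a date, oldest first."""
    pending = [e for e in log
               if e.client == client and e.kind == "follow_up"
               and not e.done and e.when <= as_of]
    return sorted(pending, key=lambda e: e.when)
```

Drafting the April recap would start from `recall(log, client, *previous_month(2024, 4))`, and the "did we do what we said" check is `open_follow_ups(log, client, date.today())`.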

When I thought about it as an efficiency tool, the question was "how much time does this save?" When I thought about it as a memory tool, the question became "what are we currently dropping because nobody has the mental bandwidth to carry it?" That second question has a much more useful set of answers.

Memory is also where AI systems have a structural advantage over traditional project management tools. Monday.com is very good at tracking tasks. It doesn't read your meeting transcripts and notice that a client mentioned a concern about Q3 planning that nobody created a task for yet. Pip does that. Not perfectly, but consistently enough to be worth something.
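The noticing itself is the model's job: reading the transcript and pulling out concerns or commitments. The step after that, checking each extracted item against existing tasks, can be sketched in plain Python. This assumes the items are already extracted and the Monday.com task titles already fetched. `untracked_items` and the 0.6 threshold are hypothetical, and `difflib` string similarity is a crude stand-in for whatever matching a real system would use.

```python
from difflib import SequenceMatcher

def _normalize(text):
    """Lowercase and collapse whitespace so formatting differences don't affect matching."""
    return " ".join(text.lower().split())

def untracked_items(extracted_items, task_titles, threshold=0.6):
    """Return transcript items whose best match among existing task titles scores below `threshold`."""
    titles = [_normalize(t) for t in task_titles]
    untracked = []
    for item in extracted_items:
        norm = _normalize(item)
        # Best similarity against any existing task; 0.0 if there are no tasks at all.
        best = max((SequenceMatcher(None, norm, t).ratio() for t in titles), default=0.0)
        if best < threshold:
            untracked.append(item)
    return untracked
```

Anything this returns is a candidate for a human to look at, not a task to create automatically. That keeps the "not perfectly" part of the system visible.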

You probably don't need to build Pip

Pip is internal to Three Seven. It's not a product, it's not publicly available, and building it required a meaningful investment of time and some technical comfort. I'm not writing this to suggest that everyone should build a custom AI project manager.

The thing worth taking from this is the underlying question: what are you carrying in your head right now that a system could carry instead?

Roux, my dinner-planning project, came from the same question. Every week, I was loading a set of variables into working memory and running a calculation that took real cognitive effort. I built something that runs that calculation. Same principle, very different context.

The entry point doesn't have to be a custom-built system with slash commands and API integrations. It can be a structured prompt you run every Monday morning. It can be a template that captures meeting decisions in a consistent format. It can be a document you maintain that an AI can reference to give you useful output instead of generic output. Start with the question of what you're carrying, and work backward from there to what would actually help carry it.
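The template option needs no integrations at all; a few lines are enough to enforce one consistent format. Here is a minimal Python sketch with a hypothetical `meeting_record` function and field layout. The same structure works just as well as a plain document template you fill in by hand.

```python
from datetime import date

def meeting_record(client, when, decisions, follow_ups):
    """Render meeting outcomes in one fixed format so later prompts (or people) can rely on it.

    follow_ups: list of (owner, task, due_date) tuples; this layout is a hypothetical example.
    """
    lines = [f"# {client}: {when.isoformat()}", "", "## Decisions"]
    # An explicit "(none)" makes an empty section distinguishable from a forgotten one.
    lines += [f"- {d}" for d in decisions] or ["- (none)"]
    lines += ["", "## Follow-ups"]
    lines += [f"- [ ] {owner}: {task} (due {due.isoformat()})"
              for owner, task, due in follow_ups] or ["- (none)"]
    return "\n".join(lines)
```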

That's where the systems worth building come from. Not from what's technically possible, but from what's actually weighing on you.