Six months into running Build to Scale, the operations layer was breaking down. A spreadsheet per client for lead lists, separate docs for reply tracking and domain health, another for campaign status. Every morning I opened my laptop to a pile of open tabs and spent the first hour catching up before I could do any actual work.
I was running outbound for four clients simultaneously. No dedicated ops team. No SDR. Just the infrastructure I had not built yet.
That is when I decided to build the systems first and scale the client base second.
Why I Went Agentic
The honest answer is not that I thought AI agents were the future. It is that I was the only person doing five jobs and running out of hours.
Here is what my week looked like before:
- Pull replies from LinkedIn manually, copy them into a doc
- Check every campaign in Instantly and Smartlead for bounces and health issues
- Write follow-up messages for every positive reply across all four clients
- Update each client's Airtable with status changes
- Do it all again on Monday
The operations layer was the ceiling, not the client work itself.
The constraint was not my time doing outreach. It was my time managing the systems that do outreach. The same problem agencies solve by hiring, I needed to solve by automating the operations layer.
The moment I reframed the problem from "I need help" to "I need infrastructure," everything changed. Hiring would have solved the symptom. Building the agent layer solved the cause.
What I Built (and What Broke First)
The first version of the system was manual in practice, automated in name only. I had n8n workflows that took data from each platform and dumped it into Slack. It looked like automation. In practice, I still had to check Slack every hour and manually trigger the next step.
That broke within three weeks. The workflows ran on a push model, and every platform had a different webhook format. When HeyReach changed its API response structure, three workflows broke silently and I did not notice for two days.
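The post does not say how the silent failures were eventually caught. One common guard, sketched here as an assumption rather than as what the system actually does, is to check each payload against the shape you expect and fail loudly when it drifts. The field names and the `SchemaDriftError` class are hypothetical.

```python
from typing import Any


class SchemaDriftError(Exception):
    """Raised when an API payload no longer matches the expected shape."""


# Hypothetical field names and types; a real platform's payload will differ.
EXPECTED_REPLY_FIELDS = {"id": str, "sender": str, "body": str}


def validate_reply(payload: dict[str, Any]) -> dict[str, Any]:
    """Return the payload unchanged, or raise if a field is missing or mistyped."""
    problems = []
    for name, expected_type in EXPECTED_REPLY_FIELDS.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            problems.append(f"wrong type for {name}: {type(payload[name]).__name__}")
    if problems:
        # Failing here turns a silent break into an error someone sees the same day.
        raise SchemaDriftError("; ".join(problems))
    return payload
```

Run at the top of every workflow, a check like this makes a changed response structure surface as an immediate error instead of as two days of missing data.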
The second version was better. I moved to a pull-based cron schedule. Instead of platforms pushing data to me, my system asks for it on a schedule. 7am: check campaign health. 8am: pull all unread replies. Every two hours: scan for anything missed.
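As a rough sketch of that schedule, the function below returns which jobs are due at a given minute. The job names are hypothetical labels, and "every two hours" is assumed to mean on the even hours. The comments show what the equivalent crontab entries would look like.

```python
from datetime import datetime

# Job names are hypothetical. Times follow the schedule described above;
# "every two hours" is assumed to mean on the even hours.
# Equivalent crontab entries would be:
#   0 7 * * *    check_campaign_health
#   0 8 * * *    pull_unread_replies
#   0 */2 * * *  scan_for_missed


def jobs_due(now: datetime) -> list[str]:
    """Return the jobs to run at this minute, in run order."""
    if now.minute != 0:
        return []
    due = []
    if now.hour == 7:
        due.append("check_campaign_health")
    if now.hour == 8:
        due.append("pull_unread_replies")
    if now.hour % 2 == 0:
        due.append("scan_for_missed")
    return due
```

Because the system asks for data on its own clock, a platform changing its webhook format no longer decides when, or whether, the data arrives.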
That version held. It is still what runs today.
The next thing that broke was my assumption about what "automated" meant. I built a reply handler that classified incoming messages and drafted responses. I assumed I would approve them all in five minutes. The first week, I approved 34 drafts in about eight minutes. The second week, one draft was wrong. Not catastrophically wrong. Just a tone mismatch for a specific client. No one noticed. But I noticed.
I added a mandatory review step that I cannot bypass. Every draft sits in a queue. I approve before anything sends. That is not a limitation of the system. That is the design.
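A minimal sketch of that gate, assuming a simple in-memory queue: the class and method names are hypothetical, and the `sent` list stands in for the real call to a sending platform. The point is structural. There is no code path that sends an unapproved draft.

```python
from dataclasses import dataclass, field


@dataclass
class Draft:
    client: str
    body: str
    approved: bool = False


@dataclass
class ReviewQueue:
    """Holds drafts until a human approves them."""
    pending: list[Draft] = field(default_factory=list)
    sent: list[Draft] = field(default_factory=list)  # stand-in for a real send

    def submit(self, draft: Draft) -> None:
        # Drafts always enter unapproved, even if the caller set the flag.
        draft.approved = False
        self.pending.append(draft)

    def approve(self, draft: Draft) -> None:
        if draft not in self.pending:
            raise ValueError("draft is not in the review queue")
        draft.approved = True

    def send_approved(self) -> list[Draft]:
        """Move approved drafts to sent; unapproved drafts stay queued."""
        ready = [d for d in self.pending if d.approved]
        self.pending = [d for d in self.pending if not d.approved]
        self.sent.extend(ready)
        return ready
```

Resetting the flag in `submit` is the design choice that matters: whatever generates the draft cannot approve its own work.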
Full automation without a human review step is not a feature. It is a liability. The goal is not to remove yourself from the loop. It is to only be in the loop when your judgment actually matters.
What the Architecture Looks Like Now
The system I built is called OpenClaw. It runs on a server on Hostinger. No fancy cloud setup. Just a machine that is always on, running scheduled jobs.
What Done Actually Means
I shipped several iterations before landing on something I trusted. Each version worked right up until it did not.
The difference between a demo and infrastructure is what happens on a bad day. When an API changes. When a client's campaign gets paused mid-sequence. When a prospect replies in a language the classifier has not seen before.
The system I have now routes all of those edge cases to me with enough context to handle them in under two minutes. That is done.
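One way to picture that routing, as a sketch under assumptions: the field names, the `KNOWN_LANGUAGES` set, and the 200-character context snippet are all hypothetical. The three checks mirror the bad-day cases listed above: an API change, a paused campaign, and an unfamiliar language.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: the languages the classifier has been checked against.
KNOWN_LANGUAGES = {"en"}


@dataclass
class Reply:
    client: str
    campaign_status: str      # e.g. "active" or "paused"
    language: Optional[str]   # None when detection failed
    body: Optional[str]       # None when the API payload could not be parsed


@dataclass
class Escalation:
    client: str
    reason: str
    context: str


def route(reply: Reply) -> Optional[Escalation]:
    """Return an Escalation for a human, or None if the normal draft flow can proceed."""
    if reply.body is None:
        return Escalation(reply.client, "unparseable payload", "API response may have changed")
    if reply.campaign_status == "paused":
        return Escalation(reply.client, "campaign paused", reply.body[:200])
    if reply.language not in KNOWN_LANGUAGES:
        return Escalation(reply.client, f"unfamiliar language: {reply.language}", reply.body[:200])
    return None
```

Each escalation carries the client, the reason, and a snippet of the message, which is the context a reviewer needs to resolve it quickly.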
What I Would Do Differently
I would have built the reply handler first. Everything else in the system is about getting to a reply. Once you have a reply, you have a conversation. Conversations become clients. I spent the first three months optimizing the top of the funnel while the bottom was still fully manual.
I also would have set up the audit gate earlier. Running a bad list through a campaign is not just a waste of time. It is a domain reputation problem that takes months to recover from. The audit protocol exists because of a real incident, not because it seemed like a good idea in theory.
Build your system in reverse. Start with what happens after a reply and work backward. The sequencing and enrichment layers matter less if you cannot handle the conversations they generate.
Frequently Asked Questions
How long did it take to build this system?
About five months from the first broken prototype to something I trusted enough to run client campaigns on. The core reply handler took three weeks. The signal detection layer took six weeks. The hardest part was not the code. It was working out what the system needed to handle before I knew what could break.
Do you need to know how to code to build something like this?
Not necessarily. The concepts matter more than the syntax. Understanding APIs, webhooks, cron schedules, and data flow will take you further than any specific programming language. I built most of this iteratively, breaking things and fixing them.
How many clients can this setup support?
The system currently handles multiple active campaigns concurrently. The practical ceiling is around eight to ten clients. Past that point, client relationship management becomes the bottleneck, not operations. The agent handles the ops layer. The human still needs to handle strategy and the client relationship.
What is the biggest risk of agentic outbound?
Sending something you should not have sent. The mandatory review step exists entirely to prevent this. If you remove the human from the loop on outgoing messages, you are one classification error away from a bad client interaction.
The ops layer is the ceiling. Build the system first, then the business scales on top of it.
If you are building outbound without an agent layer, see how we run it for clients.
