From One Person to an AI-Powered Agency, What I Learned

Six months into running Build to Scale, the operations layer was breaking down. A spreadsheet per client for lead lists, separate docs for reply tracking and domain health, another for campaign status. Every morning I opened my laptop to a pile of open tabs and spent the first hour catching up before I could do any actual work.

I was running outbound for four clients simultaneously. No dedicated ops team. No SDR. Just the infrastructure I had not built yet.

That is when I decided to build the systems first and scale the client base second.

Why I Went Agentic

The honest answer is not that I thought AI agents were the future. It is that I was the only person doing five jobs and running out of hours.

Here is what my week looked like before:

Pull replies from LinkedIn manually, copy them into a doc
Check every campaign in Instantly and Smartlead for bounces and health issues
Write follow-up messages for every positive reply across all four clients
Update each client's Airtable with status changes
Do it all again on Monday

The operations layer was the ceiling, not the client work itself.

The constraint was not my time doing outreach. It was my time managing the systems that do outreach. The same problem agencies solve by hiring, I needed to solve by automating the operations layer.

💡 The moment I reframed the problem from "I need help" to "I need infrastructure," everything changed. Hiring would have solved the symptom. Building the agent layer solved the cause.

What I Built (and What Broke First)

The first version of the system was manual in practice, automated in name only. I had n8n workflows that pulled data from APIs and dumped it into Slack. It looked like automation. It required me to check Slack every hour and manually trigger the next step.

That broke within three weeks. The workflows ran on a push model. Every platform had a different webhook format. When HeyReach changed their API response structure, three workflows broke silently and I did not notice for two days.
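A silent break like this usually means a payload changed shape while the workflow kept running. Below is a minimal sketch of a guard that fails loudly instead. The field names (`id`, `sender`, `text`) are hypothetical, since the real platform response formats are not shown in this post.

```python
# Hypothetical required fields; the real platform payloads are not shown in this post.
REQUIRED_FIELDS = {"id": str, "sender": str, "text": str}


class PayloadShapeError(ValueError):
    """Raised when an API response no longer matches the expected shape."""


def validate_reply(payload: dict) -> dict:
    """Return the payload unchanged, or raise instead of passing bad data downstream."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(f"wrong type for {field}: {type(payload[field]).__name__}")
    if problems:
        raise PayloadShapeError("; ".join(problems))
    return payload
```

Running every response through a check like this turns a two-day silent failure into an error on the first bad payload.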

The second version was better. I moved to a pull-based cron schedule. Instead of platforms pushing data to me, my system asks for it on a schedule. 7am: check campaign health. 8am: pull all unread replies. Every two hours: scan for anything missed.
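The pull schedule can be sketched as a small function that a once-a-minute runner calls to decide what is due. The job names are hypothetical, and running the two-hour sweep on even hours is an assumption. In practice plain cron entries would do the same job.

```python
from datetime import datetime, time

# Job names are hypothetical; the times mirror the schedule described above.
DAILY_JOBS = {
    "check_campaign_health": time(7, 0),
    "pull_unread_replies": time(8, 0),
}
SWEEP_EVERY_HOURS = 2  # "every two hours: scan for anything missed"


def jobs_due(now: datetime) -> list[str]:
    """Return the jobs to run at `now`, assuming this is called once per minute."""
    due = [name for name, at in DAILY_JOBS.items()
           if (now.hour, now.minute) == (at.hour, at.minute)]
    # Assumption: the sweep runs on the hour, on even hours.
    if now.minute == 0 and now.hour % SWEEP_EVERY_HOURS == 0:
        due.append("sweep_for_missed_items")
    return due
```

Because the system asks for data rather than waiting for it, a platform changing its webhook format no longer decides when the work happens.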

That version held. It is still what runs today.

The next thing that broke was my assumption about what "automated" meant. I built a reply handler that classified incoming messages and drafted responses. I assumed I would approve them all in five minutes. The first week, I approved 34 drafts in about eight minutes. The second week, one draft was wrong. Not catastrophically wrong. Just a tone mismatch for a specific client. No one noticed. But I noticed.

I added a mandatory review step that I cannot bypass. Every draft sits in a queue. I approve before anything sends. That is not a limitation of the system. That is the design.
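One way to make a review step impossible to bypass is to put the check inside the send path itself, with no override flag. This is a sketch under assumptions: `Draft`, `approve`, and `send` are illustrative names, not the actual code behind the queue.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    reply_id: str
    body: str
    approved: bool = False  # only a human reviewer flips this


class NotApprovedError(RuntimeError):
    """Raised when something tries to send a draft nobody has reviewed."""


def approve(draft: Draft, reviewer: str) -> Draft:
    """Record a human approval. `reviewer` must be a non-empty name."""
    if not reviewer:
        raise ValueError("an approval needs a named reviewer")
    draft.approved = True
    return draft


def send(draft: Draft, transport) -> None:
    """Send only approved drafts; there is deliberately no override flag."""
    if not draft.approved:
        raise NotApprovedError(f"draft {draft.reply_id} has not been reviewed")
    transport(draft.body)
```

Here `transport` stands in for whatever actually delivers the message. The point of the design is that the only route to it runs through `approve`.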

⚠️ Full automation without a human review step is not a feature. It is a liability. The goal is not to remove yourself from the loop. It is to only be in the loop when your judgment actually matters.

What the Architecture Looks Like Now

The system I built is called OpenClaw. It runs on a server on Hostinger. No fancy cloud setup. Just a machine that is always on, running scheduled jobs.

Signal detection

An agent scans for buying signals daily: job postings, LinkedIn activity, funding announcements, web mentions. It scores accounts and flags ones that cross the threshold for outreach.
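Scoring against a threshold can be as simple as a weighted sum. The weights and the threshold below are illustrative assumptions, not the values the system uses.

```python
# Weights and threshold are illustrative assumptions, not production values.
SIGNAL_WEIGHTS = {
    "job_posting": 3,
    "funding_announcement": 4,
    "linkedin_activity": 1,
    "web_mention": 1,
}
OUTREACH_THRESHOLD = 5


def score_account(signals: list[str]) -> int:
    """Sum the weights of the signals seen for one account; unknown signals score 0."""
    return sum(SIGNAL_WEIGHTS.get(s, 0) for s in signals)


def flag_for_outreach(accounts: dict[str, list[str]]) -> list[str]:
    """Return account names whose score meets or exceeds the threshold."""
    return [name for name, signals in accounts.items()
            if score_account(signals) >= OUTREACH_THRESHOLD]
```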

List building and audit

Qualified accounts get pulled into a list. Before anything touches a sequencing tool, every contact goes through a 7-point audit. Catch-all domains, title mismatches, stale profiles, all filtered out before they can hurt deliverability.
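An audit gate like this can be expressed as a table of named checks. The sketch below covers only the three checks named above. The other four are not listed in this post, and the title keywords and the one-year staleness cutoff are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    email: str
    title: str
    domain_is_catch_all: bool
    days_since_profile_update: int


# Hypothetical keywords and cutoff; only three of the seven checks are named in this post.
TARGET_TITLE_KEYWORDS = ("head", "director", "vp")
MAX_PROFILE_AGE_DAYS = 365

# Each check returns True when the contact passes.
CHECKS = {
    "catch_all_domain": lambda c: not c.domain_is_catch_all,
    "title_mismatch": lambda c: any(k in c.title.lower() for k in TARGET_TITLE_KEYWORDS),
    "stale_profile": lambda c: c.days_since_profile_update <= MAX_PROFILE_AGE_DAYS,
}


def audit(contact: Contact) -> list[str]:
    """Return the names of failed checks; an empty list means the contact passes."""
    return [name for name, passes in CHECKS.items() if not passes(contact)]
```

Returning the names of the failed checks, rather than a bare pass or fail, makes it easy to see which filter is removing the most contacts.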

Campaign management

Approved contacts load into HeyReach (LinkedIn) or Instantly/Smartlead (email). The system monitors health: reply rates, bounce rates, step completion. Anything below threshold triggers a Slack alert.
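A health check of this kind reduces to comparing rates against thresholds and collecting alert messages. The threshold values here are placeholders, since the real ones are not stated. Step completion and the Slack posting are left out to keep the sketch self-contained.

```python
# Threshold values are placeholders; the post does not state the real ones.
MIN_REPLY_RATE = 0.02
MAX_BOUNCE_RATE = 0.03


def health_alerts(name: str, sent: int, replies: int, bounces: int) -> list[str]:
    """Return alert messages for one campaign; an empty list means healthy."""
    if sent == 0:
        return [f"{name}: nothing sent yet, cannot compute rates"]
    alerts = []
    reply_rate = replies / sent
    bounce_rate = bounces / sent
    if reply_rate < MIN_REPLY_RATE:
        alerts.append(f"{name}: reply rate {reply_rate:.1%} below {MIN_REPLY_RATE:.1%}")
    if bounce_rate > MAX_BOUNCE_RATE:
        alerts.append(f"{name}: bounce rate {bounce_rate:.1%} above {MAX_BOUNCE_RATE:.1%}")
    return alerts
```

Each returned string is what would be posted to the alert channel.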

Reply handling

Every morning at 8am, the reply handler pulls unread messages from all platforms, classifies them into six categories, drafts a response for each, and drops them into a review queue in Airtable. I review and approve. Nothing sends without my eyes on it.
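The flow above (pull, classify, draft, queue) can be sketched with the classifier and drafter passed in as functions. The six categories are not named in this post, so the sketch treats the label as an opaque string. The dictionary keys and the `pending_review` status are assumptions.

```python
from typing import Callable, Iterable


def build_review_queue(
    replies: Iterable[dict],
    classify: Callable[[str], str],
    draft: Callable[[str, str], str],
) -> list[dict]:
    """Classify each reply, draft a response, and queue it as pending review."""
    queue = []
    for reply in replies:
        category = classify(reply["text"])
        queue.append({
            "reply_id": reply["id"],
            "category": category,
            "draft": draft(reply["text"], category),
            "status": "pending_review",  # nothing in this function sends a message
        })
    return queue
```

Note that the function only produces queue rows. Sending is a separate step that happens after review.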

Client reporting

Campaign summaries post to client-specific Slack channels automatically. No status update emails. No calls to say "things are running." The system shows the work.

What Done Actually Means

I shipped several iterations before landing on something I trusted. Each version worked right up until it did not.

What I thought done meant | What done actually means
Runs without manual input | Fails gracefully and tells me when it fails
Handles every case automatically | Handles 90% automatically, routes the rest to me
I don't have to check it | I check it because I want to, not because I have to
Works when things go right | Works especially when things go wrong

The difference between a demo and infrastructure is what happens on a bad day. When an API changes. When a client's campaign gets paused mid-sequence. When a prospect replies in a language the classifier has not seen before.

The system I have now routes all of those edge cases to me with enough context to handle them in under two minutes. That is done.
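"Fails gracefully and tells me" can be sketched as a wrapper that catches any failure and hands the item to a human with context attached. `handler` and `escalate` are hypothetical callables. In a setup like the one described, `escalate` might post to Slack or add a row to the review queue.

```python
import traceback
from typing import Callable, Optional


def run_or_escalate(
    item: dict,
    handler: Callable[[dict], str],
    escalate: Callable[[dict], None],
) -> Optional[str]:
    """Run `handler`; on any failure, hand the item to a human with context."""
    try:
        return handler(item)
    except Exception as exc:
        escalate({
            "item": item,  # the original input, so nothing has to be looked up again
            "error": f"{type(exc).__name__}: {exc}",
            "trace": traceback.format_exc(),
        })
        return None
```

Attaching the original item and the error to the escalation is what keeps an edge case a short review rather than an investigation.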

What I Would Do Differently

I would have built the reply handler first. Everything else in the system is about getting to a reply. Once you have a reply, you have a conversation. Conversations become clients. I spent the first three months optimizing the top of the funnel while the bottom was still fully manual.

I also would have set up the audit gate earlier. Running a bad list through a campaign is not just a waste of time. It is a domain reputation problem that takes months to recover from. The audit protocol exists because of a real incident, not because it seemed like a good idea in theory.

→ Build your system in reverse. Start with what happens after a reply and work backward. The sequencing and enrichment layers matter less if you cannot handle the conversations they generate.

Frequently Asked Questions

How long did it take to build this system?

About five months from the first broken prototype to something I trusted enough to run client campaigns on. The core reply handler took three weeks. The signal detection layer took six weeks. The hardest part was not the code. It was learning what the system needed to handle before I understood what could break.

Do you need to know how to code to build something like this?

Not necessarily. The concepts matter more than the syntax. Understanding APIs, webhooks, cron schedules, and data flow will take you further than any specific programming language. I built most of this iteratively, breaking things and fixing them.

How many clients can this setup support?

The system currently handles multiple active campaigns concurrently. The practical ceiling is around eight to ten before client relationship management becomes the bottleneck, not operations. The agent handles the ops layer. The human still needs to handle strategy and the client relationship.

What is the biggest risk of agentic outbound?

Sending something you should not have sent. The mandatory review step exists entirely to prevent this. If you remove the human from the loop on outgoing messages, you are one classification error away from a bad client interaction.

The ops layer is the ceiling. Build the system first, then the business scales on top of it.

If you are building outbound without an agent layer, see how we run it for clients.

Sheyda Rezaei