How I Built the AI Agent Stack That Runs Outbound for Our Clients

I run four active client outbound campaigns simultaneously. No team of SDRs. No mornings spent manually checking inboxes or updating spreadsheets.

Every morning at 8am, an AI agent pulls all unread replies from LinkedIn and email, classifies each one, drafts a response, and drops it into a review queue. At 7am, a different agent checks campaign health across every active sequence and posts a summary to Slack. By the time I open my laptop, I know exactly what needs my attention. Nothing else has touched my day.

That's what we built. Here's exactly how it works.

The Problem I Was Solving

Most outbound agencies scale by adding headcount. One SDR per three to five clients. A campaign manager. A data person. I wanted to find out if you could do the opposite: build the infrastructure first, then scale clients without scaling the team.

The answer is yes. But the gap between possible and working reliably took several months of broken builds to close.

💡

The constraint was not my time doing outreach. It was my time managing the systems that do outreach. The same problem agencies solve by hiring, I solved by automating the operations layer.

Here's the full pipeline, from first signal to booked call:

Signal Detection: LinkedIn, job boards, web
ICP Filter + Audit: score, qualify, clean
Enrichment: Clay, Prospeo, Apollo
Sequence Launch: HeyReach + Instantly
Reply Handler: classify, draft, queue
Review + Send: Sheyda approves, agent sends

Each layer is a separate system. I'll walk through all of them.

Layer 1: Signal Detection

We don't build lists from static ICP criteria. We build them from behavioral signals: evidence that a company is actively in motion right now. We cover the signal taxonomy in depth in The Buying Signals We Actually Track.

The five signals we track for every client:

LinkedIn post activity. When a VP of Sales or Head of Revenue publishes a post about a problem our client solves (pipeline challenges, team scaling, tool consolidation), that's a warm signal. We monitor via LinkedIn's content feed filtered by job title and keyword.

Job postings. A company posting for an SDR, BDR, or Head of Outbound is telling you exactly where they're investing. We pull from Indeed and LinkedIn Jobs daily and filter for titles that match the ICP.

Web mentions. Companies that just raised funding, launched a new product, or made the news are in motion. We use Serper.dev for Google News signals filtered by company size and vertical.

Tech stack changes. A company switching from HubSpot to Salesforce, or adding a sales engagement tool, signals an active GTM investment cycle.

Reddit discussions. Founders and operators posting in communities about the exact problems our clients solve. Lower volume, higher intent.

Not every signal is equal. We score each one by recency and intent strength, then filter to the top accounts. A job posting from last week scores higher than a LinkedIn post from last month. An account triggering three signals simultaneously jumps to the top of the queue.


The filtering threshold matters more than the signal sources. We'd rather work with 40 sharp accounts than 400 weak ones. Every time we pushed volume, reply rates dropped. Every time we tightened the criteria, they went up.
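A minimal sketch of how recency-and-intent scoring like this could work. The weights, the 14-day half-life, and every name here are illustrative assumptions, not the values we actually run:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical intent weights per signal type (assumed, not the production values).
INTENT_WEIGHT = {
    "job_posting": 1.0,
    "tech_stack_change": 0.8,
    "reddit_discussion": 0.7,
    "web_mention": 0.6,
    "linkedin_post": 0.6,
}
HALF_LIFE_DAYS = 14  # assumed: a signal loses half its value every two weeks


@dataclass(frozen=True)
class Signal:
    account: str
    kind: str
    seen_on: date


def signal_score(signal: Signal, today: date) -> float:
    """Intent weight decayed exponentially by the signal's age in days."""
    age_days = max((today - signal.seen_on).days, 0)
    return INTENT_WEIGHT[signal.kind] * 0.5 ** (age_days / HALF_LIFE_DAYS)


def top_accounts(signals: list[Signal], today: date, limit: int) -> list[tuple[str, float]]:
    """Sum scores per account so stacked signals push an account up the queue."""
    totals: dict[str, float] = {}
    for s in signals:
        totals[s.account] = totals.get(s.account, 0.0) + signal_score(s, today)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

Summing per account is what lets an account with three simultaneous signals outrank one with a single strong signal, and the `limit` argument is where the "40 sharp accounts" threshold would live.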

Layer 2: List Building and the Audit Gate

Once we have signal-qualified accounts, we build the contact list. This means finding the right people, not just anyone with a matching job title.

We use a waterfall approach through Clay: Prospeo first for email verification, then Apollo as the fallback, then manual research for accounts that matter enough to warrant extra attention. LinkedIn profile URLs go to HeyReach for the LinkedIn touchpoints.
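The waterfall logic itself is simple enough to sketch. In practice Clay orchestrates the providers; the version below only illustrates the ordering, and the `Provider` callables are hypothetical stand-ins rather than real Prospeo or Apollo client calls:

```python
from typing import Callable, Optional

# A provider takes (full_name, company_domain) and returns a verified email or None.
# These callables are placeholders; real provider clients are not shown here.
Provider = Callable[[str, str], Optional[str]]


def waterfall_email(
    name: str,
    domain: str,
    providers: list[tuple[str, Provider]],
) -> tuple[Optional[str], str]:
    """Try providers in order; return (email, source) from the first hit."""
    for source, lookup in providers:
        try:
            email = lookup(name, domain)
        except Exception:
            # A provider outage should fall through to the next step, not abort the run.
            email = None
        if email:
            return email, source
    # Nothing found: flag the contact for manual research instead of guessing.
    return None, "manual_research"
```

Passing `[("prospeo", prospeo_lookup), ("apollo", apollo_lookup)]` reproduces the order described above, with manual research as the final fallback.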

Before any list goes into an active campaign, it runs through a mandatory seven-point audit:

Title match: does the person's actual title match the ICP persona, not just a keyword?
Seniority check: are we reaching decision-makers or influencers, depending on the client's sales motion?
Domain health: no catch-all domains, no role-based addresses, no generic info@ or hello@
Company size verification: headcount matches the ICP range, not just the company's claimed size
Duplicate detection: no one who's been contacted in the last 90 days
Blacklist check: no existing customers, partners, or investors
Recent signal confirmation: is the trigger signal still current? A job posting from six months ago is noise.

Any list that fails the audit gets fixed or cut before import. We learned this the hard way when a single batch with high catch-all rates caused a deliverability drop that took three weeks to recover from.
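Here is a rough sketch of the seven checks as a single gate. The field names, the role-address list, and the 30-day signal window are assumptions for illustration; the 90-day recontact window is the one from the list above. Title and seniority matching are reduced to set lookups here, which is cruder than a real persona match:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

ROLE_LOCALPARTS = {"info", "hello", "sales", "support", "admin", "contact"}  # assumed list


@dataclass
class Contact:
    email: str
    title: str
    seniority: str
    headcount: int
    domain_is_catch_all: bool
    last_contacted: Optional[date]
    signal_date: date


@dataclass
class AuditRules:
    persona_titles: set[str]         # lowercase titles that count as a match
    seniorities: set[str]
    headcount_range: tuple[int, int]
    blacklist_domains: set[str]      # customers, partners, investors
    max_signal_age_days: int = 30    # assumed freshness window
    recontact_window_days: int = 90


def audit(contact: Contact, rules: AuditRules, today: date) -> list[str]:
    """Return the names of failed checks; an empty list means the contact passes."""
    local, _, domain = contact.email.lower().partition("@")
    low, high = rules.headcount_range
    checks = {
        "title_match": contact.title.lower() in rules.persona_titles,
        "seniority": contact.seniority in rules.seniorities,
        "domain_health": not contact.domain_is_catch_all and local not in ROLE_LOCALPARTS,
        "company_size": low <= contact.headcount <= high,
        "duplicate": contact.last_contacted is None
        or today - contact.last_contacted > timedelta(days=rules.recontact_window_days),
        "blacklist": domain not in rules.blacklist_domains,
        "signal_current": today - contact.signal_date <= timedelta(days=rules.max_signal_age_days),
    }
    return [name for name, passed in checks.items() if not passed]
```

Returning the list of failed checks, rather than a single pass/fail flag, makes it easy to see whether a batch needs fixing (one recurring failure) or cutting (failures everywhere).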

Layer 3: The Sequence Layer

For LinkedIn outreach, we use HeyReach. It handles multi-account rotation, connection requests, message sequences, and InMail, all from one dashboard. We run dedicated LinkedIn accounts per client to keep sender reputation isolated.

For email, we use Instantly and Smartlead depending on the client's infra setup. Every client gets dedicated sending domains, never the primary domain. We warm new domains for three to four weeks before any cold sends go out.

The sequence structure we use for most clients: connection request or first email, then a value-add follow-up, then a social proof or case study touch, then a close or break-up message. Three touches after the opener. That's it.

⚠️

We never run more than three follow-ups without the lead showing any engagement signal. Chasing past that point hurts deliverability and burns the account. If someone has not responded to four touches, the timing is wrong, not the message.

Layer 4: The Reply Handler

This is the hardest part of the system to build and the most valuable once it works. See the full reply handler breakdown for implementation details.

Every reply needs to be classified before you know what to do with it. "Interested" and "not right now" look similar at first glance. "Wrong person, ask for Sarah" is a gift if you catch it. "Unsubscribe" needs to be handled immediately to protect deliverability. A generic positive reply that's actually a form auto-response is a false signal that wastes follow-up time.

We handle four platforms: HeyReach (LinkedIn), Instantly, Smartlead, and Lemlist. Each has its own API and its own reply format.

Here's how the handler works:

Pull unread replies from all platforms

A cron job runs at 8am and 2pm PDT on weekdays. It calls each platform's API, pulls all unread messages since the last run, and deduplicates against Airtable to avoid processing the same reply twice.

Classify each reply

Our AI agent reads the full conversation thread and classifies the reply: Interested, Not Now, Wrong Person, Referral, Objection, Unsubscribe, or Auto-Reply. Each classification has a different next-step protocol.

Enrich the sender

For LinkedIn replies, we run the sender's profile through Apify to get current job title, company, and seniority. This makes the draft response more relevant. We know exactly who we're talking to.

Draft the response

The agent writes a draft reply tailored to the classification, the conversation history, and the enriched profile. It follows the client's approved messaging framework and voice.

Queue for review

The draft goes to Airtable. I get a Slack notification. I review the draft, edit if needed, and approve. The agent sends it. Nothing goes out without my eyes on it. For the full day-to-day picture of how this fits into the workflow, see What Agentic Outbound Actually Looks Like.

The approval gate is intentional. Fully autonomous reply-sending sounds appealing until you have one bad classification send an aggressive follow-up to someone who already said yes. The human review step costs maybe 15 minutes a day. The downside protection is worth more than that.

At the end of each run, the agent posts a consolidated summary to Slack: new replies processed, classifications, drafts queued, follow-ups sent. One message, not a ping per reply.
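The control flow of one run can be sketched in a few lines. This is a simplified illustration, not the production handler: the `classify` callable stands in for the AI agent, an in-memory set stands in for the Airtable dedupe lookup, and the next-step text for each label is an assumption. Only the seven labels come from the process above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable

# Labels are the seven classifications; the next-step wording is illustrative.
NEXT_STEP = {
    "Interested": "draft reply proposing a call",
    "Not Now": "draft polite reply, schedule a later check-in",
    "Wrong Person": "draft reply asking for the right contact",
    "Referral": "draft intro message to the referred person",
    "Objection": "draft reply addressing the objection",
    "Unsubscribe": "suppress contact, no reply drafted",
    "Auto-Reply": "ignore, leave sequence state unchanged",
}


@dataclass(frozen=True)
class Reply:
    platform: str
    message_id: str
    thread: str


def process_run(
    replies: Iterable[Reply],
    seen_keys: set[tuple[str, str]],
    classify: Callable[[str], str],
) -> tuple[list[dict], str]:
    """Dedupe, classify, and queue replies; return queue rows and one summary line."""
    queued: list[dict] = []
    counts: Counter = Counter()
    for reply in replies:
        key = (reply.platform, reply.message_id)
        if key in seen_keys:
            continue  # already processed in an earlier run
        seen_keys.add(key)
        label = classify(reply.thread)
        # Unknown labels go to a human rather than being guessed at.
        next_step = NEXT_STEP.get(label, "manual review")
        counts[label] += 1
        queued.append({"key": key, "label": label, "next_step": next_step,
                       "status": "pending_review"})
    breakdown = ", ".join(f"{label}: {n}" for label, n in sorted(counts.items()))
    summary = f"{len(queued)} new replies processed ({breakdown})" if queued else "0 new replies"
    return queued, summary
```

Every row is created with status `pending_review`, which mirrors the approval gate: the sketch never sends anything, it only queues drafts and produces the single summary line for Slack.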

The Infrastructure Layer

All of this runs on a single Hostinger server, inside a Docker container running OpenClaw (our AI agent layer). OpenClaw connects to Slack via Socket Mode and executes tasks when triggered by cron or by direct message.

The cron schedule:

Job and time: what it does
reply-check-morning · 8am PDT Mon–Fri: pulls and processes all unread replies
reply-check-afternoon · 2pm PDT Mon–Fri: second pull for same-day replies
campaign-status-morning · 7am PDT Mon–Fri: campaign health check + Slack summary
campaign-status-evening · 5pm PDT Mon–Fri: end-of-day pipeline snapshot
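For readers who want to see the schedule as data, here is a small sketch of the same four jobs with a helper that reports which job is due. The job names and hours are from the schedule above; mapping "PDT" to the `America/Los_Angeles` zone and the `jobs_due` helper itself are assumptions, not how OpenClaw is actually configured.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")  # assumed zone for the schedule's "PDT"

# Job name -> local hour. All four run Monday to Friday.
# The cron equivalent for the 8am job would be: 0 8 * * 1-5
SCHEDULE = {
    "campaign-status-morning": 7,
    "reply-check-morning": 8,
    "reply-check-afternoon": 14,
    "campaign-status-evening": 17,
}


def jobs_due(now: Optional[datetime] = None) -> list[str]:
    """Return the jobs scheduled for this exact Pacific hour on a weekday."""
    now = now or datetime.now(timezone.utc)
    local = now.astimezone(PACIFIC)
    if local.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return []
    return [job for job, hour in SCHEDULE.items()
            if hour == local.hour and local.minute == 0]
```

Converting to a named zone before comparing hours matters because the server clock is usually UTC, and a fixed offset would drift by an hour when daylight saving time changes.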

We also run a watchdog script that monitors the OpenClaw container every 15 minutes. If the last Slack connection log shows a disconnect or timeout, it restarts the container automatically. This was added after too many mornings where the cron ran but nothing executed because the agent had silently gone offline at 3am.

Airtable is the source of truth for everything. The Reply Queue table holds every reply, its classification, its draft, its status, and its timestamp. The agent reads from and writes to Airtable at every step.

What Broke and What I Learned

The webhook-based reply system failed constantly. The original architecture used webhooks from HeyReach and Instantly to trigger n8n, which normalized the data and pushed it to Airtable. Every webhook had a failure mode. Every n8n node added a new failure point. A missed webhook meant a missed reply. We had five points of failure in the pipeline.

I replaced the whole thing with a pull-based cron. The agent asks the platforms what's new instead of waiting to be told. One failure mode instead of five. Reliability went from roughly 85% to effectively 100%.

The watchdog was restarting the container on a false positive. It checked how old the last "socket mode connected" log was. OpenClaw only writes that log every 60 minutes as a heartbeat, while the watchdog runs every 15 minutes, so it kept seeing a "stale" entry and restarting a healthy container. I fixed it to check the content of the log instead of the age: if the last Slack log says "connected," it's healthy.
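A sketch of the content-based check, under the assumption that Slack-related log lines mention "slack" or "socket mode" and use words like "connected", "disconnected", or "timeout". OpenClaw's real log format may differ, so treat the patterns as placeholders:

```python
import re

# Hypothetical log phrases; adjust to whatever the agent actually writes.
UNHEALTHY = re.compile(r"\b(disconnect(ed)?|timeout|timed out)\b", re.IGNORECASE)
# The word boundary keeps "connected" from matching inside "disconnected".
HEALTHY = re.compile(r"\bconnected\b", re.IGNORECASE)


def slack_link_healthy(log_lines: list[str]) -> bool:
    """Judge health from the content of the most recent Slack log line, not its age."""
    for line in reversed(log_lines):
        lowered = line.lower()
        if "slack" not in lowered and "socket mode" not in lowered:
            continue  # skip unrelated log lines
        if UNHEALTHY.search(line):
            return False
        return bool(HEALTHY.search(line))
    return False  # no Slack entry at all: treat as unhealthy
```

The watchdog would call this on the tail of the container log every 15 minutes and only run `docker restart` when it returns `False`. Checking for "disconnected" before "connected" avoids the substring trap that a plain `in` check would fall into.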

Volume-based list building broke a client's domain. Before we formalized the audit process, we imported a list with a high proportion of catch-all domains. Deliverability dropped. It took weeks to recover. The seven-point audit protocol exists because of this.

💡

Every formal process in this system exists because something broke without it. The audit gate, the approval loop, the pull-based cron, the watchdog. All scars from real failures. Build the process after the failure, not before. You will not know what to protect until something breaks.

What the Numbers Look Like

These are averages across active client campaigns over the past 90 days:

18%: avg reply rate across LinkedIn campaigns
15 min: daily time spent on reply review
4: active client campaigns, one person

The reply rate is higher than the industry average for cold outreach because we're not sending to everyone. We're sending to accounts that are actively signaling.

What This Isn't

This system doesn't replace sales judgment. It replaces the operational work that surrounds sales: pulling data, managing sequences, handling replies, updating CRMs, checking campaign health.

Every important decision still has a human in the loop. Every draft reply goes through review. Every list goes through an audit. Every campaign strategy is set by someone who understands the client's market.

What the agent handles is the volume of coordination work that would otherwise require two or three people. That's what lets one person run the infrastructure for four clients.

Frequently Asked Questions

What is an AI outbound agent? An AI outbound agent is a system that automates the operational tasks in a cold outreach workflow: pulling signals, building lists, monitoring campaign health, and processing replies, without replacing the human decisions that require context and judgment.

What tools do you need to build something like this? At minimum: a campaign platform (HeyReach for LinkedIn, Instantly or Smartlead for email), an enrichment layer (Clay, Prospeo, or Apollo), an AI agent framework (we use OpenClaw), an ops layer (we use Airtable), and a server to run the cron jobs (we use Hostinger with Docker).

How long did it take to build this system? The core pipeline took about three months to get to reliable production. Each component works quickly in isolation. The time comes from integration, failure handling, and learning what actually breaks in production.

Do you need to know how to code to build this? Not deeply. Most of what we built uses APIs and configuration rather than custom code. The biggest technical requirement is being comfortable with API calls, JSON, and reading documentation.

Is this fully autonomous? No, and intentionally so. The reply drafting and review step has a human in the loop. Full autonomy sounds efficient until one misclassified reply sends the wrong message to the wrong person. The 15 minutes a day of review is worth the protection.

The infrastructure is the constraint, not the strategy. Build it once and the client capacity ceiling moves.

If you're running outbound for clients and spending more than two hours a day on operations work, the infrastructure is the bottleneck. Not the strategy.

Book a call to see how we'd build this for your agency.

Sheyda Rezaei

Related articles

Billing Mistakes I Made Building an Outbound AI Agent Stack So You Don't Have To

How to Run LinkedIn Outreach Like the Top 1%

What Agentic Outbound Actually Looks Like Day to Day