The Reply Handler: How We Classify Replies Across 4 Platforms

Every morning at 8am, OpenClaw checks every inbox across every active campaign, across all four platforms, for every client. It pulls every unread reply, classifies it, enriches the lead profile, drafts a response, and drops the draft into an Airtable review queue.

Nothing sends until I approve it.

That is the whole system in three sentences. Here is how it actually works.

Why Building This Was Harder Than It Looked

The first instinct when building a reply handler is to set up webhooks. Platforms push data to you when a reply comes in. You process it in real time. Seems elegant.

It broke immediately. Different platforms have different webhook formats. HeyReach sends conversation-level events. Instantly sends email-level events. When HeyReach changed their webhook payload structure, three workflows broke silently. I did not notice for two days. Two days of replies that looked processed but were not logged anywhere.

The fix was switching to a pull-based model. Instead of platforms pushing data to us, OpenClaw asks for it on a schedule. Every morning at 8am: pull all unread replies from every platform. Every afternoon at 2pm: same thing. The system never waits for a push. It always asks.

Pull-based means the failure mode is visible. If the system misses a run, the replies sit unread and get picked up next time. Nothing is silently lost.

💡 The difference between push-based and pull-based is the difference between a system where failures are invisible and one where failures surface automatically. Push sounds faster. Pull is more reliable. Choose reliability.
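To make that failure mode concrete, here is a minimal Python sketch of one scheduled run. It is an illustration, not the production handler: `FakeInbox` and `run_pull` are hypothetical names, and the real platform API calls are replaced by an in-memory list.

```python
from typing import Dict, List


class FakeInbox:
    """In-memory stand-in for a platform inbox (hypothetical, for illustration)."""

    def __init__(self, replies: List[str]) -> None:
        self.unread = list(replies)
        self.down = False  # flip to True to simulate an outage

    def fetch_unread(self) -> List[str]:
        if self.down:
            raise ConnectionError("platform unreachable")
        return list(self.unread)


def run_pull(inboxes: Dict[str, FakeInbox]) -> Dict[str, object]:
    """One scheduled run: ask every platform for its unread replies."""
    pulled: Dict[str, List[str]] = {}
    failed: List[str] = []
    for platform, inbox in inboxes.items():
        try:
            pulled[platform] = inbox.fetch_unread()
        except ConnectionError:
            # Nothing was marked read, so these replies are still waiting
            # on the platform for the next run.
            failed.append(platform)
    return {"pulled": pulled, "failed": failed}
```

If a platform is down at 8am, it shows up in `failed` and its replies are simply still unread at 2pm. Nothing depends on an event having arrived at the right moment.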

The Four Platforms

Each platform has a different API structure. The reply handler speaks all four.

HeyReach handles LinkedIn outreach. The inbox list command returns conversations with their full message history. The handler pulls unread conversations using the account-specific API key, processes each one, and marks them as seen only after the Airtable write succeeds, never before.

We run multiple HeyReach accounts: one for Build to Scale campaigns, one for client campaigns. Each account has its own API key and its own known LinkedIn account ID. The system validates the account ID before marking anything as seen, a safeguard against an old data issue where some records had incorrect IDs.

Instantly handles email campaigns. The handler pulls all unread email replies, fetches the full thread for new conversations, and marks emails as read after the Airtable write succeeds.

Smartlead handles email for some clients. Same logic: pull unread, fetch the thread for new conversations, mark read after logging.

Lemlist handles one client's LinkedIn campaigns. Lemlist has no mark-as-read API, so the handler uses the Activities endpoint to check for new replies and processes anything not yet in the queue.
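Two of the platform-specific rules above are easy to show in code. This is a rough Python sketch under stated assumptions: the dictionary keys (`kind`, `id`, `account_id`) and both function names are hypothetical and do not reflect the actual Lemlist or HeyReach payloads.

```python
from typing import Iterable, List, Set


def new_replies_without_read_flag(
    activities: Iterable[dict], queued_ids: Set[str]
) -> List[dict]:
    """For a platform with no mark-as-read call: keep only reply
    activities whose ID is not already in the review queue."""
    return [
        a for a in activities
        if a.get("kind") == "reply" and a.get("id") not in queued_ids
    ]


def safe_to_mark_seen(conversation: dict, known_account_id: str, write_ok: bool) -> bool:
    """HeyReach-style guard: mark seen only after the Airtable write
    succeeded and the record carries the expected LinkedIn account ID."""
    return write_ok and conversation.get("account_id") == known_account_id
```

The first function replaces the read flag with a lookup against the queue. The second makes "never before" a condition the code has to pass, not a convention someone has to remember.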

The Deduplication Rule

Before processing any reply, the handler checks whether a conversation record already exists in Airtable. This is the dedup check.

If the record exists: update the existing row with the new reply text, re-classify, and draft a new response. The record is a thread: it tracks the full conversation over time.

If the record does not exist: create a full new row with every field populated.

Critical ordering: the dedup check and the Airtable write both happen before anything is marked as read. If the Airtable write fails and the reply gets marked as read anyway, it is permanently lost. A reply that gets processed twice because it was still marked unread is harmless: the dedup check finds the existing record on the second pass and updates it instead of creating a duplicate. The asymmetry matters. A lost reply is unrecoverable. Duplicate processing is trivially handled by dedup.
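The ordering is short enough to sketch in Python. Assumptions are loud here: the queue is a plain dictionary standing in for Airtable, and `upsert`, `handle`, and the reply keys (`conversation_id`, `message_id`, `text`) are hypothetical names chosen for the example.

```python
from typing import Callable, Dict

Queue = Dict[str, Dict[str, str]]  # conversation_id -> {message_id: reply text}


def upsert(queue: Queue, reply: dict) -> str:
    """Dedup check: update the conversation's row if it exists, else create it."""
    record = queue.get(reply["conversation_id"])
    if record is None:
        queue[reply["conversation_id"]] = {reply["message_id"]: reply["text"]}
        return "created"
    # Keyed by message ID, so processing the same reply twice changes nothing.
    record[reply["message_id"]] = reply["text"]
    return "updated"


def handle(reply: dict, write: Callable[[dict], str], mark_read: Callable[[str], None]) -> bool:
    """Write first, mark read second. A failed write leaves the reply unread."""
    try:
        write(reply)
    except Exception:
        return False  # not marked read: the next run will pick it up again
    mark_read(reply["message_id"])
    return True
```

Swapping the two steps in `handle` is the whole bug: once `mark_read` runs before a write that then fails, no later run will ever see that reply.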

Classification: 9 Categories

After fetching the reply, the system classifies the lead's message into one of nine categories.

| Classification | What It Means |
| --- | --- |
| POSITIVE | Expressed interest, wants to know more, curious |
| SOFT YES | Said yes to a specific offer: 'go ahead', 'send it' |
| OBJECTION | Pushed back but left the door open |
| QUESTION | Specific question about service, pricing, or process |
| NEGATIVE | Not interested, remove me, stop contacting |
| REFERRAL | Pointed to someone else: 'talk to [Name]' |
| NOT ICP | Wrong person or wrong company |
| OOO | Out of office, auto-reply, left the firm |
| UNCLEAR | Cannot confidently classify, flagged for manual review |

Routing after classification:

POSITIVE, SOFT YES, OBJECTION, QUESTION → draft a response
REFERRAL, UNCLEAR → flag in Slack for manual handling, no draft
NEGATIVE, NOT ICP, OOO → log to Airtable, no draft, no Slack ping

Negative replies get logged but do not create noise. They do not need action. The system handles them silently and closes the thread.
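The routing rules map cleanly onto a lookup table. Here is a minimal Python sketch. The names `Label`, `Action`, `ROUTES`, and `route` are hypothetical, and treating an unrecognized label as UNCLEAR is my assumption for the example, in line with the rule that the system escalates when it is unsure.

```python
from enum import Enum


class Label(Enum):
    POSITIVE = "POSITIVE"
    SOFT_YES = "SOFT YES"
    OBJECTION = "OBJECTION"
    QUESTION = "QUESTION"
    NEGATIVE = "NEGATIVE"
    REFERRAL = "REFERRAL"
    NOT_ICP = "NOT ICP"
    OOO = "OOO"
    UNCLEAR = "UNCLEAR"


class Action(Enum):
    DRAFT = "draft a response"
    FLAG = "flag in Slack, no draft"
    LOG_ONLY = "log to Airtable only"


ROUTES = {
    Label.POSITIVE: Action.DRAFT,
    Label.SOFT_YES: Action.DRAFT,
    Label.OBJECTION: Action.DRAFT,
    Label.QUESTION: Action.DRAFT,
    Label.REFERRAL: Action.FLAG,
    Label.UNCLEAR: Action.FLAG,
    Label.NEGATIVE: Action.LOG_ONLY,
    Label.NOT_ICP: Action.LOG_ONLY,
    Label.OOO: Action.LOG_ONLY,
}


def route(label_text: str) -> Action:
    """Map a classifier label to its next step.

    Assumption: anything that is not one of the nine labels is
    treated as UNCLEAR and flagged for a human.
    """
    try:
        label = Label(label_text.strip().upper())
    except ValueError:
        label = Label.UNCLEAR
    return ROUTES[label]
```

Every reply is logged to Airtable regardless of class. The `Action` only says what happens on top of that.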

Lead Enrichment Before Drafting

For LinkedIn conversations (HeyReach and Lemlist), the handler enriches the lead profile before drafting. It pulls the LinkedIn URL from the conversation, runs it through an Apify actor, and extracts the lead's current headline, a summary from their About section, and their two most recent job experiences.

This enrichment costs roughly a cent per lead and changes the quality of every draft significantly. A draft written with "Director of Sales at CompanyX who previously ran outbound at a Series B SaaS startup" as context is a different output than one written with just a name and company.

For email platforms (Instantly and Smartlead), there is no LinkedIn URL in the conversation data. The handler uses platform data only and skips enrichment.

If enrichment fails: the handler continues without it, logs the failure to a system channel, and notes the gap in the draft context. It never blocks on enrichment.
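A best-effort enrichment step could look something like the Python sketch below. The `enrich` argument is a stand-in for the Apify actor call, which is deliberately not shown. The function name, context keys, and logger name are all hypothetical.

```python
import logging
from typing import Callable

logger = logging.getLogger("reply_handler")
LINKEDIN_PLATFORMS = {"heyreach", "lemlist"}


def build_context(lead: dict, platform: str, enrich: Callable[[str], dict]) -> dict:
    """Return drafting context; enrichment is best-effort and never blocks."""
    context = {"name": lead.get("name"), "company": lead.get("company"), "enriched": False}
    url = lead.get("linkedin_url")
    if platform not in LINKEDIN_PLATFORMS or not url:
        return context  # email platforms: platform data only
    try:
        profile = enrich(url)
    except Exception as exc:
        # Log the failure, note the gap for the drafting step, and carry on.
        logger.warning("enrichment failed for %s: %s", url, exc)
        context["gap"] = "LinkedIn enrichment unavailable"
        return context
    context.update(
        enriched=True,
        headline=profile.get("headline"),
        about=profile.get("about"),
        experiences=list(profile.get("experiences", []))[:2],  # two most recent
    )
    return context
```

The point of the `gap` field is that the drafting step knows it is working with thinner context, instead of silently producing a weaker draft.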

The Draft

The draft is written after enrichment. The rules differ by classification.

A POSITIVE or SOFT YES reply gets a short, momentum-preserving response. Two to three sentences. The goal is to move toward a call as fast as possible. The handler does not re-explain the service. The lead said they are interested, so the draft acts accordingly.

An OBJECTION gets acknowledged, then reframed from a different angle. The draft never argues with the objection and never repeats the same framing that generated it.

A QUESTION gets a direct answer followed by a question that keeps the conversation moving.

Every draft is written in the sender's voice. Each campaign has a named sender: Sheyda or a member of the client's team. The drafts are signed with the first name of whoever runs that campaign.

No exclamation points. No em dashes. No "Great question!" No re-pitching what was already in the sequence. The tone is peer-level, warm, and specific to what the lead actually said.
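Some of these style rules are mechanical enough to check before a draft reaches the queue. The Python sketch below is one way to do that. It is an assumption on my part that such a check exists as code, and `lint_draft` and `BANNED_PHRASES` are hypothetical names. Tone and re-pitching still need a human read.

```python
from typing import List

BANNED_PHRASES = ("great question",)
EM_DASH = "\u2014"


def lint_draft(draft: str, sender_first_name: str) -> List[str]:
    """Return a list of rule violations; an empty list means the draft passes."""
    problems: List[str] = []
    if "!" in draft:
        problems.append("exclamation point")
    if EM_DASH in draft:
        problems.append("em dash")
    lowered = draft.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            problems.append(f"banned phrase: {phrase}")
    # Drafts are signed with the campaign sender's first name.
    if not draft.rstrip().endswith(sender_first_name):
        problems.append("missing sender signature")
    return problems
```

For example, `lint_draft("Great question! Happy to share.\nSheyda", "Sheyda")` returns `["exclamation point", "banned phrase: great question"]`.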

The Airtable Queue

Every processed reply, regardless of classification, gets logged to the Reply Queue in Airtable before anything else happens. See the full tool stack for how all the pieces connect. Airtable is the source of truth. Even if Slack is down, the drafts are there.

The Reply Queue tracks:

Conversation ID and platform
Lead name, title, company, LinkedIn URL or email
Full conversation thread in [US/LEAD - date]: message format
Classification and the raw reply text
The draft ready for review
Status (New, Draft Ready, Draft Approved, Replied - Waiting, etc.)
Sender and client assignment
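The thread field format is simple to reproduce. A small Python sketch, assuming ISO dates and one message per line (both are my assumptions, as are the function names):

```python
from datetime import date


def thread_line(speaker: str, day: date, message: str) -> str:
    """Format one message as '[US - date]: message' or '[LEAD - date]: message'."""
    if speaker not in ("US", "LEAD"):
        raise ValueError(f"unknown speaker: {speaker!r}")
    return f"[{speaker} - {day.isoformat()}]: {message.strip()}"


def append_to_thread(thread: str, speaker: str, day: date, message: str) -> str:
    """Append a message to the thread field, one entry per line."""
    line = thread_line(speaker, day, message)
    return f"{thread}\n{line}" if thread else line
```

Keeping the whole conversation in one readable field means the reviewer sees the full exchange next to the draft without opening the platform.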

The status flow is the workflow. For the full day-to-day picture of how this fits into operations, see What Agentic Outbound Actually Looks Like. When a draft lands in the queue as "Draft Ready," that is my signal to review. When I approve it, the status moves to "Draft Approved." The next cron run picks that up and sends it through the right platform API.

After sending, the status moves to "Replied - Waiting" and a follow-up date is set three days out. If there is no reply by then, the system drafts a follow-up nudge and brings it back to "Draft Ready" for review.

After three nudges with no response, the thread closes automatically as "Not Responsive."
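The status flow behaves like a small state machine. Here is a Python sketch of the part described above. The status strings come from the queue. Everything else (`Thread`, the function names, counting nudges at draft time) is a hypothetical reading for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

FOLLOW_UP_DAYS = 3
MAX_NUDGES = 3


@dataclass
class Thread:
    status: str = "Draft Ready"
    nudges_drafted: int = 0
    follow_up_on: Optional[date] = None


def approve(thread: Thread) -> None:
    """Human review: only a ready draft can be approved."""
    if thread.status != "Draft Ready":
        raise ValueError(f"cannot approve from {thread.status!r}")
    thread.status = "Draft Approved"


def mark_sent(thread: Thread, today: date) -> None:
    """After the send: wait for a reply and set the follow-up date."""
    if thread.status != "Draft Approved":
        raise ValueError("nothing sends without approval")
    thread.status = "Replied - Waiting"
    thread.follow_up_on = today + timedelta(days=FOLLOW_UP_DAYS)


def on_follow_up_due(thread: Thread, today: date) -> None:
    """Cron check: draft a nudge, or close the thread after three nudges."""
    if thread.status != "Replied - Waiting" or thread.follow_up_on is None:
        return
    if today < thread.follow_up_on:
        return  # not due yet
    if thread.nudges_drafted >= MAX_NUDGES:
        thread.status = "Not Responsive"
        return
    thread.nudges_drafted += 1
    thread.status = "Draft Ready"  # back to the queue for review
```

Note that `mark_sent` refuses to run unless the status is "Draft Approved". The review step is enforced by the state machine itself, not by the caller remembering to check.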

The full cycle, step by step:

Pull (8am and 2pm daily): All platforms checked. Unread replies fetched, deduplicated against existing Airtable records.
Classify: Each reply classified into one of 9 categories. Routing determined by class.
Enrich: LinkedIn profiles fetched via Apify for HeyReach and Lemlist conversations. Context built for drafting.
Draft: Response written using full thread, lead context, and classification rules. Tone matched to sender.
Queue: Draft logged to Airtable Reply Queue. Platform marks reply as read only after successful Airtable write.
Review: I review and approve in Airtable. Nothing sends without my eyes on it.
Send: After approval, the platform-specific send command runs. Status updates. Follow-up timer starts.

The One Rule That Does Not Move

Every reply goes through my review before sending. This rule is not optional and is not bypassed for any reply type, including obvious positives.

The reason is not that the drafts are usually wrong. It is that the drafts are sometimes right in a way that does not match a specific client's voice, a specific conversation's context, or a specific relationship nuance that the system cannot see. One approval takes less than thirty seconds. One wrong message sent without review can end a relationship that the campaign spent weeks building.

The mandatory review step is not a concession to the limitations of the system. It is the design.

⚠️ Full send automation without a review step is not efficiency. It is liability. The goal is not to remove yourself from the loop. It is to only be in the loop when your judgment actually matters, and to make that moment fast and informed by everything the system knows.

4 platforms checked every run
9 reply classification categories
2x daily reply handler cadence

Frequently Asked Questions

What happens if the Airtable write fails? The reply is left unread on the platform. The next cron run, 8am or 2pm, picks it up again and processes it. The dedup check catches any duplicate processing. Failure is recoverable by design.

How does the system handle a reply in a language it hasn't seen before? It classifies it as UNCLEAR and flags it in the system channel. I review manually. The system does not guess when uncertain. It escalates.

Can it handle multiple replies from the same lead on the same day? Yes. The dedup check matches on conversation ID. Multiple replies in the same thread update the existing record rather than creating duplicates. The thread field gets appended with the new messages.

Why not just process replies when they come in (real-time)? We tried webhooks in the early version of the system. The failure mode is that replies can be marked received but never logged if the processing step fails mid-run. Scheduled pulls are more reliable: if a run fails, the reply stays unread. If we process it twice, dedup catches it.

How long does a full run take? For four to six active campaigns across all platforms, a full pull-classify-enrich-draft cycle takes about two to four minutes. The bottleneck is usually the Apify enrichment calls, which run sequentially per lead.

A reply is only as valuable as the speed and quality of the response it gets. The handler exists so the human is only in the loop when it matters.

If you are processing replies manually across multiple platforms, see how we would build this for you.

Sheyda Rezaei
