There is a lot of marketing language around AI receptionists right now. Most of it skips the mechanics. This post explains how the technology works, what it does well, and where it still needs a human in the loop.
We will use Phantom Desk AI as the running example because it is what we built. The architecture below is similar to what most credible AI voice products are doing under the hood.
The voice layer
When a call comes in, the first job is moving audio between a human and a machine in real time.
That work is handled by a voice infrastructure layer. We use VAPI for ours. The voice layer does four specific things:
- Speech-to-text. The caller's audio is converted into text, in real time, while they are still talking.
- Text-to-speech. Generated responses are converted back into natural-sounding audio. The voice can be customized to match the business — most of our roofing customers want a voice that sounds like their actual front desk.
- Turn-taking. The system has to know when the caller is done talking, when to interject, and when to let silence breathe. This is harder than it sounds.
- Interruption handling. If the caller cuts in mid-sentence, the system has to stop talking, listen, and respond to the new input without losing the thread.
All of this has to happen well under a second. A delay of even 800 milliseconds makes a phone call feel broken. The voice layer is what makes the call feel like a phone call instead of a chatbot.
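To make turn-taking and interruption handling concrete, here is a minimal sketch of the idea. It is not how VAPI or any other voice platform implements it. It assumes something upstream has already labeled each audio frame as speech or silence, and the names (`TurnDetector`, `should_stop_speaking`) and thresholds (20 ms frames, 600 ms of silence, 200 ms of overlap) are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TurnDetector:
    """Toy end-of-turn detector driven by per-frame speech/silence flags."""
    frame_ms: int = 20          # assumed length of one audio frame
    end_silence_ms: int = 600   # assumed silence that ends the caller's turn
    heard_speech: bool = False
    silence_ms: int = 0

    def feed(self, is_speech: bool) -> bool:
        """Consume one frame; return True when the caller's turn looks finished."""
        if is_speech:
            self.heard_speech = True
            self.silence_ms = 0
            return False
        if not self.heard_speech:
            return False  # leading silence: nobody has spoken yet
        self.silence_ms += self.frame_ms
        if self.silence_ms >= self.end_silence_ms:
            # Reset so the detector is ready for the caller's next turn.
            self.heard_speech = False
            self.silence_ms = 0
            return True
        return False


def should_stop_speaking(agent_is_speaking: bool, caller_speech_ms: int,
                         barge_in_ms: int = 200) -> bool:
    """Barge-in rule: stop playback once the caller has talked over the agent long enough."""
    return agent_is_speaking and caller_speech_ms >= barge_in_ms
```

The trade-off lives in the two thresholds. A shorter silence window makes the system answer faster but cut people off mid-thought. A longer one feels polite but slow. Real systems usually combine silence with other signals, which is part of why this is harder than it sounds.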
The conversation engine
Once audio is text, the question is what the system says back.
That work is handled by a large language model. We use Anthropic's Claude. The LLM is not running freely. It is guided by configured rules.
"Configured rules" is doing a lot of work in that sentence, so let me unpack it.
For each business, we configure:
- Business hours and routing. What happens during open hours vs. after hours. Whether emergency calls get escalated to a cell phone. Which days are excluded.
- Qualification questions. For a roofer, that's "Is this storm damage or planned work? Single-family or commercial? What's the property zip code?" For a vet, it's a different list. The questions that matter are vertical-specific.
- Booking logic. When to offer a slot, how to handle a request outside service area, what to do if the caller wants to talk to a human.
- Escalation paths. Which kinds of calls always go to a human, immediately. We default to "if the caller is upset or asks for a person, hand off."
- Tone and phrasing. How the business introduces itself, what it's allowed to promise, what it's not allowed to promise.
The LLM handles the nuance. The configured rules handle the boundaries. Both matter. An LLM without rules will say things you don't want said. Rules without an LLM produce a phone tree from 2008.
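As a rough picture of what "configured rules" can look like, here is a minimal sketch. It is an illustration under stated assumptions, not our production schema: the `BusinessRules` fields, the `route` function, the route labels, and the handoff phrases are all hypothetical, and the sketch assumes emergencies are only escalated to a cell phone after hours.

```python
from dataclasses import dataclass, field
from datetime import datetime, time


@dataclass
class BusinessRules:
    """Hypothetical per-business configuration for a roofer."""
    open_time: time = time(8, 0)
    close_time: time = time(18, 0)
    closed_weekdays: set[int] = field(default_factory=lambda: {5, 6})  # Sat, Sun
    escalate_emergencies_after_hours: bool = True
    qualification_questions: list[str] = field(default_factory=lambda: [
        "Is this storm damage or planned work?",
        "Single-family or commercial?",
        "What's the property zip code?",
    ])
    handoff_phrases: tuple[str, ...] = (
        "talk to a person", "speak to a person", "real person", "talk to a human",
    )


def is_open(rules: BusinessRules, now: datetime) -> bool:
    """True if `now` falls inside configured business hours."""
    return (now.weekday() not in rules.closed_weekdays
            and rules.open_time <= now.time() < rules.close_time)


def route(rules: BusinessRules, now: datetime, utterance: str,
          caller_upset: bool = False, is_emergency: bool = False) -> str:
    """Apply the hard boundaries before the LLM gets to handle the nuance."""
    text = utterance.lower()
    # Escalation always wins: upset callers and explicit requests go to a human.
    if caller_upset or any(p in text for p in rules.handoff_phrases):
        return "handoff_to_human"
    if not is_open(rules, now):
        if is_emergency and rules.escalate_emergencies_after_hours:
            return "escalate_to_cell"
        return "after_hours_intake"
    return "qualify_and_book"
```

For example, `route(BusinessRules(), datetime(2024, 6, 3, 22, 0), "my roof is leaking", is_emergency=True)` returns `"escalate_to_cell"`. The point of the split is that these checks are deterministic. The LLM decides how to phrase the next question, but it does not get to decide whether an upset caller reaches a person.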
How it books to real calendars
The receptionist is only useful if the booking actually lands somewhere.
We connect to Google Calendar via OAuth. When the AI offers a slot, it is reading availability live. When the caller confirms, the event is written back immediately, with the caller's name, phone number, qualification answers, and a transcript link.
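The "reading availability live" step reduces to a small interval problem once the calendar's busy blocks are in hand. The sketch below shows only that part. It makes no calendar API calls, and the `free_slots` function, the fixed one-hour slot length, and the non-overlapping slot grid are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def free_slots(busy, day_start, day_end, slot=timedelta(minutes=60)):
    """Return start times of slots in [day_start, day_end) that overlap no busy interval.

    `busy` is a list of (start, end) datetime pairs, however they were fetched.
    """
    slots = []
    cursor = day_start
    while cursor + slot <= day_end:
        end = cursor + slot
        # A slot is free if it ends before, or starts after, every busy interval.
        if all(end <= b_start or cursor >= b_end for b_start, b_end in busy):
            slots.append(cursor)
        cursor = end
    return slots


# Usage: a 9:00 to 13:00 window with one existing job from 10:00 to 11:30.
day = datetime(2024, 6, 3)
busy = [(day.replace(hour=10), day.replace(hour=11, minute=30))]
open_starts = free_slots(busy, day.replace(hour=9), day.replace(hour=13))
# open_starts holds the 9:00 and 12:00 slots
```

Because availability can change between offering a slot and the caller saying yes, a real booking flow would re-check the slot at write time rather than trust the list it read a few seconds earlier.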
For CRM write-back we use a direct API connection where one exists, and a webhook bridge where it doesn't. Most of our customers use ServiceTitan, Jobber, Housecall Pro, or a lighter CRM like HubSpot. The lead source, qualification answers, and call recording link all flow through.
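The "direct API where one exists, webhook bridge where it doesn't" pattern can be sketched as a simple dispatcher. This is a hypothetical shape, not any CRM's real client: `Lead`, `write_back`, and the `Sender` callables are invented for illustration, and the actual HTTP calls are left to whatever the caller plugs in.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable

# A sender takes a JSON payload and delivers it somewhere (assumed interface).
Sender = Callable[[str], None]


@dataclass
class Lead:
    name: str
    phone: str
    lead_source: str
    qualification: dict[str, str]
    recording_url: str


def write_back(lead: Lead, crm: str,
               direct_clients: dict[str, Sender], webhook: Sender) -> str:
    """Send the lead through a direct integration if one is registered, else the webhook."""
    payload = json.dumps(asdict(lead), sort_keys=True)
    sender = direct_clients.get(crm)
    if sender is not None:
        sender(payload)
        return "direct"
    webhook(payload)
    return "webhook"
```

Keeping the payload identical on both paths means the CRM record looks the same whichever route it took, and adding a new direct integration is just one more entry in `direct_clients`.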
The confirmation goes out by SMS via Twilio and by email via Resend. The caller has a confirmation in their pocket before they hang up.
This is the part most owners underestimate. The voice is not the moat. The integration into the rest of the business is the moat. A receptionist that takes a perfect call but doesn't put it on the calendar is still a missed call.
What it does well
Some honest framing on capability.
AI receptionists are very good at:
- High call volume. A storm hits, 40 calls come in over 30 minutes, every one of them gets answered in full. There is no queue.
- After-hours. 6pm to 8am is most of the lost-call window for service businesses. The system doesn't notice the time.
- Repetitive intake. Asking the same eight questions for the 200th time today, with the same patience as the first time.
- Scheduling. No double bookings, no "let me check and call you back," no missed write-backs.
- Transcript logging. Every call is recorded, transcribed, and analyzed post-call. Owners get a daily summary of what happened on the phone yesterday — most have never had this visibility before.
The system never gets tired. Never has a bad day. And as long as the rules are tight, it doesn't go off-script in a way that costs you a job.
What it can't do well yet
This is the part most vendors skip. We won't.
AI receptionists are not good at:
- Deeply emotional conversations. A pet owner whose dog just collapsed needs a person, not a flow. We hand those off.
- Complex multi-step technical diagnostics outside the configured flows. The system gathers what it can and routes the call to a human. It will not improvise a diagnosis.
- Situations the rules didn't anticipate. Edge cases happen. The system is good at recognizing it doesn't know and escalating, but it cannot replace human judgment in genuinely novel situations.
This is why the first 50 to 100 calls on every account are reviewed by a human on our side. We watch the transcripts, find the edge cases, and tighten the rules. After that the system is dialed in for the specific business.
Where humans still need to be in the loop
To be clear about where the line sits.
Humans need to handle:
- Upset customers who ask for a person
- Edge-case complaints that need judgment, not policy
- Post-call review for the first 50 to 100 calls on a new account, to tune the rules
- Ongoing review of escalated calls — typically 5 to 10 percent of volume
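The review policy in that list is simple enough to write down. A minimal sketch, assuming a 1-based call counter per account and using the upper end of the 50 to 100 range as the default. The function names are hypothetical.

```python
def needs_review(call_number: int, escalated: bool, onboarding_calls: int = 100) -> bool:
    """Flag a call for human review: every early call, plus any escalated call."""
    return escalated or call_number <= onboarding_calls


def escalation_rate(escalated_flags: list[bool]) -> float:
    """Share of calls that were escalated; 0.0 when there are no calls."""
    if not escalated_flags:
        return 0.0
    return sum(escalated_flags) / len(escalated_flags)
```

Tracking the escalation rate over time is a cheap health check. If it drifts well above the typical 5 to 10 percent, the rules probably need another tuning pass.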
Our customers are not firing their front desk. The good ones are using the system to make sure the front desk never misses a call while they are on another call, in a meeting, or asleep.
The honest summary
The point isn't to replace your front desk. The point is to make sure no call goes unanswered when your front desk can't pick up.
If a vendor is selling you something more aggressive than that, ask harder questions. The technology is real and it works. The marketing around it sometimes isn't.
Interested in being a founding customer? Book a 20-minute demo and let's talk.