
Inside MediConcierge: a clinic-safe AI patient concierge

This paper covers the conversational AI layer in our UK private healthcare operations platform, the component where most of the safety engineering concentrates. Future papers will cover the CRM, scheduling, and insights sides of the product.

Abstract

UK private healthcare clinics are a strange commercial environment for AI. They handle repetitive inbound questions well suited to conversational tools, but they also operate under UK GDPR, Caldicott principles, and a duty of care that treats a sloppy AI response as a safeguarding issue. This paper focuses on one component of MediConcierge, our clinic operations platform for UK private healthcare: the AI patient concierge that handles first-touch enquiries, triage, and booking. The broader platform covers practice management, patient and lead tracking, scheduling, and insights for clinic owners. We cover the concierge specifically because it is the component that touches patients directly, which concentrates most of the interesting safety engineering.

1. The problem framing

Private clinics in the UK sit in an operational gap. They handle clinical matters with clinical rigour, but most of their inbound web traffic is not clinical. It is people asking whether the clinic takes Bupa, whether there is a slot this Saturday, whether the gynaecology consultation includes imaging, whether the quoted £240 includes follow-up. In most clinics these questions funnel to a receptionist who answers them one at a time, in work hours, with a phone and a diary.

A patient concierge solves the obvious problem: answer predictable questions instantly, collect enquiries outside hours, route to a human when the conversation needs one, convert browsers into bookings. Clinics know this. What stops them reaching for off-the-shelf chat tooling is that every generic tool is designed around the assumption that worst-case misinformation costs you a return or a refund. In healthcare the worst case is different.

So the interesting design question is not “can we make an LLM answer these questions”. It is “how do we give the LLM enough context to be useful, strict enough constraints to stay safe, and a clean enough escalation path that anything it cannot handle goes to a human fast”.

2. What “safe” actually means here

Safe in this domain is three overlapping constraints, and they all push in different directions.

Clinical accuracy. The system must never diagnose, must never recommend treatment, must never contradict a clinician, and must never give the impression it is doing any of those things. This rules out the default behaviour of most LLMs, which will happily answer “what could this symptom mean” with a confident list.

Data protection. Any data that could identify a patient is special category data under UK GDPR. The concierge cannot store messages containing personal health information in a way that is visible to the engineering team by default, and it cannot persist them beyond what is operationally necessary for the clinic to follow up.
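
A minimal sketch of what "not persisted beyond what is operationally necessary" can look like in code, with hypothetical names and an assumed default window (the real retention periods are a clinic-level policy decision, not these values):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Illustrative retention sketch: messages that may contain personal health
# information are held only long enough for the clinic to follow up, then
# purged. The 30-day default is an assumption for the example.
DEFAULT_RETENTION = timedelta(days=30)

@dataclass
class StoredMessage:
    clinic_id: str
    body: str
    received_at: datetime

class ConversationStore:
    def __init__(self, retention: timedelta = DEFAULT_RETENTION):
        self.retention = retention
        self._messages: list[StoredMessage] = []

    def add(self, clinic_id: str, body: str, now: datetime) -> None:
        self._messages.append(StoredMessage(clinic_id, body, now))

    def purge_expired(self, now: datetime) -> int:
        """Delete messages older than the retention window; return count removed."""
        cutoff = now - self.retention
        before = len(self._messages)
        self._messages = [m for m in self._messages if m.received_at >= cutoff]
        return before - len(self._messages)
```

The important property is that expiry is a scheduled, mechanical sweep rather than something an engineer remembers to do.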

Tone. The tone needs to sound like the clinic. Not like a friendly Silicon Valley chatbot, not like a hospital information line. Each clinic has its own register. A London Harley Street practice and a regional physio studio have different voices, and the concierge has to speak them both without feeling off.

These constraints interact. The right answer to a medical question is often to not answer and escalate, which reads as unhelpful if the tone is not warm. The right way to handle PII is to discard it quickly, which reads as forgetful if the concierge cannot refer back to what the patient said a message ago. Getting the balance right is the whole job.

3. Multi-tenancy: one installation, many clinics

MediConcierge serves many clinics from one codebase. Each clinic gets its own configured instance: concierge branding, service menu, availability, FAQs, tone profile, plus the admin tooling their staff use to run the practice day-to-day. Zero engineering effort per clinic, ideally.

Multi-tenancy in Postgres usually comes down to one of three patterns: database-per-tenant, schema-per-tenant, or shared schema with tenant_id. We went with shared schema, enforced at the application layer with strict tenant scoping and at the database layer with row-level security policies on Neon Postgres.

Why not schema-per-tenant? Because clinic count grows unpredictably. Spinning up a new schema per onboarding is a migration footgun: every schema change becomes N schema changes, and the cost of being wrong scales with the tenant count. Shared schema with tenant_id keeps the migration story simple and lets us reason about one table instead of many.

Why row-level security on top? Belt and braces. A bug in the application that accidentally selects across tenants becomes a data breach. RLS policies mean the database refuses cross-tenant reads even if the application asks for them. It is defence in depth, and it costs very little.

The trade-off we are living with is that every analytical query has to be tenant-scoped, which makes cross-clinic reporting harder than it would be in a dedicated warehouse. We will either add a read replica with broader permissions or, more likely, stream aggregates out to a separate analytics store when clinic count justifies it.

4. Tone calibration for medical context

The generic LLM default tone is wrong for this context in specific ways. It over-apologises. It hedges everything. It volunteers information that a receptionist would not. Worst of all, it often shifts into a teaching register when asked a medical question, which is exactly the register that reads as “the chatbot is trying to diagnose me”.

We handle this with three layers.

System prompt constraints tell the model what it is and is not for. A MediConcierge system prompt starts by naming the clinic, describing the services offered, and listing what the concierge must route to a human. It explicitly forbids medical opinions, diagnoses, prescriptions, and contradicting a clinician. This is not sufficient on its own, but it is the baseline.

Refusal patterns handle the specific failure modes we have seen. The concierge is given worked examples of correct refusals for questions like “what could this rash be”, “is it safe to combine X and Y”, “should I go to A&E”. Each refusal is warm, not dismissive, and each one has a specific next step: “I’m not the right person to answer that, but I can book you a same-day consultation and our doctor will call you back within the hour. Would you like me to do that now?”.
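
A sketch of how worked refusals can be assembled into few-shot examples for the prompt. The questions and replies below are illustrative; the real set is curated, and each refusal pairs a warm decline with a concrete next step:

```python
# Hypothetical refusal examples rendered as few-shot pairs for the prompt.
REFUSAL_EXAMPLES = [
    {
        "patient": "What could this rash be?",
        "concierge": (
            "I'm not the right person to answer that, but I can book you a "
            "same-day consultation and our doctor will call you back. "
            "Would you like me to do that now?"
        ),
    },
    {
        "patient": "Is it safe to combine these two medications?",
        "concierge": (
            "That's one for a clinician rather than me. I can arrange a "
            "callback from the practice team today if that would help?"
        ),
    },
]

def format_refusal_examples(examples: list[dict]) -> str:
    """Render worked refusals as Patient/Concierge turns for the system prompt."""
    lines = []
    for ex in examples:
        lines.append(f"Patient: {ex['patient']}")
        lines.append(f"Concierge: {ex['concierge']}")
    return "\n".join(lines)
```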

Per-clinic tone overrides let each clinic provide a short tone description when they onboard (“warm but formal”, “friendly and efficient”, “reassuring, avoid clinical jargon”). This gets injected into the prompt. It is crude, but surprisingly effective. Clinics can feel the difference between the version prompted with their tone description and the generic one, and prospective patients respond to the concierge more like a receptionist and less like a chatbot.
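
The three layers compose into one prompt per clinic. This is a sketch with illustrative field names and wording, not the production template:

```python
# Assemble a per-clinic system prompt: identity, services, hard
# prohibitions, then the clinic's own tone description appended last.
# All text and field names are illustrative.

def build_system_prompt(clinic: dict) -> str:
    prohibitions = (
        "Never give medical opinions, diagnoses, or prescriptions, and "
        "never contradict a clinician. Route those conversations to a human."
    )
    parts = [
        f"You are the patient concierge for {clinic['name']}.",
        f"Services offered: {', '.join(clinic['services'])}.",
        prohibitions,
    ]
    if clinic.get("tone"):
        parts.append(f"Tone: {clinic['tone']}.")
    return "\n".join(parts)
```

Appending the tone last keeps the prohibitions fixed while the register varies per clinic.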

We considered fine-tuning a per-clinic model and did not do it. The tuning cost per clinic would be significant, the tone description approach gets 90% of the way there, and the engineering effort is better spent on the next two sections.

5. Stripe billing alongside patient data

Clinics pay MediConcierge on a subscription. Stripe is the obvious choice. The constraint is that Stripe sits on one side of a very hard line, and patient data sits on the other.

We solved this by treating Stripe as an entirely separate concern from clinical data. The billing tables live in their own logical region of the database, never reference patient data, and only know about clinics (the tenants) as billing accounts. When a clinic churns, the billing tables note the cancellation. The patient-conversation tables do not need to know. When a clinic’s subscription lapses, the concierge stops responding to new sessions but existing conversation data remains under the same GDPR lifecycle as before.
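
The only question the concierge runtime ever asks the billing side is whether the clinic's subscription is live. A sketch, with an assumed set of statuses that count as live (the grace-period policy is illustrative, not a statement of our actual rules):

```python
# Assumed set of subscription statuses under which the concierge keeps
# accepting new sessions; "past_due" as a grace status is illustrative.
ACTIVE_STATUSES = {"active", "trialing", "past_due"}

def concierge_accepts_new_sessions(subscription_status: str) -> bool:
    """Gate new sessions only; existing conversation data is untouched either way."""
    return subscription_status in ACTIVE_STATUSES
```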

The Stripe webhook handler is also a choke point for reliability. Webhooks can arrive twice, arrive out of order, or arrive hours late. Each webhook carries an event ID; we persist that ID on first receipt and treat subsequent deliveries of the same ID as idempotent. Subscription state on our side is only ever derived from the event with the highest created_at we have seen, regardless of arrival order. This handles the surprisingly common case where Stripe retries a webhook an hour after a newer event has already been processed.
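
The two rules, dedupe on event ID and only advance on a newer created timestamp, can be sketched like this. The payload shape mirrors Stripe's event fields (id, created as a unix timestamp) but the class is illustrative, not production code:

```python
# Webhook choke point sketch: idempotent on event ID, monotonic on the
# event's created timestamp, so stale retries can never overwrite newer
# state. Names are illustrative.

class SubscriptionState:
    def __init__(self):
        self.seen_event_ids: set[str] = set()
        self.last_applied_created: int = 0  # unix timestamp of last applied event
        self.status: str | None = None

    def handle_event(self, event: dict) -> bool:
        """Apply a webhook event; return True if it changed our state."""
        if event["id"] in self.seen_event_ids:
            return False  # duplicate delivery: idempotent no-op
        self.seen_event_ids.add(event["id"])
        if event["created"] <= self.last_applied_created:
            return False  # stale retry arriving after a newer event
        self.last_applied_created = event["created"]
        self.status = event["data"]["status"]
        return True
```

The late-retry case from the text falls out naturally: an older event arriving after a newer one fails the timestamp check and is dropped.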

6. What we would do differently

Two honest reflections.

First, we over-invested in a custom admin dashboard when clinics would have been fine with a simpler interface in the first six months. A lot of engineering went into configurability that the pilot clinics never touched. The lesson is the usual one: build the minimum configurable surface, see what clinics actually change, then invest in those settings. We will recoup the admin dashboard investment later as we add features, but on a zero-based review we would have done less of it earlier.

Second, we underestimated the operational side of tenant onboarding. Getting a clinic from “signed contract” to “concierge live on their website” involves collecting their service menu, their pricing, their availability, their tone preferences, their FAQ answers, their escalation contacts, their branding, and their test users. That is a lot of structured information to extract from a busy clinic manager. We built the onboarding flow as a form-based wizard and it still takes too long. If we were starting over we would invest earlier in an onboarding conversation where the concierge itself helps populate its own configuration. There is a recursive elegance to “the concierge onboards the clinic to the concierge”, and it would cut onboarding time significantly.

7. What’s next

Three directions we are exploring.

Voice interface. Patients on mobile would often prefer to talk than type, and clinic receptionists already do a version of this workflow by phone. Adding a voice mode using a streaming TTS/STT pipeline is technically straightforward. The interesting design question is whether voice should be the primary mode for some clinic contexts (for example hands-busy caregivers) rather than a fallback.

Multilingual. London private healthcare in particular has significant inbound from non-English-first speakers. The model is perfectly capable of handling other languages. The harder problem is preserving the clinic’s tone in every language. Translation alone is not enough; we need tone profiles per language.

PMS integration. The booking half of the concierge is currently a lightweight calendar on top of availability data the clinic manually maintains. The higher-value integration is into the clinic’s existing practice management system so the concierge can read and write bookings live. PMS APIs in this sector are patchy, which is why we have not built this yet. What is shifting is that more PMS vendors are opening proper APIs under pressure from clinics that want this kind of integration.

Closing note

The pattern we have found repeatedly is that AI for healthcare is a problem in a healthcare constraint space with an AI tool inside it, not the other way around. The constraints come first, the model behaviour is shaped around them, and the interesting engineering work is in the tone calibration, tenant isolation, and escalation logic. The LLM is often the cheapest part of the system.

The patient concierge is one piece of MediConcierge. The broader platform is built around a single commercial mechanism: returning labour hours to the clinic. Reception teams spend a large fraction of their week on repeat enquiries, appointment reshuffling, no-show chasing, and follow-up admin that is not clinical work. The concierge absorbs the predictable enquiries. The CRM and practice insights help staff prioritise what is left. The scheduling tools cut down on no-shows and wasted slots. The measurable outcome we care about is admin hours reclaimed and appointments actually booked, not chatbot conversations completed. The rest of the platform will get its own research pieces.

If you are building something in an adjacent problem space and any of this would be useful to compare notes on, we are always happy to talk.