Building Hipócrates: Lessons from a Medical AI System That Thinks Like a Doctor

When I started developing Sistema Hipócrates, I had a simple question: why are most medical chatbots so... superficial? You type "patient with pancreas pain" and get a generic list of pancreatitis causes. But any experienced doctor would pause and ask: "Did the patient actually say 'pancreas pain'? Patients usually don't know anatomy. Where exactly is this pain?"

That question changed everything.

The Problem with "Yes, Sir" Chatbots

Most healthcare AI implementations suffer from a fundamental problem: they're excessively accommodating. They accept any input as absolute truth and rush to give answers. It's like a first-year resident trying to impress: quick to respond, but without depth.

A useful medical AI system needs to do what good clinicians do: question assumptions, investigate inconsistencies, and build reasoning before concluding.

Hipócrates and Florence Were Born

I created two assistants with distinct personalities:

Hipócrates (for doctors): Inspired by the father of medicine, he uses Socratic reasoning. When you say "patient with type 1 diabetes on metformin," he responds: "There's an inconsistency here. Metformin is typically used for T2DM, not T1DM. The patient might be confusing the type or the medication. Can you confirm?"

Florence (for nurses): Inspired by Florence Nightingale, she focuses on practical care and structured documentation using the NANDA-I, NIC, and NOC taxonomies.

Technical Decisions That Matter

1. Language-Based Routing

I discovered that LLMs trained on Brazilian Portuguese (like Maritaca's Sabiá) understand nuances that global models miss. "Dor de barriga" isn't the same as "abdominal pain" — it carries cultural context. So I implemented automatic routing: Portuguese goes to Maritaca, other languages to Claude.
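To make the routing idea concrete, here is a minimal sketch rather than the production code. It assumes a crude stopword heuristic for detecting Portuguese; the backend labels, the word list, and the 0.2 threshold are all hypothetical choices for illustration. A real system would more likely rely on a dedicated language-identification library, since this heuristic can confuse Portuguese with Spanish.

```python
import re

# Hypothetical backend labels; the article names the providers, not model IDs.
PORTUGUESE_BACKEND = "maritaca"
DEFAULT_BACKEND = "claude"

# A few very common Portuguese function words (assumed list, not exhaustive).
# Words that are also common in English ("a", "no", "as", "do") are left out.
PT_STOPWORDS = {
    "de", "que", "não", "com", "uma", "para", "está", "na",
    "em", "um", "os", "da", "dos", "das", "é", "há",
}


def looks_portuguese(text: str, threshold: float = 0.2) -> bool:
    """Return True when enough of the words are Portuguese function words."""
    words = re.findall(r"[a-zà-ÿ]+", text.lower())
    if not words:
        return False
    hits = sum(1 for word in words if word in PT_STOPWORDS)
    return hits / len(words) >= threshold


def route(text: str) -> str:
    """Pick a backend label for the message based on its detected language."""
    return PORTUGUESE_BACKEND if looks_portuguese(text) else DEFAULT_BACKEND
```

With this sketch, `route("Paciente com dor de barriga há três dias")` selects the Portuguese backend, while `route("Patient with abdominal pain for three days")` falls through to the default.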

2. Streaming with ID Synchronization

An interesting challenge: how do you enable feedback on a message that is still being generated? The solution was to send the message UUID as the final event of the stream, so the frontend can associate feedback with the correct database message.

```json
{ "content": "response chunk..." }
{ "done": true, "message_id": "database-uuid" }
```
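The server side of that protocol could look roughly like the sketch below. It is an illustration under assumptions, not the actual implementation: `stream_with_message_id`, `save_message`, and `fake_save` are hypothetical names, and the transport (SSE, WebSocket, or chunked HTTP) is left out. The key point is that the message is persisted only after the last chunk, and its ID travels in the final event.

```python
import json
import uuid
from typing import Callable, Iterable, Iterator


def stream_with_message_id(
    chunks: Iterable[str],
    save_message: Callable[[str], str],
) -> Iterator[str]:
    """Yield one JSON event per chunk, then a final event with the saved ID."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
        yield json.dumps({"content": chunk})
    # Persist the full text only once generation is complete.
    message_id = save_message("".join(parts))
    yield json.dumps({"done": True, "message_id": message_id})


def fake_save(text: str) -> str:
    """Stand-in for a database insert; returns a fresh UUID as the row ID."""
    return str(uuid.uuid4())


if __name__ == "__main__":
    for event in stream_with_message_id(["Where exactly ", "is the pain?"], fake_save):
        print(event)
```

On the frontend, the feedback buttons would stay disabled until the event with `"done": true` arrives, at which point the `message_id` is known.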

3. Prompts That Teach How to Think

The secret isn't giving the model more information, but structuring how it should reason:

Before diagnosing, ask about location, radiation, intensity
Don't accept ready-made diagnoses — ask for evidence
Identify inconsistencies between symptoms and medications
Suggest, never prescribe
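One way to keep rules like these maintainable is to store them as data and assemble the system prompt from them. The sketch below assumes that approach; the function name, the persona string, and the exact prompt wording are hypothetical, and the real prompts are certainly longer.

```python
from typing import Sequence

# The four reasoning rules listed above, kept as plain data.
REASONING_RULES = [
    "Before diagnosing, ask about location, radiation, and intensity.",
    "Do not accept ready-made diagnoses; ask for evidence.",
    "Identify inconsistencies between symptoms and medications.",
    "Suggest, never prescribe.",
]


def build_system_prompt(persona: str, rules: Sequence[str]) -> str:
    """Combine a persona line with a numbered list of reasoning rules."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return f"You are {persona}.\nFollow these reasoning rules:\n{numbered}"


if __name__ == "__main__":
    print(build_system_prompt("Hipócrates, an assistant for doctors", REASONING_RULES))
```

Keeping the rules in a list makes it easy to add, reorder, or test a single rule without rewriting the whole prompt.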

The Value of Feedback in Medical AI

We implemented a per-message feedback system (thumbs up/down plus comments). It seems simple, but the insights are profound: doctors don't want long answers; they want correct ones. The messages with the most negative feedback had something in common: they were too verbose.

The metrics dashboard showed us that satisfaction rates increased significantly when we shortened responses and increased clarifying questions.
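The kind of analysis behind that observation can be sketched as follows. This is an assumed shape, not the actual dashboard code: the `Feedback` fields, the function name, and the 120-word cutoff between "short" and "long" are hypothetical, and no real feedback data is shown.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Feedback:
    message_id: str      # database ID of the rated message
    word_count: int      # length of the assistant's response
    thumbs_up: bool      # True for thumbs up, False for thumbs down
    comment: str = ""    # optional free-text comment


def satisfaction_by_length(
    items: Iterable[Feedback],
    max_short_words: int = 120,  # arbitrary cutoff chosen for this sketch
) -> Dict[str, Optional[float]]:
    """Return the thumbs-up rate for short and long responses separately."""
    counts = {"short": [0, 0], "long": [0, 0]}  # [thumbs_up, total]
    for fb in items:
        key = "short" if fb.word_count <= max_short_words else "long"
        counts[key][1] += 1
        if fb.thumbs_up:
            counts[key][0] += 1
    # A bucket with no feedback yields None rather than a misleading 0.0.
    return {k: (up / total if total else None) for k, (up, total) in counts.items()}
```

Splitting the rate by response length is what makes a pattern like "verbose answers get more thumbs down" visible at all; a single overall satisfaction number would hide it.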

Lessons That Transcend Medicine

1. Useful AI asks questions instead of just answering: In any specialized domain, blindly accepting inputs is dangerous.

2. Cultural context matters: Training on local data isn't a luxury; it's a necessity.

3. Granular feedback > general feedback: Knowing "the answer was bad" doesn't help. Knowing "the answer about drug interactions was imprecise" allows improvement.

4. Explicit limitations build trust: Hipócrates always warns: "I don't replace in-person evaluation. I recommend confirmation with tests." Paradoxically, this increased user trust.

The Future

We're working on integration with Brazil's ANVISA drug database for automatic drug interaction checking, ICD-10 suggestions based on symptoms, and a teaching mode for residents.

But the biggest lesson remains: the best medical AI isn't the one that knows more — it's the one that knows how to ask better questions.

What about you? Have you thought about how your AI systems could be more questioning?