Building Conversational AI: Lessons From Shipping a Legal Chatbot

We built Lexa, Pakistan's first AI legal chatbot, to put first-pass legal guidance in front of anyone with a browser. It's live, it answers real questions about real law, and building it taught us more about shipping production AI than any amount of prototyping could. Here's what actually mattered.

Legal help is an access problem, not a knowledge problem

The law isn't secret — it's just inaccessible. Most people never speak to a lawyer until a problem has already become a crisis, because the first conversation is expensive, slow, and intimidating. The opportunity wasn't to replace lawyers. It was to give ordinary people a clear, sourced first answer to "what does the law actually say about this?" — the answer that usually decides whether someone acts early or waits until it's too late.

That framing shaped every engineering decision. The product isn't "a chatbot that knows law." It's "a way for a non-lawyer to get trustworthy guidance, fast, without a barrier in front of them."

A generic chatbot would have been worse than nothing

The fastest way to build a "legal chatbot" is to wrap a large language model in a chat box and let it answer from memory. We didn't do that, and on a high-stakes domain you shouldn't either.

An ungrounded model will answer a legal question with total confidence and no basis — inventing sections, misremembering statutes, blending jurisdictions. In most products a hallucination is an annoyance. In law, a confident wrong answer is a liability. The entire build was organised around making that failure mode rare and visible.

Lesson 1: Grounding beats a bigger model

The single most important decision was retrieval-augmented generation (RAG). Instead of asking the model what it "remembers," we retrieve the relevant legal text first and ask it to answer from that retrieved context — and to surface what it's drawing on.

This is the difference between a tool and a toy. A bigger or newer model doesn't fix hallucination on a specialist corpus; grounding does. If you take one thing from this post: for any domain where being wrong has consequences, your effort belongs in the retrieval pipeline — chunking, embeddings, ranking — far more than in model choice.

Lesson 2: The guardrails are the product

A legal assistant has to know its limits. We spent real engineering time on:

Scoping — keeping answers inside what the system can actually support.
Safe refusal — declining, gracefully, when a question needs a human lawyer rather than guessing.
Traceability — answers that point back to their basis, so a user (or a lawyer) can check them.

These aren't features you bolt on at the end. They're the difference between something you can responsibly put in front of the public and something you can only demo.

Lesson 3: Design the conversation for a non-expert

Lexa's users aren't lawyers. That sounds obvious, but it changes everything: plain language instead of jargon, a calm and patient tone, and answers structured so someone under stress can actually follow them. A technically correct answer that a frightened person can't parse has failed. Conversation design did as much work here as the model did.

Lesson 4: Public means performance and access

Putting AI in front of the general public is its own discipline. No signup wall between a person and their first answer. Fast responses on ordinary connections and devices. Graceful behaviour under load. A clean, fast Next.js front end did the unglamorous job of making the assistant feel instant and trustworthy — which, for a first-time user deciding whether to trust it at all, is most of the battle.

What we'd tell anyone building a domain-specific chatbot

Ground everything. RAG isn't optional when accuracy matters.
Make refusal first-class. Knowing when not to answer is a feature, not a gap.
Measure quality against real questions and edge cases, not vibes.
Design for your actual user, who is almost never an expert in your domain.

This is the same approach we bring to every chatbot we build and every AI integration: grounded, guarded, measured, and shipped to production rather than parked in a demo. If you're thinking about a domain-specific assistant of your own, book a call — we'll tell you honestly what's worth building.

Building Conversational AI: Lessons From Shipping a Legal Chatbot

Legal help is an access problem, not a knowledge problem

A generic chatbot would have been worse than nothing

Lesson 1: Grounding beats a bigger model

Lesson 2: The guardrails are the product

Lesson 3: Design the conversation for a non-expert

Lesson 4: Public means performance and access

What we'd tell anyone building a domain-specific chatbot

Need help building this?

Related posts

Building Production AI Agents: Architecture, Evals, and What Actually Ships

What is MCP (Model Context Protocol)? A Practical Guide for Product Teams

How to Evaluate LLM Outputs in Production (Eval Pipelines That Actually Work)