AstraZeneca · 2026 · Agentic AI · Clinical trials

Designing clinical trials with an agent in the loop

A cohort builder where natural language, structured criteria, and AI agents work in sync. From idea to validated cohort in minutes, not days.

Product design · AI / agentic UX · Design + build

01 The problem

Designing a clinical trial starts with building a patient cohort from enormous, messy clinical datasets. At AstraZeneca the users are clinical scientists and feasibility researchers, and the stakes are high: a cohort that is too narrow kills a trial's statistical power, while one that is too broad wastes months and budget. The work was fragmented and slow. Researchers translated clinical intent into rigid queries, handed them to specialized data teams, waited days for validation, then cycled back through disconnected systems when the numbers came back wrong.

This is workflow-heavy enterprise work: multi-step cohort construction across demographics, temporal windows, comorbidities, and exclusions, each with its own edge cases and each dependent on standardized vocabularies. Hand it entirely to an AI and you lose the clinician's judgment — the one thing you cannot lose in a trial. The real question was how to keep the human in control while the machine does the heavy lifting.

02 My role

I was the sole product designer and drove the platform's UX end to end: the problem framing, the research, the human-in-the-loop model, and the shipped v1. The core decision I owned was the boundary — where the AI agents act autonomously versus where the clinician must confirm — and I designed every surface that makes that boundary legible in the product. I partnered daily with the PM and the ML engineers to turn that boundary into something real under the platform's actual cost and latency constraints.

03 The approach

I ran interviews with clinical scientists and feasibility researchers, and shadowed two live cohort-scoping sessions. The recurring signal was that researchers did not distrust automation — they distrusted automation they couldn't see. When an early concept auto-expanded a SNOMED concept set and silently pulled in thousands of extra patients, one researcher told me she would rebuild the whole query by hand rather than trust a number she couldn't trace. That killed the idea of a fully conversational, hands-off builder and pushed me to a model where the AI proposes and the human confirms, with every concept expansion shown and reversible before it affects the count.
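The propose-and-confirm model can be sketched in a few lines. This is an illustrative sketch only, not the platform's code: the class names, the status strings, and the concept identifiers in the usage note are all hypothetical. The point it shows is that a proposed expansion is visible and can be previewed, but contributes nothing to the count until a clinician confirms it, and can be reverted afterwards.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExpansionProposal:
    """An agent's suggestion to widen a criterion (hypothetical shape)."""
    source_concept: str
    added_concepts: frozenset[str]
    status: str = "proposed"  # "proposed" | "confirmed" | "reverted"

    def confirm(self) -> None:
        self.status = "confirmed"

    def revert(self) -> None:
        self.status = "reverted"


@dataclass
class Criterion:
    base_concepts: set[str]
    proposals: list[ExpansionProposal] = field(default_factory=list)

    def effective_concepts(self) -> set[str]:
        """Only confirmed expansions feed the concept set used for counting."""
        concepts = set(self.base_concepts)
        for proposal in self.proposals:
            if proposal.status == "confirmed":
                concepts |= proposal.added_concepts
        return concepts


def count_patients(concepts: set[str], patient_concepts: dict[str, set[str]]) -> int:
    """Count patients who have at least one of the given concepts."""
    return sum(1 for held in patient_concepts.values() if held & concepts)


def preview_added(criterion: Criterion, proposal: ExpansionProposal,
                  patient_concepts: dict[str, set[str]]) -> int:
    """How many extra patients the proposal would pull in, without applying it."""
    current = criterion.effective_concepts()
    widened = current | proposal.added_concepts
    return (count_patients(widened, patient_concepts)
            - count_patients(current, patient_concepts))
```

In use, the interface would call something like `preview_added` to show the impact next to the proposal, and only `confirm()` changes what `effective_concepts()` returns. Calling `revert()` restores the earlier count.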

So I scoped v1 deliberately. I cut the free-form, chat-only interface and instead paired natural language with a structured criteria panel, so intent could be expressed either way and always resolved to auditable OMOP and SNOMED concepts. I also designed the system states the workflow actually hits: loading while agents validate, empty states for the dataset marketplace, and clear error and warning states for data-coverage gaps, concept-expansion risk, and attrition impact surfaced in real time. Agent reasoning is legible in context, keyboard-navigable, and screen-reader accessible, with status conveyed by text and icon rather than color alone.

04 What I built

I shipped an AI-assisted cohort builder that accepts conversational and structured input, translates clinical intent into standardized OMOP and SNOMED concepts, resolves ambiguities, and guides validation with real-time feedback on coverage, expansion risk, and attrition. Around it I designed the human-in-the-loop review patterns, a specialized agent architecture spanning cohort construction, data-quality validation, feasibility estimation, and evidence generation, and a federated dataset marketplace and agent catalog for discovering and applying datasets inside the workflow.

Working with engineering, we negotiated scope around a hard constraint: agent validation runs are expensive and not instant, so I could not treat every keystroke as a live query. I designed for that by batching criteria into explicit "validate" steps with optimistic previews and a clear pending state, and gave every agent full audit trails so a clinician can see, and reverse, exactly what it did. That single constraint shaped the whole rhythm of the interface.
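One way to picture the batched-validation rhythm is as a small state machine. The sketch below is an assumption-laden illustration, not the shipped implementation: `CohortDraft`, `Status`, and the `cheap_estimate` parameter are hypothetical names. Edits only update a preview, an explicit step moves the draft into a pending state, and an agent result is accepted only if the criteria have not changed while the run was in flight.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    UNVALIDATED = "unvalidated"  # edits exist that no agent run has checked
    PENDING = "pending"          # an agent validation run is in flight
    VALIDATED = "validated"      # the count comes from a completed agent run


@dataclass
class CohortDraft:
    criteria: list[str] = field(default_factory=list)
    preview_count: Optional[int] = None    # optimistic, not agent-validated
    validated_count: Optional[int] = None  # result of the last accepted run
    status: Status = Status.UNVALIDATED

    def add_criterion(self, criterion: str, cheap_estimate: int) -> None:
        """Editing never triggers an agent run; it only updates the preview."""
        self.criteria.append(criterion)
        self.preview_count = cheap_estimate
        self.status = Status.UNVALIDATED

    def start_validation(self) -> tuple[str, ...]:
        """The explicit 'validate' step: snapshot the batch and go pending."""
        self.status = Status.PENDING
        return tuple(self.criteria)

    def finish_validation(self, snapshot: tuple[str, ...], count: int) -> bool:
        """Accept the agent's count only if the criteria are unchanged."""
        if snapshot != tuple(self.criteria):
            self.status = Status.UNVALIDATED  # stale result, discard it
            return False
        self.validated_count = count
        self.preview_count = None
        self.status = Status.VALIDATED
        return True
```

Keeping the preview and the validated count in separate fields is what lets the interface label a number as provisional instead of presenting an estimate as a confirmed result.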

05 Outcome

V1 collapsed the idea-to-validated-cohort cycle from days of ticketing and back-and-forth to a single self-serve session, and researchers stopped filing tickets to the central data team for routine scoping — they could do it themselves and trust the result.

The clearest post-launch lesson came from watching real use. In v1, agents surfaced their reasoning but researchers still hesitated at the moment of committing a cohort, unsure which criterion was driving a sharp attrition drop. For v2 I added an inline "why did this change" breakdown on the count itself, attributing each drop to the specific criterion responsible. After that, reviewers moved through validation noticeably faster and the back-and-forth with the data team on ambiguous cohorts dropped.
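The logic behind a "why did this change" breakdown can be sketched as a sequential funnel. This is a minimal illustration under stated assumptions: criteria are applied in a fixed order, each drop is attributed to the criterion that removed those patients, and the function name and the toy patient records in the usage note are made up. With overlapping criteria the attribution depends on the order, so a real product would need to decide how to present that.

```python
from typing import Callable

Patient = dict
Predicate = Callable[[Patient], bool]


def attrition_breakdown(
    patients: list[Patient],
    criteria: list[tuple[str, Predicate]],
) -> list[tuple[str, int, int]]:
    """Apply criteria in order; return (name, dropped, remaining) per step."""
    remaining = list(patients)
    steps = []
    for name, keep in criteria:
        before = len(remaining)
        remaining = [p for p in remaining if keep(p)]
        steps.append((name, before - len(remaining), len(remaining)))
    return steps
```

For example, given five toy patients with `age` and `egfr` fields and the criteria "age >= 40" followed by "eGFR >= 60", the result lists how many patients each criterion removed and how many are left after it, which is the information a researcher needs when deciding whether a sharp drop is expected.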

06 Reflection

The batched-validation trade-off was right for cost and trust, but the pending states made the tool feel slower than researchers expected from a "live" AI product. If I did it again I'd invest earlier in fast, cheap client-side estimates for the common criteria so the interface feels instant, and reserve the expensive agent runs for the moments that genuinely need them.
