Finding the 80/20: Lessons from delivering our first LLM feature

How to choose what actually matters when shipping your first LLM feature, balancing evaluation, trust, and delivery without overengineering.

Speaker: Thordis Thorsteins

June 02, 2026

Shipping your first LLM feature requires more than code. This talk explains how we navigated new strategic demands, from opt-in stances to evaluation workflows, to find a pragmatic path to production.

LDX3 New York lineup

Even for mature engineering organisations, the first LLM-powered feature introduces a paradigm shift. The actual coding often feels like the easy part. The real challenge lies in a daunting list of new “supporting” activities. From defining evaluation frameworks and prompt versioning to deciding on customer “opt-out” stances and communication strategies, how do you decide where to focus your limited energy?

In this session, I will share the “battle scars” from our journey to production. I’ll break down our approach to the 80/20 rule: identifying the activities that required deep investment, those we found a “light” way to handle, and those we intentionally parked for later.

We will explore:

  • The Effort Flip: How we restructured our workflow to move from “getting it to work” to “ensuring it’s good enough,” reallocating engineering time from writing code to context engineering and evaluation.
  • Strategic Stances: How we tackled non-code needs like opt-in/opt-out policies, customer FAQs, and internal upskilling.
  • Triage in Action: What we over-invested in, what I wish we’d done sooner (like early internal documentation on feature capabilities), and what we successfully deferred (like automation of evals).

This isn’t a prescriptive guide, but a look behind the curtain at how we defined our own “production-ready” standard. You will leave with a framework for auditing your own AI roadmap to ensure you’re focused on the activities that truly drive reliability and trust.

Key takeaways

  • Evaluation First: Robust evaluation matters more for reliable AI systems than perfecting the initial code.
  • Prioritising Based on a North Star: We treated security and trust as non-negotiables, and that stance shaped what we prioritised, including OWASP Top 10 reviews and data privacy decisions.
  • Design the First Feature to Scale: The initial rollout can establish reusable patterns that accelerate every AI feature that follows.
  • Manual Before Automated: Starting with human-in-the-loop evaluation saved time and helped us understand failure modes before investing in automated testing.