Finding the 80/20: Lessons from delivering our first LLM feature

How to choose what actually matters when shipping your first LLM feature, balancing evaluation, trust, and delivery without overengineering.

Speakers: Thordis Thorsteins

June 02, 2026

Shipping your first LLM feature requires more than code. This talk explains how we navigated new strategic demands, from opt-in stances to evaluation workflows, to find a pragmatic path to production.

Even for mature engineering organisations, the first LLM-powered feature introduces a paradigm shift. The actual coding often feels like the easy part. The real challenge lies in a daunting list of new “supporting” activities. From defining evaluation frameworks and prompt versioning to deciding on customer “opt-out” stances and communication strategies, how do you decide where to focus your limited energy?

In this session, I will share the “battle scars” from our journey to production. I’ll break down our approach to the 80/20 rule: identifying the activities that required deep investment, those we found a “light” way to handle, and those we intentionally parked for later.

We will explore:

The Effort Flip: How we restructured our workflow to move from “getting it to work” to “ensuring it’s good enough,” reallocating engineering time from writing code to context engineering and evaluation.
Strategic Stances: How we tackled non-code needs like opt-in/opt-out policies, customer FAQs, and internal upskilling.
Triage in Action: What we over-invested in, what I wish we’d done sooner (like early internal documentation on feature capabilities) and what we successfully deferred (like automation of evals).

This isn’t a prescriptive guide, but a look behind the curtain at how we defined our own “production-ready” standard. You will leave with a framework for auditing your own AI roadmap to ensure you’re focused on the activities that truly drive reliability and trust.

Key takeaways

Evaluation First: Robust evaluation matters more for reliable AI systems than perfecting the initial code.
Prioritising based on a North Star: We treated security and trust as non-negotiables, and they determined what we prioritised, including OWASP Top 10 reviews and data privacy decisions.
Design the First Feature to Scale: The initial rollout can establish reusable patterns that accelerate every AI feature that follows.
Manual Before Automated: Starting with human-in-the-loop evaluation saved time and helped us understand failure modes before investing in automated testing.
