A Practical Guide to Agile Part 4: Practices and Measurement — From XP to Velocity · DORA


Introduction

Everything up to the previous post asked “what frame do you work in?” (Scrum, Kanban). Part 4 takes on two questions that play out inside that frame.

First, how do you actually build the code inside that frame? You can run every Scrum event by the book, but if you can’t change the code safely and quickly, the short iterative loop itself stalls. The “how you build” side belongs to XP’s engineering practices.

Second, how do you measure it? Numbers like story points, velocity, and DORA are useful, but the moment a number becomes a target, the team starts gaming it.

Two warnings run through both questions. Adopt the process but skip the engineering practices, and Agile collapses (Section 1). And when measurement becomes a KPI, the measurement itself breaks (Section 4). Both are core causes of the fake Agile in Part 5.

The target reader is a team that runs Scrum but is increasingly afraid to touch the code, or someone pressured to use velocity as a productivity metric.


TL;DR

  • Extreme Programming is the technical heart of Agile — engineering practices like TDD, CI, refactoring, and pairing keep code “in a state you can keep changing safely.” It’s the implementation of Part 1’s principle 9 (technical excellence enhances agility).
  • Adopt the process but skip the practices, and it collapses — run only Scrum events and skip the practices, and technical debt piles up, change slows, and the short iterative loop itself stalls. The #1 cause of fake Agile.
  • Story points are relative size, not time — humans are weak at absolute time estimates but better at relative comparison. You estimate together with planning poker.
  • Velocity is a planning tool, not a performance target — use it to compare teams or as a productivity target and, by Goodhart’s Law, it breaks through point inflation.
  • To measure, look at the DORA delivery-performance metrics: deployment frequency, change lead time, change failure rate, and time to restore. Measure delivery outcomes, not an internal score (velocity). Velocity ≠ value.

1. XP — The Technical Heart of Agile

1.1 What XP Is

XP (eXtreme Programming), distilled by Kent Beck in the late 1990s, is the idea of “pushing good development practices to the extreme.” If code review is good, do it continuously (pair programming); if testing is good, write it before the code (TDD); if integration is good, do it many times a day (continuous integration).

XP has five values (communication, simplicity, feedback, courage, respect), but what sets it apart from other frameworks is that it tackles concrete engineering practices head-on. If Scrum and Kanban are “how you organize the work,” XP is “how you build the code.”

1.2 Why Engineering Practices Are the Crux

Part 1’s principle 9 said “continuous attention to technical excellence and good design enhances agility.” That’s not decoration — it’s a condition for Agile to work.

The core of Agile is changing often in short iterations. But to change often, the code must stay in a changeable state. If there are no tests, every change is scary; if the structure is so tangled that fixing one place breaks another, “change often” becomes impossible.

So process and practices are one body. What happens if you adopt only Scrum events (process) and skip TDD, CI, and refactoring (practices)?

  • Technical debt piles up and change gets slower and slower.
  • As change slows, velocity drops.
  • Unable to change code safely, the short iterative loop stalls.

You end up with events running but nothing agile about it. This is the #1 path by which Agile collapses, and we meet it again in Part 5 as the core symptom of fake Agile. The process is the skeleton; engineering practices are the muscle that moves it.


2. The Core Engineering Practices

Here are the six most widely used XP practices today.

| Practice | What | Why it matters |
| --- | --- | --- |
| TDD | Write a failing test first, make it pass, then refactor | Makes change safe and drives the design |
| Continuous Integration (CI) | Integrate small changes into the mainline many times a day | Avoids integration hell, fast feedback |
| Refactoring | Improve internal structure without changing behavior | Keeps code in a state you can keep changing |
| Pair programming | Two people write one piece of code together | Continuous review, knowledge sharing, no silos |
| Simple design (YAGNI, You Aren’t Gonna Need It) | Build only what’s needed now | Prevents complexity from piling up (Part 1 principle 10) |
| Collective code ownership | Anyone can change any code | Removes bottlenecks and silos |

2.1 TDD — Tests Drive the Design

TDD (Test-Driven Development) writes the test before the code. It repeats three short steps, Red-Green-Refactor.

flowchart LR
    R["Red<br/>write a failing test"] --> G["Green<br/>minimum code to pass"]
    G --> RF["Refactor<br/>improve structure (behavior intact)"]
    RF -->|"next small step"| R

The crux is that the tests become a regression safety net that makes refactoring possible. With tests behind you, you can improve structure fearlessly, so the code stays “in a state you can keep changing.” TDD also dovetails with Part 1’s principle 10 (simplicity) — you write only the minimum code to pass the test.
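One Red-Green-Refactor pass might look like the sketch below. The `slugify` function and its test are hypothetical examples chosen for illustration, not something from this series. The test is written first and fails while `slugify` does not exist (Red); the function underneath is the small implementation left after making it pass and tidying up (Green, Refactor).

```python
import re


# Red: written first. It fails until slugify exists and behaves as specified.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Agile,  Part 4! ") == "agile-part-4"


# Green + Refactor: the minimum code that passes, after cleanup.
# (Hypothetical function, used only to illustrate the cycle.)
def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


# The test stays in the suite as a regression safety net for later refactoring.
test_slugify()
```

Any later restructuring of `slugify` can be checked by rerunning the same test, which is the safety-net role described above.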

2.2 CI and Trunk-Based Development

Continuous Integration (CI) is the practice of integrating small changes into a shared mainline often (many times a day) and catching problems immediately with an automated build and tests on every integration.

On the opposite side is the long-lived branch. Work separately for weeks, then merge all at once, and conflicts and integration bugs explode at that point (integration hell). So CI usually goes hand in hand with trunk-based development — merging to the mainline frequently via short-lived branches.

These practices connect directly to measurement. The more a team integrates and deploys often and small, the better its DORA metrics (deployment frequency, change lead time) from Section 4. Engineering practices show up as delivery performance.


3. Estimation — Story Points and Planning Poker

3.1 Why Story Points Aren’t Time

The most common misconception is trying to convert “1 story point = N hours.” Story points are not time but relative size — a “sense of size” of one chunk, combining complexity, uncertainty, and effort.

Why relative size and not time? Because humans are terrible at absolute time estimates (“this’ll take a few days”) but much better at relative comparison (“this is about twice that”). So you fix one reference item and size the rest against it.

3.2 Planning Poker and Fibonacci

Planning poker is a technique for estimating together as a team. Each person reveals a size card simultaneously, and when values diverge, you discuss why. More valuable than the estimate itself is the difference in understanding the discussion exposes.

Cards usually use a Fibonacci-like sequence (1, 2, 3, 5, 8, 13, …). The gaps widen as the numbers grow, reflecting that larger work carries more uncertainty, so fine distinctions between large estimates are meaningless.
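As a rough sketch of the two ideas above, the snippet below shows the widening gaps in the deck and a simple check for when a round of votes is far enough apart to deserve discussion. The deck cut-off at 21, the function names, and the "more than one card apart" threshold are assumptions for illustration; teams pick their own rule.

```python
# Assumed deck; real decks vary (some add 0, 1/2, 40, 100, or a "?" card).
FIBONACCI_CARDS = (1, 2, 3, 5, 8, 13, 21)


def card_gaps(cards: tuple[int, ...] = FIBONACCI_CARDS) -> list[int]:
    """Difference between each card and the next one in the deck."""
    return [b - a for a, b in zip(cards, cards[1:])]


def needs_discussion(votes: dict[str, int], max_card_gap: int = 1) -> bool:
    """True when the highest and lowest votes are more than
    max_card_gap positions apart in the deck (hypothetical rule)."""
    positions = [FIBONACCI_CARDS.index(card) for card in votes.values()]
    return max(positions) - min(positions) > max_card_gap


print(card_gaps())                                     # gaps grow with size
print(needs_discussion({"ana": 3, "bo": 5, "cy": 13}))  # wide spread
print(needs_discussion({"ana": 3, "bo": 5}))            # adjacent cards
```

Comparing positions in the deck rather than raw numbers keeps the rule consistent with relative sizing: 8 versus 13 is one step apart, just like 2 versus 3.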

Note — #NoEstimates: a movement that sees estimation as often wasteful. Not “never estimate,” but closer to “slice work small and uniform and forecast from throughput (count of completed items).” It’s the same idea as forecasting from Part 3’s flow metrics (throughput, lead time). Read it as a challenge to ask whether detailed estimates really add value.
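A throughput-based forecast can be as simple as dividing the remaining item count by how many items the team has finished per week. The sketch below is one minimal way to do that, assuming items are sliced to roughly uniform size; the function name, the weekly granularity, and the min/median/max range are illustrative assumptions, not a prescribed method.

```python
import math
from statistics import median


def weeks_to_finish(remaining_items: int, weekly_throughput: list[int]) -> dict[str, int]:
    """Forecast a range of weeks from past throughput (completed items per week)."""
    if not weekly_throughput or min(weekly_throughput) <= 0:
        raise ValueError("need at least one week, each with positive throughput")
    return {
        "optimistic": math.ceil(remaining_items / max(weekly_throughput)),
        "typical": math.ceil(remaining_items / median(weekly_throughput)),
        "pessimistic": math.ceil(remaining_items / min(weekly_throughput)),
    }


# Hypothetical history: items completed in each of the last five weeks.
print(weeks_to_finish(30, [4, 6, 5, 7, 5]))
```

Reporting a range instead of a single date keeps the forecast honest about the variation in the history it is built on.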


4. Measurement — The Velocity Trap and DORA

4.1 The Right and Wrong Uses of Velocity

Velocity is the sum of story points completed in a sprint. The problem is where you use it.

| Use | Where |
| --- | --- |
| Right use | Forecasting the team’s own capacity for the next sprint (a planning tool) |
| Wrong use | Comparing teams, a productivity KPI, a management target |

Because each team estimates on its own scale, comparing velocity across teams is inherently meaningless: one team’s 8 may be another’s 3. Velocity is only meaningful within a single team, to gauge “how much can we take on next.”
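The planning use can be sketched as an average over the team's own recent sprints. The function name and the three-sprint window below are assumptions for illustration; the point is that the input is one team's history and the output feeds that same team's next sprint plan.

```python
def forecast_capacity(recent_velocities: list[int], window: int = 3) -> float:
    """Average the team's own last `window` sprint velocities.

    Meant only for this team's next-sprint planning; the result is not
    comparable with another team's number because the point scales differ.
    """
    if not recent_velocities:
        raise ValueError("need at least one completed sprint")
    last = recent_velocities[-window:]
    return sum(last) / len(last)


# Hypothetical history of completed story points per sprint, oldest first.
print(forecast_capacity([21, 18, 24, 20, 22]))
```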

4.2 Goodhart’s Law — Measurement Breaks When It Becomes a Target

Goodhart’s Law warns that “when a measure becomes a target, it ceases to be a good measure.” Make velocity a KPI and exactly this happens.

flowchart LR
    M["measure velocity"] --> T["velocity becomes a target<br/>(KPI · pressure)"]
    T --> G["point inflation · quality sacrificed<br/>(gaming)"]
    G --> B["the number rises but<br/>value stays flat or drops"]
    B -->|"the measure loses trust"| M

Once it’s a target, the team assigns more points to the same work (point inflation) and skips “no-points work” like refactoring and testing. The number rises but real value stays flat or falls. Part 1’s principle 7 (“the measure of progress is working software”) rings here again — measure progress by working results, not by points.

Note (burndown and burnup): a burndown chart draws remaining work down to zero; a burnup chart stacks completed work up toward a scope line. Because the burnup also shows scope growth, it can answer “why aren’t we done?” by separating two causes: the team completed less than planned, or the scope grew.
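The difference between the two charts is easy to see in numbers. The sketch below builds both series from the same data; the function name and the sample figures are hypothetical.

```python
def burn_series(
    completed_per_sprint: list[int], scope_per_sprint: list[int]
) -> tuple[list[tuple[int, int]], list[int]]:
    """Return (burnup, burndown) series.

    burnup:   (cumulative completed, total scope) per sprint
    burndown: remaining work per sprint (scope minus cumulative completed)
    """
    done = 0
    burnup, burndown = [], []
    for completed, scope in zip(completed_per_sprint, scope_per_sprint):
        done += completed
        burnup.append((done, scope))
        burndown.append(scope - done)
    return burnup, burndown


# Hypothetical project: a steady 10 points per sprint while scope grows by 5.
burnup, burndown = burn_series([10, 10, 10], [40, 45, 50])
print(burnup)
print(burndown)
```

In this example the burndown only shows remaining work shrinking by 5 per sprint, which could be misread as a slow team. The burnup shows completed work rising steadily by 10 while the scope line also rises, so the cause is scope growth.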

4.3 DORA — Four Metrics for Delivery Performance

If velocity is an internal score, the four DORA (DevOps Research and Assessment) metrics measure delivery outcomes. Popularized by the research behind the book Accelerate, they represent software delivery performance.

| Metric | What it measures | Axis |
| --- | --- | --- |
| Deployment frequency | How often you deploy to production | Speed |
| Change lead time | Time from commit to running in production | Speed |
| Change failure rate | Share of deployments that cause a failure | Stability |
| Time to restore (MTTR, Mean Time To Restore) | Time to recover from a failure | Stability |

The first two are speed, the last two stability. The key finding is that speed and stability don’t trade off — high performers (elite) do well on both. Deploying often and small makes each change small, so failures are fewer and recovery faster. Section 2’s CI and trunk-based development are exactly the practices that lift these four metrics.
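To make the four definitions concrete, here is a minimal sketch that computes them from a list of deployment records. The `Deployment` record, its field names, and the use of medians are assumptions for illustration; in particular, time to restore is measured from the failed deployment's timestamp, which only approximates when the failure actually started. Real tooling derives these from CI/CD and incident data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:  # hypothetical record shape
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four metrics over one observation period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    failures = [d for d in deployments if d.failed]
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "median_change_lead_time": median(
            d.deployed_at - d.committed_at for d in deployments
        ),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Hypothetical week: four deployments, one of which failed and was fixed in 1h.
t0 = datetime(2024, 1, 1, 9, 0)
day, hour = timedelta(days=1), timedelta(hours=1)
week = [
    Deployment(t0, t0 + 2 * hour),
    Deployment(t0 + day, t0 + day + 4 * hour, failed=True,
               restored_at=t0 + day + 5 * hour),
    Deployment(t0 + 2 * day, t0 + 2 * day + hour),
    Deployment(t0 + 3 * day, t0 + 3 * day + 3 * hour),
]
for name, value in dora_metrics(week, period_days=7).items():
    print(name, value)
```

All four numbers come from what actually reached production, which is the contrast with velocity drawn in this section.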

DORA is more honest than velocity because it measures actual delivered results, not internal estimates. “Change lead time” is the same kind of measure as Part 3’s flow lead time. Still, even DORA, nailed down as a KPI and forced as a target, can’t escape Goodhart’s Law. Measurement should be a signal the team uses to improve itself, not a target imposed from above.


Recap

The essentials of Part 4, one line each:

  • XP’s engineering practices are the technical heart of Agile — TDD, CI, refactoring, and pairing keep code “in a state you can keep changing safely.” This is the substance of Part 1’s principle 9 (technical excellence).
  • Adopt the process but skip the practices, and it collapses — technical debt slows change and the short iterative loop stalls. The #1 cause of fake Agile.
  • Story points are relative size; velocity is a planning tool — converting to time, or comparing teams, was never the point.
  • When measurement becomes a KPI, Goodhart’s Law breaks it — point inflation and sacrificed quality follow. Measurement should be a signal, not a target.
  • To measure, look at DORA — deployment frequency, change lead time, change failure rate, time to restore. Measure delivery outcomes, not an internal score; speed and stability don’t trade off.

Part 5 is the destination of the series — Scaling and Fake Agile. We’ll synthesize what gets hard when you scale Agile beyond one team to many (SAFe, LeSS, Spotify, Conway’s Law), how the decay signals from Parts 1–4 harden into “fake Agile” in a real team, and how to recover from there.


Appendix

A. Glossary

| Term | One-line definition |
| --- | --- |
| XP (eXtreme Programming) | An Agile methodology that pushes good development practices to the extreme (Kent Beck) |
| TDD (Test-Driven Development) | Write a failing test first (Red), make it pass (Green), then improve structure (Refactor) |
| Continuous Integration (CI) | Integrating small changes into the mainline often, validated by automated build and tests each time |
| Trunk-based development | Merging to the mainline frequently via short-lived branches (opposite of long-lived branches) |
| Refactoring | Improving the internal structure of code without changing its behavior |
| Pair programming | Two people writing one piece of code together, reviewing continuously |
| Simple design / YAGNI | Build only what’s needed now (You Aren’t Gonna Need It, Part 1 principle 10) |
| Story point | Relative size combining complexity, uncertainty, and effort, not time |
| Planning poker | A technique where the team reveals size cards simultaneously and discusses the differences |
| Velocity | The sum of story points completed in a sprint: the team’s planning tool, not a productivity KPI |
| Goodhart’s Law | The warning that “when a measure becomes a target, it ceases to be a good measure” |
| #NoEstimates | A movement that doubts the value of detailed estimates and forecasts from throughput instead |
| DORA (DevOps Research and Assessment) four metrics | Deployment frequency, change lead time, change failure rate, time to restore (software delivery performance) |
| MTTR | Mean Time To Restore: the time to recover from a failure |
