Optimizing DORA Metrics to Drive Exponential Gains in Business Outcomes (Part 1 of 4)
It has been thoroughly documented that companies whose software development performs well on velocity and quality also tend to lead in the marketplace across many business indicators.
Today at Checkr, we’re putting this to the test.
Our hypothesis is that investing in engineering goals based on industry standard metrics as defined by the DORA group (DevOps Research and Assessment) will enable Checkr as a whole to achieve our business objectives this year.
With a hypothesis in hand, the question becomes: What experiments should we run to test it? When defining implementation strategies in pursuit of ambitious business goals, we want to be as methodical as possible to enable data-driven decision-making. The rest of this post explains the metrics we will be measuring, our approach to benchmarking, and our roadmap for experimentation.
While there are published correlations between DORA metrics, organizations demonstrating a generative organizational culture, and business performance, we hope to take it one step further and measure how DORA metrics can materially impact specific top-down business metrics. This is where we depart from the broad survey-based metrics that other well-published studies have used.
Company Key Metrics
In 2022, we’ve defined three ambitious top-level business goals for our company: Create a Delightful Experience for our Customers and their Candidates, Scale to X Customers, and perhaps most importantly, Make Checkr the Best Place to Work. This is all in conjunction with a broad expansion of our product into self-service capabilities, international offerings, and a growing ecosystem of channels and partners.
Create a Delightful Experience for our Customers and their Candidates
• Customer NPS (Net Promoter Score)
• Candidates Unblocked
Scale to X Customers
• Uptime %
• 90 TAT (time from start to finish for a background check)
• Number of Active Customers
Make Checkr the Best Place to Work
• Employee Engagement Score
Engineering Key Metrics
Industry-standard metrics defined by the DORA group (DevOps Research and Assessment) will help us quantitatively measure and assess both our current performance and our historical trends.
- Deployment Frequency: How often an organization successfully releases to production (annually, quarterly, monthly, weekly, daily, hourly, etc)
- Cycle Time for Software Change: The amount of time it takes a change request to get into production (from branch creation to successful deployment)
- MTTR (Mean Time To Resolution): How long it takes an organization to recover from a failure in production (from incident start to incident resolution)
- Change Fail Rate: The percentage of deployments causing a failure in production (where failure is defined as the need for a remediation in the form of a bugfix, a hotfix or a revert)
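As a rough illustration of how these four definitions translate into calculations, here is a minimal Python sketch. The record shapes (`Deployment`, `Incident`), their field names, and the sample data are hypothetical, chosen only to mirror the definitions above; they are not taken from any particular tool or from our actual pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Deployment:  # hypothetical record shape
    branch_created: datetime   # when the branch for the change was created
    deployed: datetime         # when the change was successfully deployed
    needed_remediation: bool   # True if a bugfix, hotfix, or revert followed


@dataclass
class Incident:  # hypothetical record shape
    started: datetime
    resolved: datetime


def deployment_frequency_per_week(deploys, start, end):
    """Successful production deployments per week within [start, end)."""
    weeks = (end - start) / timedelta(weeks=1)
    in_window = [d for d in deploys if start <= d.deployed < end]
    return len(in_window) / weeks


def mean_cycle_time_days(deploys):
    """Average days from branch creation to successful deployment."""
    return mean((d.deployed - d.branch_created) / timedelta(days=1) for d in deploys)


def change_fail_rate(deploys):
    """Fraction of deployments that needed a remediation."""
    return sum(d.needed_remediation for d in deploys) / len(deploys)


def mttr_hours(incidents):
    """Average hours from incident start to incident resolution."""
    return mean((i.resolved - i.started) / timedelta(hours=1) for i in incidents)


# Made-up sample data covering a two-week window.
t0 = datetime(2022, 3, 1)
deploys = [
    Deployment(t0, t0 + timedelta(days=2), False),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=5), True),
    Deployment(t0 + timedelta(days=3), t0 + timedelta(days=9), False),
    Deployment(t0 + timedelta(days=7), t0 + timedelta(days=13), False),
]
incidents = [
    Incident(t0, t0 + timedelta(hours=4)),
    Incident(t0 + timedelta(days=6), t0 + timedelta(days=6, hours=8)),
]

window_end = t0 + timedelta(weeks=2)
print(deployment_frequency_per_week(deploys, t0, window_end))  # 2.0
print(mean_cycle_time_days(deploys))                           # 4.5
print(change_fail_rate(deploys))                               # 0.25
print(mttr_hours(incidents))                                   # 6.0
```

In practice the hard part is less the arithmetic than agreeing on the inputs, for example what counts as a "remediation" or when an incident "starts", so those choices are worth writing down alongside the numbers.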
So where does Checkr fall in terms of industry standards? After collecting the relevant metrics for benchmarking (via Haystack), we found ourselves, unsurprisingly, somewhere in the middle. In defining where we want to be at the end of 2022, we included relative metrics of success in addition to best-in-class metrics.
• Adopt a standard set of industry metrics for quality and velocity across all engineering teams at Checkr (DORA metrics)
• Benchmark our current status
• Migrate from monolith to non-monolith (continued investment in refactoring and building outside of the monolith)
• Invest in the software development process (weekly Scrum of Scrums review of DORA metrics for increased accountability, standardized roadmap format, standardized release plan format)
• Invest in the software development lifecycle (test data capabilities via automated creation of state snapshots, test performance via parallelization and determinism, on-demand creation of Checkr environments for dev, test, demo, and more, and real continuous delivery)
• Track and review experiments
• Conclude the experiment
Over the past 12 months, our trends have been mixed (see Appendix below). Quality in the monolith is the biggest area for improvement: we ended the last cycle with change failure rate in the monolith trending upward to 61%, which is concerning. On the positive side, we are celebrating our improvement in mean time to resolution: the MTTR trendline went from just over 30 hours down to just over 10 hours, a drop of 66%. Throughput is rising in non-monolith code and falling in monolith code as we refactor and move feature development out of our legacy systems and into our new ones (an average of 56 PRs per week in the monolith vs. 300 PRs per week in non-monolith code). Deployment frequency and cycle time have both moved in the wrong direction over the past 12 months (deployments are less frequent and cycle time is longer), but we are excited to see those trajectories change with our work this year.
In this post we shared our ambitious business goals and our foundational engineering goal: improving the quality and velocity of our software development process by 2x or more. We shared our hypothesis for an implementation strategy that would enable us to reach those goals. We shared the key external company metrics and the key internal engineering metrics we will use to measure success, along with our current status and 12-month trendlines across our entire code base, broken out by our legacy monolith and our next-generation non-monolith source code.
We are excited to move forward at Checkr in 2022 to see where this journey takes us, and what new and innovative things we will learn along the way. We are also eager to report back to the engineering community with progress as we go.
This post is the first of a four-part blog series. In this initial post, we define the experiment, benchmark our current status, and share next steps through the remainder of 2022. In part two, we will cover the software development process and lifecycle improvements underway. In part three, we will provide an update on the experiment and share our progress. In part four, we will publish the experiment's conclusion and how we performed against our goals.
Find more information on Checkr Engineering here.
Appendix
Benchmarks answer the question: “Where are we now, and have things been getting better, getting worse, or staying the same from a quality and velocity perspective?”
Rolling 12 Month Trends
Deployment Frequency
Timeframe: 2021.3.22 – 2022.3.27
Notes: Over the previous 12 months, deployment frequency dropped. For non-monolith code we went from an average of 56 deployments per week to an average of 50 deployments per week. For Checkr monolith deployments we went from an average of 13 per week down to an average of 7 per week.
Cycle Time
Timeframe: 2021.3.22 – 2022.3.27
Notes: Over the past 12 months, cycle time increased. For all non-monolith source code, we went from an average of 4 days to 6 days. For Checkr monolith, we went from an average of 5 days to 8 days.
Change Failure Rate
Notes: Over the past 12 months, change failure rate increased. For all non-monolith source code, we went from an average of 20% to 28%. For Checkr monolith, we went from an average of 34% to 61% change failure rate.
Mean Time To Recovery
Timeframe: 2021.3.1 – 2022.2.1
Notes: The mean time to recovery trendline went from just over 30 hours down to just over 10 hours, a drop of 66% and a huge accomplishment. Our ideal end state is an MTTR measured in seconds (rather than minutes or hours), but this is a positive trajectory that we will continue to drive forward.
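To put the trends in this appendix on a common scale, the before-and-after averages quoted in the notes can be converted into relative changes. The sketch below is only an illustration: the helper `relative_change` is a hypothetical name, and the MTTR endpoints are the rounded "just over 30" and "just over 10" hours, so its result is approximate (roughly two thirds, in line with the 66% quoted above).

```python
def relative_change(before, after):
    """Percent change from `before` to `after` (negative means a decrease)."""
    return (after - before) / before * 100


# (before, after) averages as quoted in the notes above.
trends = {
    "Deployments/week, non-monolith": (56, 50),
    "Deployments/week, monolith": (13, 7),
    "Cycle time in days, non-monolith": (4, 6),
    "Cycle time in days, monolith": (5, 8),
    "Change failure rate %, non-monolith": (20, 28),
    "Change failure rate %, monolith": (34, 61),
    "MTTR in hours (rounded endpoints)": (30, 10),
}

for name, (before, after) in trends.items():
    change = relative_change(before, after)
    print(f"{name}: {before} -> {after} ({change:+.0f}%)")
```

For change failure rate, which is already a percentage, it can be clearer to also quote the change in percentage points (for example, 34% to 61% is 27 points) so the two kinds of change are not confused.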