Finova's fraud operations team was manually reviewing flagged transactions using a brittle internal tool built in 2019. As transaction volume grew 3× in 18 months, the backlog of unreviewed fraud alerts hit 48 hours — meaning real fraud was being caught too late, and legitimate transactions were being declined while customers waited. The tool surfaced alerts in chronological order with no risk scoring, no case history, and no bulk actions.
Analysts spent the first 2 minutes of every case hunting for context scattered across four tabs: transaction history, device fingerprint, account age, and prior flags. Because the tool showed alerts in FIFO order with no regard to risk, high-risk fraud was often reviewed after low-risk noise. 78% of analysts said they had a mental "pattern" they checked every time, but the tool didn't support it. We also discovered that 60% of false positives came from a single category: first-purchase, high-value orders from new accounts in the EU, a segment that had tripled as we expanded.
We redesigned the review queue around the analyst's mental model rather than the database's record structure. The new queue surfaces cases ranked by a composite risk score (ML model + rule-based signals), consolidates the four context tabs into a single card with progressive disclosure, and adds one-click bulk actions for common patterns. We shipped in three phases: risk-ranked queue first (lowest eng effort, highest impact), then the consolidated case card, then bulk actions.
The biggest debate was whether to build a fully automated decision engine or improve the human review flow. I pushed for the human-in-the-loop approach for two reasons: (1) our ML model had only 78% precision at the time — not good enough to automate high-stakes declines, and (2) analysts' domain knowledge was genuinely catching edge cases the model missed. We agreed to improve tooling now and revisit automation in 6 months once precision improved. The second decision was the ranking algorithm: engineering wanted a pure ML score, but compliance required at least one hard rule (flag any transaction > $5k for human review). We built a hybrid: ML score with compliance overrides surfaced visually so analysts understood why a case was elevated.
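To make the hybrid ranking concrete, here is a minimal sketch of how an ML score and a compliance override might combine. It is an illustration, not the production logic: the `Case` fields, the 0-to-1 score range, and the choice to sort overridden cases ahead of everything else are all assumptions. Only the $5k threshold comes from the compliance rule described above.

```python
from dataclasses import dataclass, replace

# From the compliance rule described above: any transaction > $5k.
COMPLIANCE_THRESHOLD_USD = 5_000


@dataclass(frozen=True)
class Case:
    """Hypothetical review-queue case; field names are illustrative."""
    case_id: str
    amount_usd: float
    ml_score: float                      # assumed to be in the range 0.0 to 1.0
    override_reasons: tuple = ()         # human-readable reasons shown to analysts


def apply_compliance_overrides(case: Case) -> Case:
    """Return a copy of the case annotated with any hard-rule overrides."""
    reasons = []
    if case.amount_usd > COMPLIANCE_THRESHOLD_USD:
        reasons.append("amount > $5k (compliance rule)")
    return replace(case, override_reasons=tuple(reasons))


def rank_queue(cases):
    """Order cases for review.

    Assumption: cases with a compliance override sort first, then
    everything is ordered by ML score, highest first.
    """
    annotated = [apply_compliance_overrides(c) for c in cases]
    return sorted(
        annotated,
        key=lambda c: (bool(c.override_reasons), c.ml_score),
        reverse=True,
    )
```

Keeping the override reasons on the case itself, rather than folding them into a single number, is what lets the interface show analysts why a case was elevated.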
The biggest challenge was the 6-week data migration from the legacy tool. Our fraud data was spread across three systems with inconsistent IDs. We hit a 3-week delay when we discovered that ~8% of historical cases had duplicate entries, which would have corrupted the ML training data. We fixed it by adding a deduplication step to the pipeline — but it pushed the launch by a sprint.
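A deduplication step of this kind can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the pipeline we shipped: it assumes each record is a dict with a `transaction_id` field, and that duplicates across systems differ only by whitespace and letter case in that ID. The real matching rules for inconsistent IDs would be more involved.

```python
def canonical_key(record: dict) -> str:
    """Normalize the ID so the same case from different systems compares equal.

    Assumption: IDs differ only by surrounding whitespace and letter case.
    """
    return str(record["transaction_id"]).strip().upper()


def deduplicate(records):
    """Keep the first occurrence of each case and count what was dropped."""
    seen = set()
    unique = []
    dropped = 0
    for record in records:
        key = canonical_key(record)
        if key in seen:
            dropped += 1      # duplicate entry: skip it
            continue
        seen.add(key)
        unique.append(record)
    return unique, dropped
```

Returning the dropped count alongside the cleaned records makes it easy to track the duplicate rate on each pipeline run before the data reaches model training.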
I underestimated how much analysts relied on muscle memory with the old tool. Even though the new interface consolidated context and ranked cases by risk, we saw a 2-week productivity dip after launch as people adjusted. I'd budget a longer hands-on training period next time and consider a gradual rollout (opt-in for the first cohort) rather than a hard cutover. I'd also involve the compliance team 2 sprints earlier, since their sign-off on the ranking algorithm added a week at the end.