

Transforming ML tuning for clearer, faster risk detection
Behavox’s ML model feedback loop was slow, manual, and code-heavy. Users relied on company analysts to tune models. This led to missed risks and reduced trust in the system.
64%
Boost in risk alert accuracy
31%
Improvement in user satisfaction
Reduced reliance on company analysts


This case study reflects my work. Certain details were adjusted to honour confidentiality.
Problem
Analyst dependency slowed ML risk detection
Behavox’s ML model tuning required manual code changes that most Risk Compliance officers couldn’t make. This created a dependency on Behavox analysts, slowed feedback loops, and delayed ML risk prediction. Queues grew, alerts lagged, and exposure to real threats increased.


Low clarity: Users called the flow “slow and opaque”. User trust and tool usage dropped.


Unmet needs: Compliance risk managers lacked oversight tools. This hurt model health monitoring.


Stalled loops: Review and tuning sat in queues. Real threats could slip through.
Ideation
Four directions. One goal: get feedback closer to the model
My goal was to connect feedback, review, and oversight into a single flow. I aimed to enable faster tuning and clearer decisions.



Feedback-first design. Let compliance officers train ML models directly, in context.




Streamlined review paths. Reduced steps and errors by exploring panels, inline actions, and modals.


In-context scenario assignment. Let reviewers tag new signals in the moment, sharpening ML accuracy and cutting rework.


Manager oversight. A shared view of ML health to reduce analyst load and maintain quality at scale.




Testing
I made the wrong call: pattern over behavior
I reused a hover tooltip for ML feedback. I assumed users would find it since it matched an existing pattern. Users made decisions in the justification area, not in highlighted text.


"I wasn't able to find how to give feedback on flagged risk signals."




"I always go to the justification… it helps me clarify flagged risk content."


Test data confirmed users couldn’t find the feedback entry: the thumbs-up/down pattern blended with the content, so most never entered the loop. Feedback lived away from the decision point. I had designed around their mental model instead of into it.
"I wasn't able to find how to give feedback on flagged risk signals."

"I always go to the justification… it helps me clarify flagged risk content."

Feedback lived away from the decision point. I had designed around their mental model instead of into it.
Iteration
I moved ML feedback to where decisions happen
Testing revealed the gap. Users went to the justification first, every time. So I moved feedback entry there.


Before:
1. Flagged signal lacks emphasis, buried among secondary fields.
2. Regulatory data shows a raw URL, interrupting the decision flow.
3. All fields carry equal weight; nothing signals what matters most.
4. Competing secondary statistical metadata.

After:
1. Flagged signal leads. Elevated and visually distinct from supporting data.
2. Primary context surfaced inline. One scan, two key facts.
3. Secondary metadata hidden by default. Surface stays focused on the decision.
4. Raw URL replaced with a clean external link. Regulatory detail accessible, not disruptive.
Compliance officers got to the risk alert justification faster. With less friction.
Testing
Two clearer flows shipped as a result

Side panel for scenario assignment. Chosen for fit and fast build. Users stayed in context while tagging new signals.

Manager analytics dashboard. Built around two goals: track ML health and monitor review contributions.
Pivot
We prioritized tuning and deferred oversight to keep momentum
Aggregation pipeline limitations blocked a full dashboard build.
With the PM, I secured a two-week window to validate the officer flow and launch it. We moved the dashboard to Phase 2 to keep speed and cut rework.


Handoff
I delivered faster ML tuning and set the foundation for scale
Note: We were rolling out a new design system, so I updated my designs and contributed to the library.
Phase 1: After validating the Compliance Officer flow, I handed it to devs. We shipped ML feedback entry in justification, enabling faster tuning with less friction.




Phase 2: The Compliance Manager dashboard was prioritized in the backlog for a future release to restore oversight, guide quality, and balance load.





Learnings
Designing for clarity, speed, and momentum
Place actions where decisions happen. It lifts discoverability, and feedback starts immediately.
Validate now, scale later. Phased builds keep momentum when new constraints block the full build.
Standardize the fastest review path. It speeds completion and cuts errors.
Want the full story?
I help teams remove friction and ship faster.
