Streamlining an ML feedback loop

TL;DR: Behavox’s ML model fine-tuning was too complex. It frustrated users and hurt risk detection accuracy. I streamlined workflows with embedded feedback tools, improving both usability and precision.



64%

Rise in risk alert accuracy

13%

Increase in user satisfaction

Improved compliance


The company

A compliance and risk management platform

Behavox offers SaaS tools to track, detect, and manage compliance risks. These tools help enterprises ensure regulatory adherence and reduce threat exposure.


The problem

A complex ML model fine-tuning process

Users struggled with making code changes to adjust risk detection models.

This inefficiency frustrated users and hurt adoption. It led to missed risks and lower compliance accuracy for organizations.

The challenge

How might we…

Make it easier for users to fine-tune ML models, improving both usability and risk detection accuracy?

Discovery

Using diagrams to clarify the ML refining flow

I mapped the legacy ML model lifecycle to understand where the refining process fit.

These diagrams helped me understand the flow and validate it with engineers over Slack for quick feedback.

User interviews uncovered setbacks and opportunities

I worked with our researcher to scope 10 interviews, which uncovered key insights:

"I spend half my time reviewing irrelevant alert content."

Alert overload overwhelmed users, driving the need for quicker marking and review.

Manual model updates were time-consuming and inefficient

Users had to update the system by hand, which was slow and inefficient.

Adding new alert signals was critical for better accuracy

The inability to add new signals (phrases) reduced the model’s effectiveness.

We also identified a new user type

Workflows varied greatly based on company size:
  • Small companies: A few compliance officers managed operations.

  • Large organizations: A full team, led by a compliance risk manager, monitored performance and met regulations.

Definition

I turned insights into solutions with sketches and wireframes

Usability testing

Testing solutions to enhance ML training and workflows

I collaborated with our researcher to validate three user-centered solutions:

A binary feedback system to simplify flagged content reviews.

We tested a thumbs-up/down approach to simplify feedback while keeping users focused on flagged content.

A side panel to assign new risk signals directly to ML model scenarios.

We assessed how users assigned missed risk content to scenarios, helping refine the ML models with accurate training inputs.

A manager analytics dashboard to track ML health and feedback contributors.

I refined the dashboard based on available backend data to assess how managers tracked model health and feedback contributors.
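To make the binary feedback idea concrete, here is a minimal sketch of what a thumbs-up/down feedback event might look like. It assumes a hypothetical data shape; the type and field names are illustrative only, not Behavox's actual schema or API.

```typescript
// Hypothetical shape of a binary feedback event on a flagged alert.
// All names are illustrative, not Behavox's actual schema.
type Verdict = "relevant" | "irrelevant";

interface AlertFeedback {
  alertId: string;    // the flagged alert being reviewed
  signal: string;     // the phrase that triggered the alert
  scenario: string;   // the model scenario the signal belongs to
  verdict: Verdict;   // thumbs-up = relevant, thumbs-down = irrelevant
  reviewerId: string; // compliance officer giving the feedback
  reviewedAt: string; // ISO timestamp
}

// Example: a reviewer marks a flagged message as irrelevant;
// events like this would feed into later model fine-tuning.
const feedback: AlertFeedback = {
  alertId: "alert-1042",
  signal: "guaranteed returns",
  scenario: "market-manipulation",
  verdict: "irrelevant",
  reviewerId: "officer-17",
  reviewedAt: new Date().toISOString(),
};

console.log(JSON.stringify(feedback, null, 2));
```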


Challenge

Overcoming backend hurdles to keep the project moving

As I finalized prototypes for testing, backend restrictions emerged as a major roadblock.

We had to cut the dashboard from the project due to inaccurate time estimates.

Management weighed pausing the project due to the unclear estimates, risking further delays and missed deadlines.

To prevent a pause, I worked with my PM to secure a two-week testing window. This let us evaluate the rest of the solution while the devs updated their estimates.

Usability testing

My assumptions about feedback placement didn’t hold up in testing

"I wasn't able to find how to give feedback on flagged risk signals."

I assumed users would click highlighted text to add feedback, as they were familiar with our text-highlight feature. Yet most users struggled to find the option, exposing a gap in discoverability.

"I always go to the justification… it helps me clarify flagged risk content."

Testing showed that users relied on justifications to evaluate flagged risk content. It was clear that justifications were key to their workflows and the best place to embed the feedback feature.

Iteration

Refining justifications to boost the feedback feature's usability

The justification area became the ideal place for feedback, but it had usability issues that needed fixing:

The cluttered layout, filled with dense text blocks, made it hard for users to scan key details. Key regulatory information was also buried among secondary details.

To resolve these issues, I made a few improvements:

Enhanced spacing and grouping: Reduced clutter and improved scannability.

Added an expandable area for non-essential details: Minimized distractions.

Centralized regulatory data in a tooltip: Made critical details quicker to access.


User segmentation testing

Validating our new feedback approach

Users found giving feedback in the justification area intuitive. High clickthrough rates validated this approach.

Handoff

Shipping the new ML feedback mechanism and planning ahead

After a final team review, I refined the designs and handed them off to the dev team. We were implementing a new design system at the time, so I also updated the project's designs to ensure consistency and scalability.

In the first phase, we launched the ML model feedback mechanism.

Dashboard designs were added to the backlog for future implementation.


Impact

Simpler feedback and smarter models

Our new approach simplified workflows. It made risk detection faster and more intuitive. This paved the way for reducing regulatory violations and improving efficiency.

64%

Increase in risk alert accuracy

13%

Rise in user satisfaction

Improved compliance

Key learning

Smart trade-offs drive progress

Backend challenges threatened project progress, but teamwork and prioritization kept us moving forward. Testing part of the project uncovered valuable insights that helped us make smarter trade-offs.

Communication with Yanick was always plain and easy. His general ingenuity had a significant impact on Behavox UI.

Artsiom Mezin

Sr. Engineering Manager

Wanna hear the full story?

Email me

Thanks for reading!

Let's stay in touch:
