Fraud Detection

Offer Compliance Classification Model

Overview

This document outlines the development and evaluation of a classification model designed to automate and streamline the manual compliance review process for offers on the "Pass Culture" platform. The model aims to predict whether an offer will be validated or rejected based on various offer attributes, thereby reducing the manual workload.

Context

Manually created offers on the "Pass Culture" platform are subjected to a compliance scoring script. Offers that trigger alerts require manual review, leading to a significant volume of offers needing analysis.

Data

Dataset

The dataset comprises offers that have undergone manual review.
It contains approximately 230,000 offers and is imbalanced: roughly 90% were validated and 10% rejected.

Features

The following offer attributes were used as features:

Textual:
- offer_name
- offer_description
Categorical:
- offer_subcategory_id
- venue_department_code
Numerical:
- stock
- stock_price
Boolean:
- outing
- physical_goods
Other:
- type
- subType
- rayon (product category)
- macro_rayon (broader product category)
- Offer image

Feature Engineering

Textual Features: Processed directly by CatBoost.
Categorical Features: One-hot encoded.
Numerical Features: Used as is.
Image Features: Embedded using a pre-trained model.
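As an illustration of the one-hot encoding step, the following minimal sketch encodes a categorical column such as offer_subcategory_id using only the standard library. The helper name one_hot_encode and the sample identifiers are hypothetical and only serve the example; the actual pipeline may rely on a library encoder instead.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row is a 0/1 list aligned with categories."""
    # Sort the distinct values so the column order is deterministic.
    categories = sorted(set(values))
    index = {category: position for position, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical subcategory identifiers, for illustration only.
subcategories = ["LIVRE_PAPIER", "CONCERT", "LIVRE_PAPIER", "SEANCE_CINE"]
categories, encoded = one_hot_encode(subcategories)

print(categories)
for row in encoded:
    print(row)
```

Each distinct value becomes one binary column, so high-cardinality attributes produce wide, sparse inputs.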

Model

Model Architecture

The model uses the CatBoostClassifier from the CatBoost library, which is well-suited for handling a mix of numerical, categorical, and textual features.
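A minimal sketch of how such a model could be set up is shown below. It is not the project's actual training code. The label column name "validated", the hyperparameter values, and the choice to pass categorical columns through cat_features (letting CatBoost's one_hot_max_size decide when to one-hot encode them) are all assumptions. The "Other" attributes and the image embeddings are left out for brevity.

```python
# Feature names taken from the Features section, grouped by type.
TEXT_FEATURES = ["offer_name", "offer_description"]
CAT_FEATURES = ["offer_subcategory_id", "venue_department_code"]
NUM_FEATURES = ["stock", "stock_price"]
BOOL_FEATURES = ["outing", "physical_goods"]


def feature_columns():
    """Return the model input columns in a fixed order."""
    return TEXT_FEATURES + CAT_FEATURES + NUM_FEATURES + BOOL_FEATURES


def train_model(train_df, label_column="validated"):
    """Fit a CatBoostClassifier on a pandas DataFrame (assumed layout).

    Text and categorical columns are expected to hold strings with no
    missing values; label_column is a hypothetical name.
    """
    # Imported lazily so the column helpers work without catboost installed.
    from catboost import CatBoostClassifier, Pool

    pool = Pool(
        data=train_df[feature_columns()],
        label=train_df[label_column],
        cat_features=CAT_FEATURES,
        text_features=TEXT_FEATURES,
    )
    model = CatBoostClassifier(
        iterations=500,        # assumed value, not from the source
        one_hot_max_size=255,  # assumed value: one-hot encode categoricals up to this cardinality
        verbose=False,
    )
    model.fit(pool)
    return model
```

Declaring the text and categorical columns on the Pool is what lets CatBoost apply its own tokenization and encoding instead of requiring a separate preprocessing step.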

Training

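The model was trained on the manually reviewed offer dataset described above.
The temporal scope of the training data (which period of offers is used for training and how recent it is) has yet to be reviewed and is a known area for improvement.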

Evaluation

The model was evaluated on a test set of 22,882 offers: 20,187 validated (about 88%) and 2,695 rejected (about 12%).
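Because the test set is imbalanced, it helps to know what a trivial classifier would score. The short calculation below uses only the test set counts given above and shows why plain accuracy alone is not a sufficient metric here. The variable names are illustrative.

```python
validated = 20_187
rejected = 2_695
total = validated + rejected

validated_share = validated / total
rejected_share = rejected / total

# A classifier that always predicts "validated" gets every validated offer
# right and every rejected offer wrong.
baseline_accuracy = validated / total
baseline_balanced_accuracy = (1.0 + 0.0) / 2

print(f"validated share: {validated_share:.3f}")
print(f"rejected share: {rejected_share:.3f}")
print(f"always-validated accuracy: {baseline_accuracy:.3f}")
print(f"always-validated balanced accuracy: {baseline_balanced_accuracy:.2f}")
```

The always-validated baseline already reaches roughly 0.88 accuracy while its balanced accuracy is only 0.5, which is why balanced metrics are reported alongside accuracy.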

Metrics

Accuracy: 0.95
Balanced Accuracy: 0.81
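Precision: 0.95
Recall: 0.99
Balanced Error Rate: 0.18

Precision and recall appear to be reported with "validated" as the positive class, since that is the only reading consistent with the accuracy and balanced accuracy above. The balanced error rate is the complement of balanced accuracy (1 - 0.81 = 0.19), so the reported 0.18 presumably differs only through rounding.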
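For reference, all five metrics can be derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions. The function name binary_metrics and the counts passed to it are hypothetical and are not the model's actual results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute the reported metrics from confusion matrix counts.

    tp/fn count positive-class offers, tn/fp count negative-class offers.
    """
    recall = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "precision": precision,
        "recall": recall,
        "balanced_error_rate": 1 - balanced_accuracy,
    }


# Toy counts for illustration only.
metrics = binary_metrics(tp=90, fp=20, tn=30, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy averages the per-class recall, so a model that misclassifies most of the minority class is penalized even when overall accuracy is high.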

Confusion Matrix

Generated from the test set.
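A confusion matrix of this kind can be built by comparing true and predicted labels pair by pair. The following is a minimal standard-library sketch. The function name confusion_counts, the label strings, and the sample data are assumptions for illustration.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="validated"):
    """Count tp, fp, tn and fn for a binary task with the given positive label."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            counts["tp" if predicted == positive else "fn"] += 1
        else:
            counts["fp" if predicted == positive else "tn"] += 1
    # Return all four cells, including those that stayed at zero.
    return {key: counts[key] for key in ("tp", "fp", "tn", "fn")}


# Toy labels for illustration only.
y_true = ["validated", "validated", "rejected", "rejected", "validated"]
y_pred = ["validated", "rejected", "validated", "rejected", "validated"]
cells = confusion_counts(y_true, y_pred)
print(cells)
```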

Feature Importance

SHAP Values: Used to estimate the contribution of each feature to the model's predictions.
Feature Analysis: Detailed analysis available in the associated notebook.
Probability Density: Visualization of the distribution of predicted probabilities for validated and rejected offers, which supports choosing the decision threshold.
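Threshold optimization can be sketched as a sweep over candidate thresholds, keeping the one that maximizes a chosen metric. The example below optimizes balanced accuracy, which is an assumption: the source does not state which criterion is used. The function names, scores, and candidate thresholds are illustrative.

```python
def balanced_accuracy(y_true, y_score, threshold):
    """y_true: 1 = validated, 0 = rejected; y_score: predicted probability of validated."""
    tp = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s >= threshold)
    tn = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s < threshold)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return (tp / positives + tn / negatives) / 2


def best_threshold(y_true, y_score, candidates):
    """Return the candidate threshold with the highest balanced accuracy."""
    return max(candidates, key=lambda threshold: balanced_accuracy(y_true, y_score, threshold))


# Toy data for illustration only.
y_true = [1, 1, 1, 1, 0, 0]
y_score = [0.95, 0.90, 0.80, 0.60, 0.70, 0.30]
candidates = [0.25, 0.50, 0.75]

chosen = best_threshold(y_true, y_score, candidates)
print(chosen, balanced_accuracy(y_true, y_score, chosen))
```

In practice the criterion should reflect the relative cost of wrongly validating a non-compliant offer versus wrongly sending a compliant one to manual review.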