Primers • Concept Drift
Introduction
- Let’s talk about concept drift (also known as “shift”), an important topic in MLOps and key to keeping your model successful in production.
- A predictive ML model learns parameters \(\theta\) to output \(P(Y \mid X; \theta)\).
- Data drift/shift refers to changes in \(P(X)\): different data distributions, different feature space. Data drift usually means training fails to produce a good model, because the training data no longer matches real-world examples.
- Ex: service launched in a new country, expected features missing.
- Label schema drift/shift refers to changes in \(Y\): new classes, outdated classes, finer-grained classes. Especially common with high-cardinality tasks.
- Ex: there’s a new disease to categorize.
- Model drift/shift refers to changes in \(P(Y \mid X)\): same inputs expecting different outputs.
- Ex: when users searched for Wuhan pre-Covid, they expected different results from what they do now.
- Model drift can be cyclic, e.g., ride-share demand on weekdays vs. weekends.
- Concept drift doesn’t happen only because of drastic events. It happens all the time, either:
- suddenly (new competitors, competitors changing their pricing policies, new memes, new events, new fads) or
- gradually (changes in social norms, cultures, languages)
- Different features drift at different rates.
- Ex: an app ranking might be useful for predicting whether it gets downloaded, but it drifts quickly.
- In production, you might sometimes prefer to use weaker features if they’re more stable.
- Not all drifts require model retraining or relabeling, but all require monitoring.
Monitoring for drifts
- Model drift is not uncommon. A few things you can do for monitoring:
- Have data quality and integrity checks on feature collection and inference services to detect unexpected inputs/outcomes.
- Have a hold-out set reviewed manually by stakeholder teams on a monthly or quarterly basis to detect changes in business models.
- You can also do this automatically with a classifier. Take your entire training data (\(X\)) and assign it pseudo-label 1; take a sample of production data (\(X\)), drawn with the same sampling scheme as the training data, and assign it pseudo-label 0. Fit a classifier and measure how well it’s able to separate the two classes (MCC is a good metric). The space has drifted when the classifier separates the two very easily. This measures data drift and scales well; see the sketch after this list.
- Supposedly Netflix monitors drift by applying the Kolmogorov–Smirnov test, according to a presenter at the Spark + AI Summit; a per-feature version is also sketched below.
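
A minimal sketch of the classifier-based check described above, assuming `train_df` and `prod_df` are pandas DataFrames with the same, already-numeric feature columns (hypothetical names); the random-forest model and the 70/30 split are illustrative choices, not part of the original description.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

def drift_score(train_df: pd.DataFrame, prod_df: pd.DataFrame, seed: int = 0) -> float:
    # Pseudo-label training data as 1 and production data as 0.
    X = pd.concat([train_df, prod_df], ignore_index=True)
    y = np.concatenate([np.ones(len(train_df)), np.zeros(len(prod_df))])

    # Hold out part of the pooled data so MCC is measured on unseen rows.
    X_fit, X_eval, y_fit, y_eval = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_fit, y_fit)

    # MCC near 0: the two samples are hard to tell apart (little drift).
    # MCC near 1: the classifier separates them easily (the feature space has drifted).
    return matthews_corrcoef(y_eval, clf.predict(X_eval))
```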
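
And a minimal per-feature Kolmogorov–Smirnov check, assuming the same two DataFrames as above; this is one plausible way to apply the test mentioned in the talk, not Netflix’s actual implementation, and the `alpha` threshold is an assumed value.

```python
import pandas as pd
from scipy.stats import ks_2samp

def ks_drift_report(train_df: pd.DataFrame, prod_df: pd.DataFrame, alpha: float = 0.01) -> dict:
    report = {}
    for col in train_df.columns:
        stat, p_value = ks_2samp(train_df[col], prod_df[col])
        # A small p-value suggests the marginal distribution of this feature
        # differs between training and production data.
        report[col] = {"ks_stat": stat, "p_value": p_value, "drifted": p_value < alpha}
    return report
```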