## Introduction

• Let’s talk about concept drift (also called “shift”), an important topic in MLOps and key to keeping your model successful in production.

• A predictive ML model learns parameters $$\theta$$ to output $$P(Y \mid X; \theta)$$.

1. Data drift/shift refers to changes in $$P(X)$$: different data distributions, different feature space. Data drift usually degrades the model in production because the data it was trained on no longer matches the real-world examples it sees.
• Ex: service launched in a new country, expected features missing.
2. Label schema drift/shift refers to changes in $$Y$$: new classes, outdated classes, finer-grained classes. Especially common with high-cardinality tasks.
• Ex: there’s a new disease to categorize.
3. Model drift/shift refers to changes in $$P(Y \mid X)$$: same inputs expecting different outputs.
• Ex: when users searched for Wuhan pre-Covid, they expected different results from what they do now.
• Model drift can be cyclic, e.g. ride-share demand on weekdays vs. weekends.
• Concept drift doesn’t happen only because of drastic events. It happens all the time, either:
• suddenly (new competitors, competitors changing their pricing policies, new memes, new events, new fads) or
• gradually (changes in social norms, cultures, languages)
• Different features drift at different rates.
• Ex: an app ranking might be useful for predicting whether it gets downloaded, but it drifts quickly.
• In production, you might sometimes prefer weaker features if they’re more stable.
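
The distinction between data drift ($$P(X)$$ changes) and model drift ($$P(Y \mid X)$$ changes) can be made concrete with a toy sketch. The Gaussian inputs, threshold rules, and function names below are illustrative assumptions, not anything from these notes:

```python
import random

random.seed(42)

def label_old(x):
    # Original concept: P(Y | X) says positive when x > 0.
    return 1 if x > 0 else 0

def label_new(x):
    # Drifted concept: the same inputs now expect different outputs.
    return 1 if x > 1 else 0

# A "model" that perfectly learned the old rule from training data.
predict = label_old

def accuracy(xs, label_fn):
    return sum(predict(x) == label_fn(x) for x in xs) / len(xs)

base    = [random.gauss(0, 1) for _ in range(5000)]  # training-like P(X)
shifted = [random.gauss(2, 1) for _ in range(5000)]  # data drift: P(X) moved

print(accuracy(base, label_old))     # 1.0: no drift
print(accuracy(shifted, label_old))  # 1.0 here, because this toy "model" knows
                                     # the rule everywhere; a real model fit on
                                     # `base` would extrapolate poorly here
print(accuracy(base, label_new))     # well below 1.0: P(Y | X) itself changed
```

Note that in the data-drift case the toy model stays accurate only because it encodes the true rule exactly; a real model fit on `base` would have seen almost no examples near `shifted`, which is why data drift degrades models silently.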

• Not all drifts require model retraining or relabeling, but all require monitoring.

## Monitoring for drifts

• Model drifts are not uncommon. A few things you can do to monitor for them:
1. Have data quality and integrity checks on feature collection and inference services to detect unexpected inputs/outcomes.
2. Have a held-out set reviewed manually by stakeholder teams on a monthly/quarterly basis to detect changes in the business model.
• You can also do this automatically using a classifier. Take your entire training data ($$X$$) and apply pseudo-label $$1$$; get a sample from production data ($$X$$), drawn with the same sampling scheme as the training data, and apply pseudo-label $$0$$. Fit a classifier and measure how well it’s able to separate the two classes (MCC is a good metric). If it’s very easy for the classifier to separate the two, the space has drifted. This measures data drift and scales well.
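• The classifier test above can be sketched in pure Python. To keep it self-contained, this uses a one-feature decision stump in place of a real classifier, and the Gaussian samples are made up for illustration:

```python
import random

def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient; |MCC| near 0 means the classifier
    # cannot separate the classes, near 1 means it separates them easily.
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return (tp * tn - fp * fn) / denom if denom else 0.0

def stump_drift_score(train_xs, prod_xs):
    """Classifier two-sample test, minimal version: pseudo-label the
    training sample 1 and the production sample 0, "fit" a one-feature
    decision stump at the pooled median, and score it with |MCC|."""
    pooled = sorted(train_xs + prod_xs)
    thresh = pooled[len(pooled) // 2]
    tp = sum(x > thresh for x in train_xs)  # train points called "train"
    fn = len(train_xs) - tp
    fp = sum(x > thresh for x in prod_xs)   # prod points called "train"
    tn = len(prod_xs) - fp
    # abs(): the stump's orientation (> vs. <) is arbitrary.
    return abs(mcc(tp, tn, fp, fn))

random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(2000)]
same  = [random.gauss(0.0, 1.0) for _ in range(2000)]  # production, no drift
moved = [random.gauss(2.0, 1.0) for _ in range(2000)]  # production, P(X) drifted

print(stump_drift_score(train, same))   # near 0: hard to separate, no drift
print(stump_drift_score(train, moved))  # large: easy to separate, drifted
```

In practice you would fit a real multi-feature classifier (e.g. gradient-boosted trees) rather than a single-feature stump; the pseudo-labeling and MCC scoring stay the same.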
• Supposedly Netflix monitors for drift by applying the Kolmogorov–Smirnov test, according to a presenter at the Spark + AI Summit.