Introduction

  • Let’s talk about concept drift, an important topic in MLOps and in keeping your model successful in production.

  • A predictive ML model learns parameters \(\theta\) to output \(P(Y \mid X; \theta)\).

    1. Data drift (covariate shift, virtual drift, or virtual concept drift) refers to changes in \(P(X)\) (different features, same labels): different data distributions, different feature space. In other words, the model’s input distribution changes while \(P(Y \mid X)\) stays the same. Because the training data no longer matches real-world examples, the trained model degrades in production, which is a common failure mode after deployment (illustrated in the sketch after this list).
      • E.g., the service launches in a new country, or expected features go missing.
      • Drift doesn’t happen only because of drastic events. It happens all the time, either:
        • suddenly (new competitors, competitors changing their pricing policies, new events, new fads, new memes, etc.) or
        • gradually (changes in social norms, cultures, languages)
      • Different features drift at different rates.
        • E.g., an app’s ranking might be useful for predicting whether it gets downloaded, but it drifts quickly.
      • In production, you might sometimes prefer weaker but more stable features.
    2. Model drift (or concept drift) refers to changes in \(P(Y \mid X)\) (same features, different labels): the same inputs now expect different outputs.
      • Model/concept drift is the situation in which the underlying functional relationship between the model’s inputs and outputs changes. The context has changed, but the model doesn’t know about the change; its learned patterns no longer hold.
      • E.g., suppose we’re predicting life expectancy from geographic region. As a region’s development level rises (or falls), the learned relationship breaks down and the model degrades. As another example, users searching for Wuhan pre-Covid expected different results from what they do now.
      • Model drift can be cyclic, e.g., ride-share demand on weekdays vs. weekends.
    3. Label drift (or schema shift, label prior probability shift) refers to changes in the label space \(Y\) or in \(P(Y)\) (different labels): new classes, outdated classes, finer-grained classes. It is especially common with high-cardinality tasks.
      • E.g., there’s a new disease to categorize.
  • Not all drifts require model retraining or relabeling, but all require monitoring.
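  • To make the first two kinds concrete, here is a minimal synthetic sketch (not from these notes; it assumes numpy and scikit-learn, and the numbers and model choice are illustrative) that trains a model under one regime and scores it under covariate shift and under concept drift:

```python
# Minimal synthetic sketch of data drift vs. concept drift.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def sample(n, x_mean, relation):
    """Draw X ~ N(x_mean, 1) and Y = relation(X) + small noise."""
    x = rng.normal(x_mean, 1.0, size=(n, 1))
    y = relation(x[:, 0]) + rng.normal(0.0, 0.1, size=n)
    return x, y

# Training regime: P(X) centered at 0, Y = X^2.
x_tr, y_tr = sample(5000, x_mean=0.0, relation=lambda x: x**2)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(x_tr, y_tr)

# Data drift (covariate shift): P(X) moves to a region the model never saw,
# while P(Y | X) stays the same.
x_cov, y_cov = sample(5000, x_mean=4.0, relation=lambda x: x**2)

# Concept drift: P(X) is unchanged, but the input-output relationship flips.
x_con, y_con = sample(5000, x_mean=0.0, relation=lambda x: -(x**2))

for name, (x, y) in [("in-distribution", (x_tr, y_tr)),
                     ("data drift", (x_cov, y_cov)),
                     ("concept drift", (x_con, y_con))]:
    print(f"{name:16s} MSE = {mean_squared_error(y, model.predict(x)):.2f}")
```

    Both drifted cases should score far worse than the in-distribution case even though the model itself never changed.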

Monitoring for drifts

  • Model drift is not uncommon. A few things that can be done for monitoring:
    1. Have data quality and integrity checks on feature collection and inference services to detect unexpected inputs/outcomes (a minimal check is sketched after this list).
    2. Have a hold-out set reviewed manually by stakeholder teams on a monthly/quarterly basis to detect changes in business models.
  • You can also do this automatically using a classifier (sketched below). Take your entire training data (\(X\)) and give it pseudo-label 1; take a sample of production data (\(X\)), drawn with sampling similar to that used for the training data, and give it pseudo-label 0. Fit a classifier and measure how well it’s able to separate the two classes (MCC is a good metric). If the classifier separates the two very easily, the space has drifted. This measures data drift and scales well.
  • Supposedly Netflix monitors by applying the Kolmogorov–Smirnov test, according to a presenter at the Spark + AI Summit; a per-feature KS check is sketched below.
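  • For point 1 above, a minimal data-quality check on an inference batch might look like the sketch below; pandas is assumed, and the column names, ranges, and the 1% null-rate threshold are hypothetical placeholders:

```python
# Sketch of a data-quality/integrity check on an inference batch.
import pandas as pd

# Hypothetical expected feature ranges (placeholders, not from the notes).
EXPECTED_RANGES = {"age": (0, 120), "price": (0.0, 10_000.0)}

def check_features(batch: pd.DataFrame) -> list[str]:
    """Return a list of integrity issues found in an inference batch."""
    issues = []
    missing = set(EXPECTED_RANGES) - set(batch.columns)
    if missing:
        issues.append(f"missing columns: {sorted(missing)}")
    for col, (lo, hi) in EXPECTED_RANGES.items():
        if col in missing:
            continue
        null_rate = batch[col].isna().mean()
        if null_rate > 0.01:
            issues.append(f"{col}: null rate {null_rate:.1%} exceeds 1%")
        out_of_range = ~batch[col].dropna().between(lo, hi)
        if out_of_range.any():
            issues.append(f"{col}: {int(out_of_range.sum())} values outside [{lo}, {hi}]")
    return issues
```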
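  • Here is a sketch of the classifier-based check described above (sometimes called adversarial validation). It assumes X_train and X_prod are already-preprocessed numpy feature matrices and that scikit-learn is available:

```python
# Label training rows 1 and production rows 0, fit a classifier, and use MCC
# on cross-validated predictions to measure how separable the two sets are.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import cross_val_predict

def drift_score(X_train: np.ndarray, X_prod: np.ndarray) -> float:
    """MCC near 0 => the two sets look alike; MCC near 1 => data drift."""
    X = np.vstack([X_train, X_prod])
    y = np.concatenate([np.ones(len(X_train)), np.zeros(len(X_prod))])
    clf = GradientBoostingClassifier(random_state=0)
    preds = cross_val_predict(clf, X, y, cv=5)
    return matthews_corrcoef(y, preds)
```

    Refitting the classifier on all the data afterwards and inspecting its feature importances can also hint at which features drifted the most.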
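  • And a per-feature check in the same spirit as the Netflix approach, using scipy’s two-sample Kolmogorov–Smirnov test (the alpha threshold is an arbitrary placeholder):

```python
# Two-sample KS test between a training feature column and the same column
# from a recent production window; numeric, one-dimensional features only.
import numpy as np
from scipy.stats import ks_2samp

def ks_drifted(train_col: np.ndarray, prod_col: np.ndarray,
               alpha: float = 0.01) -> bool:
    """Flag drift when the test rejects 'same distribution' at level alpha."""
    stat, p_value = ks_2samp(train_col, prod_col)
    return p_value < alpha
```

    Note that the KS test becomes very sensitive at large sample sizes, so the threshold (or the window size) usually needs tuning.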

References