Model monitoring is the tracking of the performance of a deployed model to ensure reliable service and reliable predictions. It requires comprehensive infrastructure or processes to proactively monitor model availability, model performance, model bias, business impact, the relevance of prediction data, etc.
The performance of a deployed model usually deteriorates over time because the characteristics of the data used for making predictions tend to drift away from those of the data used to build the model, owing to factors such as inflation, weather, market volatility and political changes. Model monitoring enables users and engineers to spot issues before they affect the business.
Some of the approaches to model monitoring, which are generally used in combination, are outlined below.
Monitoring Service Health
This process involves tracking metrics about the health of the deployed model: prediction latency, throughput and error rate.
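The three health metrics named above can be computed from a window of request logs. The following is a minimal sketch, not a production monitor: the `RequestRecord` schema, the `service_health` function and the choice of a 95th-percentile latency are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class RequestRecord:
    """One prediction request from a serving log (hypothetical schema)."""
    timestamp: float   # seconds since some reference point
    latency_ms: float  # time taken to return the prediction
    ok: bool           # False if the request ended in an error


def service_health(records):
    """Summarise latency, throughput and error rate over a window of requests."""
    if not records:
        raise ValueError("need at least one request record")
    latencies = sorted(r.latency_ms for r in records)
    # Length of the observation window in seconds; avoid dividing by zero.
    span = max(r.timestamp for r in records) - min(r.timestamp for r in records)
    span = max(span, 1.0)
    errors = sum(1 for r in records if not r.ok)
    if len(latencies) > 1:
        # n=100 yields the 1st..99th percentile cut points; index 94 is the 95th.
        p95 = quantiles(latencies, n=100, method="inclusive")[94]
    else:
        p95 = latencies[0]
    return {
        "mean_latency_ms": sum(latencies) / len(latencies),
        "p95_latency_ms": p95,
        "throughput_rps": len(records) / span,
        "error_rate": errors / len(records),
    }


window = [
    RequestRecord(0.0, 10.0, True),
    RequestRecord(1.0, 20.0, True),
    RequestRecord(2.0, 30.0, False),
    RequestRecord(4.0, 40.0, True),
]
health = service_health(window)
```

In practice these values would typically be emitted to a metrics system and alerted on; the thresholds for alerting depend on the service.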
Monitoring Changes in Performance Metrics
This process involves regularly comparing the performance metrics of a deployed model, e.g., RMSE or AUC, against the same metrics measured before the model was deployed. A significant change in performance strongly suggests that the model needs retraining; however, deciding exactly how large a change must be before action is required is not an exact science and is entirely domain specific.
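As an illustration of such a comparison, the sketch below computes RMSE on recent data and flags the model when it has worsened by more than a chosen fraction relative to the pre-deployment value. The function names and the 10% default `tolerance` are assumptions chosen for the example, not recommended values. Note that this approach needs the true target values for the new data, which may only become available some time after the predictions are made.

```python
import math


def rmse(targets, predictions):
    """Root mean squared error between two equal-length sequences."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("targets and predictions must be non-empty and equal in length")
    squared = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    return math.sqrt(squared / len(targets))


def relative_change(baseline, current):
    """Fractional change of the current metric relative to the baseline."""
    return (current - baseline) / baseline


def needs_review(baseline_rmse, current_rmse, tolerance=0.10):
    """Flag the model when RMSE has grown by more than `tolerance`.

    The tolerance is a domain-specific choice; 0.10 is only a placeholder.
    For metrics where higher is better (e.g. AUC) the comparison would flip.
    """
    return relative_change(baseline_rmse, current_rmse) > tolerance


baseline = 2.0                        # RMSE measured before deployment (made-up value)
current = rmse([0.0, 0.0], [3.0, 4.0])  # RMSE on recently labelled data
flag = needs_review(baseline, current)
```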
Monitoring Model Output Distribution Changes
This approach involves calculating the difference between the distribution of the model's outputs on the test data and the distribution of the predictions made by the deployed model on new data collected over time. The stability index is typically used to measure this difference. A significant change in this metric may indicate that the model has gone stale.
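A common formulation of the stability index bins both sets of outputs with the same bin edges and sums, over the bins, the difference between the two bin fractions multiplied by the natural log of their ratio. The sketch below follows that formulation; the helper names, the `edges` argument and the `eps` floor used to avoid taking the log of zero are assumptions made for this example.

```python
import math
from bisect import bisect_right


def bin_fractions(values, edges):
    """Fraction of values falling in each bin; `edges` are sorted interior cut points."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def stability_index(reference, live, edges, eps=1e-6):
    """Sum over bins of (a - b) * ln(a / b).

    `a` is the bin fraction in the reference outputs (e.g. the test set) and
    `b` the fraction in the live outputs. Empty bins are floored at `eps`.
    """
    ref = bin_fractions(reference, edges)
    cur = bin_fractions(live, edges)
    total = 0.0
    for a, b in zip(ref, cur):
        a, b = max(a, eps), max(b, eps)
        total += (a - b) * math.log(a / b)
    return total


test_outputs = [0.1, 0.2, 0.3, 0.9]  # predictions on the original test set (made up)
live_outputs = [0.6, 0.7, 0.8, 0.2]  # predictions on newly collected data (made up)
si = stability_index(test_outputs, live_outputs, edges=[0.5])
```

Every term in the sum is non-negative, so the index is zero only when the binned distributions match. A widely quoted rule of thumb reads values below about 0.1 as stable and values above about 0.25 as a significant shift, but suitable cut-offs depend on the application.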
Monitoring Descriptive Feature Distribution Changes
This method is similar to “Monitoring Model Output Distribution Changes” but compares the distributions of the descriptive features used by the model rather than its outputs. It is useful for identifying which feature changes caused a model to go stale. Key metrics include the stability index, the χ² statistic, and the K-S statistic.
Since changes in the distributions of a handful of features will likely not have a significant impact on the performance of a high-dimensional model, this approach is most useful for models that use a very small number of descriptive features.
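To make the feature-level metrics concrete, here is a minimal standard-library sketch of a two-sample K-S statistic for a continuous feature and a χ² statistic for a categorical feature. The function names are hypothetical, and the code returns only the raw statistics; turning them into p-values or alert thresholds is left out.

```python
from bisect import bisect_right
from collections import Counter


def ks_statistic(reference, live):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(live)
    gap = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / len(ref)
        f_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def chi_square_statistic(reference, live):
    """Chi-square statistic comparing live level counts with reference proportions.

    Expected counts are the reference proportions scaled to the live sample size.
    Levels that appear only in the live data are skipped in this sketch, although
    their appearance is itself worth investigating.
    """
    ref_counts, live_counts = Counter(reference), Counter(live)
    stat = 0.0
    for level, ref_count in ref_counts.items():
        expected = ref_count / len(reference) * len(live)
        observed = live_counts.get(level, 0)
        stat += (observed - expected) ** 2 / expected
    return stat


ks = ks_statistic([1.0, 2.0], [3.0, 4.0])                           # no overlap
chi2 = chi_square_statistic(["a", "a", "b", "b"], ["a", "a", "a", "b"])
```

Running such checks per feature and ranking the features by the size of the statistic is one way to see which inputs have moved the most.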
John D. Kelleher, Brian Mac Namee, and Aoife D'Arcy. Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies. MIT Press, 2015. Pages 508-512.