Primers • MLOps Tooling
- Overview
- Tooling
- CI/CD for Machine Learning
- Cron Job Monitoring
- Data Catalog
- Data Enrichment
- Exploratory Data Analysis
- Data Management
- Vector Databases
- Data Processing
- Data Validation
- Data Visualization
- Drift Detection
- Feature Engineering
- Feature Store
- Hyperparameter Tuning
- Model Lifecycle
- Model Serving
- Model Testing & Validation
- Simplification Tools
- Further Reading
- Citation
Overview
- The figure below summarizes a typical MLOps flow:
Tooling
CI/CD for Machine Learning
- Tools for performing CI/CD for Machine Learning.
- GitHub Actions: Automate, customize, and execute your software development workflows right in your repository.
- ClearML: Auto-Magical CI/CD to streamline your ML workflow.
Cron Job Monitoring
- Tools for monitoring cron jobs (recurring jobs).
- Cronitor: Monitor any cron job or scheduled task.
Data Catalog
- Tools for data cataloging.
- Apache Atlas: Provides open metadata management and governance capabilities to build a data catalog.
Data Enrichment
- Tools and libraries for data enrichment.
- Snorkel: A system for quickly generating training data with weak supervision.
Exploratory Data Analysis
- Tools for performing data exploration.
- Google Colab: Hosted Jupyter notebook service that requires no setup to use.
- Jupyter Notebook: Web-based notebook environment for interactive computing.
Data Management
- Tools for performing data management.
Vector Databases
- Tools for storing and querying vector embeddings.
Data Processing
- Tools related to data processing and data pipelines.
Data Validation
- Tools related to data validation.
- Cerberus: Lightweight, extensible data validation library for Python.
- Cleanlab: Python library for data-centric AI and machine learning with messy, real-world data and labels.
- Great Expectations: A Python data validation framework that lets you test your data against declared expectations.
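To make the idea of data validation concrete, here is a minimal standard-library sketch of schema-based record checking. It does not use the API of Cerberus, Cleanlab, or Great Expectations; the schema format and the `validate` function are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema format: field name -> (expected type, required flag).
Schema = Dict[str, Tuple[type, bool]]


def validate(record: Dict[str, Any], schema: Schema) -> List[str]:
    """Return a list of error messages; an empty list means the record is valid."""
    errors: List[str] = []
    for name, (expected_type, required) in schema.items():
        if name not in record:
            if required:
                errors.append(f"{name}: missing required field")
            continue
        value = record[name]
        if not isinstance(value, expected_type):
            errors.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    # Flag fields the schema does not know about.
    for name in record:
        if name not in schema:
            errors.append(f"{name}: unknown field")
    return errors


if __name__ == "__main__":
    schema = {"age": (int, True), "name": (str, False)}
    print(validate({"age": "42", "city": "Oslo"}, schema))
```

The dedicated libraries listed above add far more (rule composition, normalization, reporting), so this only shows the core pattern of declaring rules once and checking every record against them.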
Data Visualization
- Tools for data visualization, reports and dashboards.
- Tableau: Data visualization tool widely used in the business intelligence industry.
Drift Detection
- Tools and libraries related to drift detection.
- TorchDrift: A data and concept drift library for PyTorch.
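One common way to flag data drift is to compare a reference sample of a feature with a recent sample using the two-sample Kolmogorov-Smirnov statistic. The sketch below is a standard-library illustration of that idea, not TorchDrift's API; the function names and the `threshold` default are assumptions chosen for the example.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The gap between two step functions can only change at observed values.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def has_drifted(reference: Sequence[float], current: Sequence[float],
                threshold: float = 0.2) -> bool:
    """Hypothetical decision rule: flag drift when the statistic exceeds a threshold."""
    return ks_statistic(reference, current) > threshold


if __name__ == "__main__":
    training_feature = [0.1, 0.4, 0.5, 0.7, 0.9]
    production_feature = [1.1, 1.3, 1.6, 1.8, 2.0]
    print(has_drifted(training_feature, production_feature))
```

In practice a fixed threshold is usually replaced by a p-value from the KS distribution, and high-dimensional inputs are typically reduced before testing.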
Feature Engineering
- Tools and libraries related to feature engineering.
- Featuretools: Python library for automated feature engineering.
Feature Store
- Feature store tools for data serving.
- Feast: End-to-end open source feature store for machine learning.
Hyperparameter Tuning
- Tools and libraries to perform hyperparameter tuning.
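As a rough illustration of what such tools automate, the sketch below runs a plain random search over continuous hyperparameters. It is a minimal standard-library example under stated assumptions: `random_search`, the `space` format (name mapped to a `(low, high)` range), and the toy objective are all hypothetical, and real tuning libraries typically offer smarter samplers, pruning, and parallelism.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def random_search(objective: Callable[[Params], float],
                  space: Dict[str, Tuple[float, float]],
                  n_trials: int = 20,
                  seed: int = 0) -> Tuple[Params, float]:
    """Sample each hyperparameter uniformly and keep the lowest objective value."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    best_params: Params = {}
    best_score = float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high)
                  for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    # Toy objective standing in for a validation loss; its minimum is at lr = 0.1.
    def toy_loss(p: Params) -> float:
        return (p["lr"] - 0.1) ** 2

    print(random_search(toy_loss, {"lr": (0.0, 1.0)}, n_trials=50))
```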
Model Lifecycle
- Tools for managing model lifecycle (tracking experiments, parameters and metrics).
- Aim: A super-easy way to record, search and compare 1000s of ML training runs.
- MLflow: Open source platform for the machine learning lifecycle.
- Neptune AI: Lightweight experiment management tool designed to fit a range of workflows.
- Weights and Biases: A tool for visualizing and tracking your machine learning experiments.
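The core of experiment tracking is recording parameters and metrics per run so that runs can be compared later. The following standard-library sketch shows that pattern; it is not the API of Aim, MLflow, Neptune, or Weights and Biases, and the `Run` class and `best_run` helper are hypothetical names used only for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Run:
    """One training run: its parameters plus a history of values for each metric."""
    name: str
    params: Dict[str, Any] = field(default_factory=dict)
    metrics: Dict[str, List[float]] = field(default_factory=dict)

    def log_param(self, key: str, value: Any) -> None:
        self.params[key] = value

    def log_metric(self, key: str, value: float) -> None:
        # Append so that the full history of the metric is kept.
        self.metrics.setdefault(key, []).append(value)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def best_run(runs: List[Run], metric: str,
             higher_is_better: bool = True) -> Optional[Run]:
    """Return the run whose last logged value of `metric` is best, or None."""
    scored = [r for r in runs if r.metrics.get(metric)]
    if not scored:
        return None
    pick = max if higher_is_better else min
    return pick(scored, key=lambda r: r.metrics[metric][-1])


if __name__ == "__main__":
    baseline = Run("baseline")
    baseline.log_param("lr", 0.1)
    baseline.log_metric("accuracy", 0.81)
    tuned = Run("tuned")
    tuned.log_param("lr", 0.01)
    tuned.log_metric("accuracy", 0.86)
    print(best_run([baseline, tuned], "accuracy").to_json())
```

The tools listed above build on this pattern with persistent storage, dashboards, and artifact versioning.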
Model Serving
- Tools for serving models in production.
Model Testing & Validation
- Tools for testing and validating models.
- Deepchecks: Open-source package for validating ML models & data, with various checks and suites.
Simplification Tools
- Tools related to machine learning simplification and standardization.
Further Reading
Awesome MLOps: References and Articles
Awesome MLOps: Tools
- A list of tools for machine learning operations (MLOps).
Citation
If you found our work useful, please cite it as:
@article{Chadha2020DistilledMLOpsTooling,
title = {MLOps Tooling},
author = {Chadha, Aman},
journal = {Distilled AI},
year = {2020},
note = {\url{https://aman.ai}}
}