ML and technical debt
Machine Learning: The High Interest Credit Card of Technical Debt
-
link to paper: https://research.google/pubs/pub43146/
-
Complex Models Erode Boundaries
- Entanglement
- Generally not possible to make isolated changes: Changing Anything Changes Everything (CACE)
- Applies to features, signals, parameter settings, etc.
- Somewhat innate to ML
- Hidden feedback loop
- ML system's predictions end up influencing its own training data
- May happen in surprising ways, such as when two systems depend on each other
- Can result in gradual changes not immediately visible, hard to detect and debug
- Undeclared consumers
- Changes to the model affect downstream applications that consume its output without declaring it
- Can also create hidden feedback loops
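A toy sketch of the entanglement point above, not taken from the paper. The data and the helper names `fit_one` and `fit_two` are made up for illustration. It fits a least-squares model with and without a second, correlated feature to show that dropping one input changes the weight learned for the other.

```python
# Toy illustration of entanglement (CACE): removing one feature changes the
# weight learned for another. Data and helper names are hypothetical.

def fit_one(x, y):
    """Least-squares weight for y ~ w * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def fit_two(x1, x2, y):
    """Least-squares weights for y ~ w1 * x1 + w2 * x2 (no intercept)."""
    # Normal equations: [[a, b], [b, c]] @ [w1, w2] = [d, e]
    a = sum(u * u for u in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    c = sum(v * v for v in x2)
    d = sum(u * t for u, t in zip(x1, y))
    e = sum(v * t for v, t in zip(x2, y))
    det = a * c - b * b  # non-zero as long as x1 and x2 are not collinear
    return (d * c - b * e) / det, (a * e - b * d) / det


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]       # correlated with x1, but not identical
y = [u + v for u, v in zip(x1, x2)]  # true relationship: y = x1 + x2

w1_both, w2_both = fit_two(x1, x2, y)
w1_alone = fit_one(x1, y)

print(f"with x2:    w1={w1_both:.3f}, w2={w2_both:.3f}")
print(f"without x2: w1={w1_alone:.3f}")
```

With both features the fit recovers a weight of about 1 for each. Without `x2`, the weight on `x1` roughly doubles because it absorbs the signal the removed feature used to carry.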
-
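A toy sketch of the hidden feedback loop described above, under made-up assumptions: two items "A" and "B", deterministic users who click every k-th impression, and a greedy policy that always shows the item with the highest estimated click rate. None of these names or numbers come from the paper.

```python
# Toy hidden feedback loop: the model only gets training data for the items it
# chooses to show, so a wrong estimate for an unshown item is never corrected.
# All names and numbers are hypothetical.

# Deterministic "users": an item is clicked on every k-th impression.
click_every = {"A": 10, "B": 5}  # true click rates: A = 0.10, B = 0.20

# Training data so far as [clicks, impressions]; B's history undersells it.
history = {"A": [10, 100], "B": [5, 100]}
new_impressions = {"A": 0, "B": 0}


def estimated_rate(item):
    clicks, impressions = history[item]
    return clicks / impressions


for _ in range(1000):
    # The prediction decides what is shown ...
    shown = max(history, key=estimated_rate)
    new_impressions[shown] += 1
    clicked = new_impressions[shown] % click_every[shown] == 0
    # ... and only the shown item produces new training data.
    history[shown][0] += int(clicked)
    history[shown][1] += 1

print(new_impressions)
print({item: round(estimated_rate(item), 3) for item in history})
```

In this setup "B" is never shown, so its underestimated rate is never revisited even though its true rate is higher. Nothing in the logs looks wrong, which is what makes this kind of loop hard to detect.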
Dependency debt
- Data dependencies cost more than code dependencies
- Data dependencies are harder to track than code dependencies
- Unstable dependencies
- Legacy features
- Bundled features
- Epsilon features: small improvement in accuracy with huge complexity overhead
- Correction cascade: tendency to take another model and learn a calibration layer on top of it
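One way to make data dependencies easier to track is to have every consumer declare the signals it reads, so a removal or change can be checked mechanically. A minimal sketch, assuming a simple in-memory registry. The layout, `consumers_of`, `check_removal`, and all model and signal names are hypothetical.

```python
# Sketch of declaring data dependencies so they can be checked mechanically.
# The registry layout and all signal/model names are hypothetical.

# Each consumer declares the signals (features, upstream model outputs) it reads.
declared_inputs = {
    "ranking_model": {"user_age", "click_rate_v2", "legacy_score"},
    "spam_model": {"click_rate_v2", "message_length"},
}


def consumers_of(signal):
    """Return the sorted names of consumers that declared a dependency on `signal`."""
    return sorted(name for name, inputs in declared_inputs.items() if signal in inputs)


def check_removal(signal):
    """Raise if removing `signal` would break a declared consumer."""
    users = consumers_of(signal)
    if users:
        raise ValueError(f"{signal!r} is still used by: {', '.join(users)}")


print(consumers_of("click_rate_v2"))
check_removal("unused_signal")  # passes: no consumer declared it
```

A registry like this only covers declared consumers. An undeclared consumer is by definition missing from it.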
-
System level spaghetti
- Glue code: most of the code is not the model itself
- Pipeline jungles
- Dead Experimental Codepaths
- Configuration Debt
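One way to limit configuration debt is to state configuration invariants in code and check them when the configuration is built. A minimal sketch, assuming a hypothetical `TrainingConfig` whose fields and limits are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical training configuration with its invariants checked up front."""
    learning_rate: float
    batch_size: int
    features: tuple
    experimental_features: tuple = ()

    def __post_init__(self):
        # Fail at construction time instead of deep inside a training run.
        if not 0.0 < self.learning_rate < 1.0:
            raise ValueError(f"learning_rate out of range: {self.learning_rate}")
        if self.batch_size <= 0:
            raise ValueError(f"batch_size must be positive: {self.batch_size}")
        unknown = set(self.experimental_features) - set(self.features)
        if unknown:
            raise ValueError(f"experimental features not in features: {sorted(unknown)}")


config = TrainingConfig(learning_rate=0.01, batch_size=64, features=("age", "clicks"))
print(config)
```

Listing experimental features explicitly also makes it easier to find and delete dead experimental codepaths later.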