ML and technical debt

Machine Learning: The High Interest Credit Card of Technical Debt

  • Complex Models Erode Boundaries

    • Entanglement
      • Generally not possible to make isolated changes - Changing Anything Changes Everything
      • Applies to features, signals, parameter settings, etc
      • Somewhat innate to ML
    • Hidden feedback loop
      • ML system's predictions end up influencing its own training data
      • May happen in surprising ways, such as when two systems depend on each other
      • Can cause gradual changes that are not immediately visible and are hard to detect and debug
    • Undeclared consumers
      • Changes to the model impact undeclared downstream applications
      • Also potential to create hidden feedback loops
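
The entanglement point above (Changing Anything Changes Everything) can be illustrated with a minimal sketch. The data and the helper names `fit_one` and `fit_two` are made up for illustration; the sketch fits a least-squares model without intercept and shows that dropping one correlated feature changes the weight learned for the other.

```python
# Toy data (illustrative only): y is exactly x1 + x2, and x1, x2 are correlated.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 2.0, 3.0, 5.0]
y = [2.0, 4.0, 6.0, 9.0]


def fit_one(x, y):
    """Least-squares weight for y ~ w * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def fit_two(x1, x2, y):
    """Least-squares weights for y ~ w1*x1 + w2*x2 via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    w1 = (t1 * s22 - t2 * s12) / det
    w2 = (t2 * s11 - t1 * s12) / det
    return w1, w2


w1_both, w2_both = fit_two(x1, x2, y)  # both features present
w1_alone = fit_one(x1, y)              # x2 removed from the model

print(f"weight on x1 with x2 present: {w1_both:.3f}")
print(f"weight on x1 with x2 removed: {w1_alone:.3f}")
```

Removing `x2` did not touch `x1` at all, yet the weight on `x1` roughly doubles, because the model compensates for the missing signal. The same effect applies to any consumer that relied on the old weight.
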
  • Dependency debt

    • Data dependencies cost more than code dependencies
      • Data dependencies are harder to track than code dependencies
      • Unstable data dependencies
      • Legacy features
      • Bundled features
      • Epsilon features - small improvement in accuracy at the cost of a large complexity overhead
      • Correction cascades - tendency to build on another model's output by learning a correction (calibration) layer on top of it
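
One common way to look for legacy, bundled, and epsilon features is a leave-one-feature-out evaluation: retrain or re-score without each feature and see how much accuracy is lost. The following is a minimal sketch under stated assumptions. `low_value_features`, `toy_evaluate`, the feature names, and the accuracy numbers are all hypothetical; in practice `evaluate` would train and score a real model.

```python
def low_value_features(features, evaluate, min_gain=0.001):
    """Return {feature: gain} for features whose removal costs less than min_gain."""
    baseline = evaluate(features)
    flagged = {}
    for feature in features:
        reduced = [f for f in features if f != feature]
        gain = baseline - evaluate(reduced)  # accuracy lost by dropping the feature
        if gain < min_gain:
            flagged[feature] = gain
    return flagged


# Hypothetical per-feature accuracy contributions, invented for this example.
TOY_CONTRIBUTION = {
    "query_length": 0.05,
    "legacy_score": 0.0,
    "tiny_signal": 0.0004,
}


def toy_evaluate(features):
    """Stand-in for train-and-evaluate: base accuracy plus additive contributions."""
    return 0.80 + sum(TOY_CONTRIBUTION[f] for f in features)


flagged = low_value_features(list(TOY_CONTRIBUTION), toy_evaluate)
for name, gain in flagged.items():
    print(f"{name}: gain {gain:.4f}, candidate for removal")
```

The threshold `min_gain` is a judgment call: it encodes how much accuracy the team is willing to give up to remove a dependency.
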
  • System level spaghetti

    • Glue code - most of the code is not the model itself.
    • Pipeline jungles
    • Dead Experimental Codepaths
    • Configuration Debt
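
Configuration debt can be reduced by checking invariants when a configuration is loaded, so that mistakes fail fast instead of silently changing model behavior. The sketch below is one possible approach, not a prescribed one. The class `TrainingConfig`, its fields, and the specific limits are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical training configuration with invariants checked on creation."""

    learning_rate: float
    num_layers: int
    features: tuple

    def __post_init__(self):
        # Each check encodes an assumption that would otherwise live only in someone's head.
        if not 0.0 < self.learning_rate < 1.0:
            raise ValueError(f"learning_rate out of range: {self.learning_rate}")
        if self.num_layers < 1:
            raise ValueError(f"num_layers must be at least 1: {self.num_layers}")
        if len(set(self.features)) != len(self.features):
            raise ValueError(f"duplicate feature names: {self.features}")


# A valid configuration is accepted.
config = TrainingConfig(learning_rate=0.01, num_layers=3, features=("age", "country"))

# A configuration that lists a feature twice is rejected immediately.
try:
    TrainingConfig(learning_rate=0.01, num_layers=3, features=("age", "age"))
except ValueError as err:
    print(f"rejected: {err}")
```

Making the dataclass frozen also prevents code from mutating the configuration after validation, which keeps the checked state and the used state identical.
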