Common Pitfalls of Machine Learning

Expecting Machine Learning model training to be faster than writing software
- Still needs lots of supporting software and infrastructure
  - Must be robust, scalable, etc.
- Also pipelines etc. for data collection, prep, training
- Encourage people to start with a conventional software solution first
No data collected yet
- Data also needs to be in regular use, e.g. to generate reports; otherwise it is likely to be stale
Keep humans in the loop
- For core/critical systems especially
- Curate training data, handle edge cases, review data
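A common way to keep a human in the loop for edge cases is to act automatically only on confident predictions and queue the rest for review. A minimal sketch, assuming the model exposes a confidence score; the function name `route_prediction` and the 0.9 threshold are hypothetical, not from any particular library:

```python
def route_prediction(label, confidence, threshold=0.9):
    """Accept confident predictions; send the rest to a human review queue.

    `threshold` is an illustrative value and would need tuning per system.
    """
    if confidence >= threshold:
        return ("auto", label)
    return ("human_review", label)


# Hypothetical usage: one confident prediction, one uncertain one.
print(route_prediction("spam", 0.97))
print(route_prediction("spam", 0.55))
```

For core/critical systems the threshold would typically be set conservatively, so that more cases reach a reviewer.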
Product launch focused on ML algorithm
ML optimized for the wrong thing
- e.g. search optimized for engagement (clicks)
  - Might learn to serve bad results, which cause users to click back and try other links (so clicks go up)
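The gap between the two objectives can be seen by comparing raw clicks with a metric that ignores quick returns to the results page. This is only an illustrative sketch: the `Click` record, its `dwell_seconds` field, and the 10-second cutoff are assumptions, not taken from any real search system.

```python
from dataclasses import dataclass


@dataclass
class Click:
    result_id: str
    dwell_seconds: float  # time spent on the result before coming back


def click_count(clicks):
    """Raw engagement: every click counts, even ones the user abandoned."""
    return len(clicks)


def satisfied_click_count(clicks, min_dwell_seconds=10.0):
    """Count only clicks where the user stayed at least min_dwell_seconds."""
    return sum(1 for c in clicks if c.dwell_seconds >= min_dwell_seconds)


# Hypothetical session: two quick bounces, then one result the user stays on.
session = [Click("a", 2.0), Click("b", 3.5), Click("c", 120.0)]
print(click_count(session))            # 3
print(satisfied_click_count(session))  # 1
```

A model trained to maximize the first number is rewarded for the two bad results; the second number is not.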
Is your ML improving things in the real world?
- Need to show impact to stakeholders
Using a custom algorithm vs pre-trained
- False expectation: because pre-trained models are easy to use, building your own must be easy too
Not retraining algorithms
- Invest in making process seamless
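Part of making retraining seamless is deciding automatically when it is due. A rough sketch of such a check, assuming model age and a live accuracy figure are tracked; `needs_retraining`, the 30-day limit, and the 0.05 accuracy drop are hypothetical choices for illustration:

```python
from datetime import datetime, timedelta


def needs_retraining(last_trained, now, live_accuracy, baseline_accuracy,
                     max_age=timedelta(days=30), max_drop=0.05):
    """Return True if the model is too old or has degraded too far.

    All thresholds here are illustrative assumptions.
    """
    too_old = now - last_trained > max_age
    degraded = baseline_accuracy - live_accuracy > max_drop
    return too_old or degraded


# Hypothetical check: the model is recent but accuracy has slipped.
now = datetime(2024, 3, 1)
print(needs_retraining(datetime(2024, 2, 20), now,
                       live_accuracy=0.80, baseline_accuracy=0.90))
```

A check like this could run on a schedule and kick off the training pipeline, so retraining does not depend on someone remembering to do it.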
Don’t design your own perception or NLP algorithm
- These problems seem much easier than they really are
- Existing models have been optimized over decades of research
- Always use off-the-shelf models

🧠 mattdunn.info