Secret Sauce
Jul. 13th, 2017 08:47 am

One of the cool things at WWDC this year was machine learning. Is it a hot dog? Is it a rose? I saw a number of demonstrations and sat through several presentations about how iOS will be able to import a variety of models. Very cool and exciting. But you need a trained model to import, and creating one is an entirely different undertaking. And honestly, every time I've dipped into the programming literature, the math quickly went beyond the point where I could follow what the heck was going on.
But then I found a pointer to the TensorFlow project, and decided to give it another try. Reading through their tutorials, I started to have a suspicion that I might actually understand what was going on. And then I tracked down this article, https://medium.com/safegraph/a-non-technical-introduction-to-machine-learning-b49fce202ae8, and suddenly it all started making sense. Machine learning is applied statistics, and in its simplest form, it's not even very complex statistics. Which also explained why the trained models they were adding to iOS projects were so very small; the models are just equations. Now, they can be fairly complex equations, but they're just equations.
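To make that concrete, here's a minimal sketch of what a trained model can boil down to. The weights and bias below are numbers I made up for illustration, but the shape of the thing is the point: once training is done, the model is just an equation you evaluate.

```python
import math

# A "trained model" in its simplest form: an equation with fixed numbers in it.
# These weights are invented for illustration; a real model's come out of training.
W = [0.8, -1.3, 2.1]   # one learned coefficient per input feature
B = -0.5               # learned intercept (bias)

def predict(features):
    """Logistic regression: a weighted sum pushed through a squashing function."""
    score = B + sum(w * x for w, x in zip(W, features))
    return 1.0 / (1.0 + math.exp(-score))   # a probability between 0 and 1

# Using the model is just evaluating the equation -- no learning happens here.
print(predict([1.0, 0.2, 0.7]))   # e.g. "is it a hot dog?" as a number between 0 and 1
```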
When they call it machine learning, it's easy enough to assume that the model continues learning as you use it, giving it more data. No, by the time you begin using the model, all the learning is done, at least with the systems I've looked at. The learning part happens when you let the code tweak the equation to better fit what you want to see as a result, and that's where the dangerous part is. There are a number of decisions the model builder has to make, and they all influence the outcome of the model, and that's not even touching the quality of the data set or whether the assumed correlation is real enough to be useful. And even if you don't make a wrong step in building, training, and tuning the model, it's still dependent upon past data. If the nature of the correlation changes, the model is invalidated.
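Here's a rough sketch of what that "learning" step looks like, with data points and settings I invented: it's just code nudging the numbers in an equation until the equation fits the past data it was handed, and then the numbers stop moving.

```python
# A sketch of the "learning" step: gradient descent nudging a straight line
# (y = w*x + b) toward past data. The data points and settings are invented.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]   # roughly y = 2x, with some noise

w, b = 0.0, 0.0
learning_rate = 0.01   # one of the builder's decisions that shapes the outcome
for _ in range(5000):
    # Measure how far off the current equation is, and in which direction.
    grad_w = sum(2 * ((w * x + b) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * ((w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)   # roughly 2 and 0; from here on, the equation is frozen
# If the real-world relationship drifts away from this data, the model goes stale.
```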
All of which is a fancy way of saying that if someone comes to you with a shiny machine learning model, don't treat it like a magical black box, because it isn't one. Also, brush up on your statistics if r-squared is still Greek to you. Really, it's not that bad, and it can come in handy when reviewing scientific studies or listening to economists.
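And to back up the "it's not that bad" claim, here is roughly all r-squared is; the numbers are made up, but the calculation really is this small.

```python
# R-squared in one bite: the share of the variation in the real values that the
# model's predictions account for. 1.0 is a perfect fit; 0.0 is no better than
# always guessing the average. Toy numbers below.
def r_squared(actual, predicted):
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

print(r_squared([2.1, 4.0, 6.2, 7.9, 10.1], [2.0, 4.0, 6.0, 8.0, 10.0]))
```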