Frederick O'Loughlin
Creative Projects
Experiments, mad ideas, and other attempts to be innovative.

Improving On Machine Learning Examples
Finding ways to get more out of an online machine learning tutorial by getting better results.
There is no shortage of applications for machine learning, and there is a great community dedicated to learning how to do it better. I have enjoyed working through a number of tutorials applying different methods to solve different problems. I am also convinced that to get the most out of these you need to do more than follow along blindly. Finding a way to improve the end results is a great way to go beyond what is being taught.
Machine learning models are typically scored for accuracy on a holdout dataset: examples the model has not seen during training. Testing this way ensures the model has not just become overly familiar with its training data. The following three examples achieve higher validated accuracy than the tutorials they are based on.
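The holdout idea can be sketched in a few lines. This is a minimal illustration, not code from any of the tutorials; the function name, seed, and 20% split fraction are my own choices.

```python
import random

def train_test_split(examples, holdout_fraction=0.2, seed=42):
    """Shuffle and split examples into a training set and a holdout set.

    The holdout examples are never shown to the model during training,
    so accuracy measured on them reflects generalisation rather than
    memorisation of the training data.
    """
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]

# 100 toy examples: 80 go to training, 20 are held out for validation
train, holdout = train_test_split(list(range(100)))
```

In practice a library routine does this (and often a stratified version of it), but the principle is the same: score the model only on data it never trained on.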
When looking to predict political parties from voting records using a Convolutional Neural Network, I was able to improve on the original validated accuracy of 93%. The key improvement was to better handle null values; the tutorial's pre-processing approach cut out a lot of data that was still useful. The dataset was also imbalanced, with more of one category than the other, so I added class weights to handle this better. My final validated accuracy was 98.6%.
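One common way to weight an imbalanced dataset is inverse-frequency class weights, where rarer classes contribute proportionally more to the loss. This is a sketch of that heuristic only; the exact weighting used in my model may differ, and the party labels here are just toy data.

```python
from collections import Counter

def class_weights(labels):
    """Compute inverse-frequency weights so that a minority class
    contributes as much to the training loss as the majority class."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    # weight = total / (n_classes * count): rarer classes get larger weights
    return {cls: total / (n_classes * n) for cls, n in counts.items()}

# Imbalanced toy labels: three of one party, one of the other
weights = class_weights(["dem", "dem", "dem", "rep"])
# the minority class ("rep") receives the larger weight
```

A dictionary like this can typically be passed straight to a training routine (for example, a `class_weight` argument) so that misclassifying the minority class is penalised more heavily.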
I employed a similar approach when categorising complaints using an LSTM. The tutorial dropped a high quantity of rows with empty labels, but I was able to rebuild those labels from other columns, leaving nearly five times as much data to work from. The tutorial reached 82%; mine managed to reach 95%. For one more example, a simple spam filter model that used a random forest algorithm achieved 93% accuracy, but by upgrading it to a dense neural network and improving some of the data pre-processing, it reached a final score of 99%.
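The label-rebuilding step can be sketched like this. The column names (`label`, `sub_category`, `product`) are hypothetical stand-ins, not the actual fields in the complaints dataset; the point is that rows the tutorial dropped for having an empty label often carried the same information in another column.

```python
def rebuild_label(row):
    """Recover a missing complaint label from other columns when possible.

    Rows with an empty label are not discarded; instead, related fields
    that imply the category are used as a fallback.
    """
    if row.get("label"):
        return row["label"]
    # fall back to related columns that imply the category
    return row.get("sub_category") or row.get("product")

rows = [
    {"label": "billing", "sub_category": ""},
    {"label": "", "sub_category": "credit card"},        # would have been dropped
    {"label": "", "sub_category": "", "product": "mortgage"},
]
labels = [rebuild_label(r) for r in rows]
```

Recovering labels this way is what multiplied the usable training data, which in turn drove most of the accuracy gain over the tutorial.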