Not too long ago I became roommates with a Red Eared Slider turtle. He keeps to himself, is quiet, and is relatively clean, so overall not bad as far as roommates go. But I have heard that if turtles don't dry their shell off once a day they can get all moldy like a hard-shelled loaf of bread, and I didn't want that to happen to my new little buddy. It made me realize I had no idea how long he spent out of the water. I would see him sitting on his log or fake rock, but I was never around enough to see if he was out long enough to dry off or if he did it every day. Which made me think: is it possible to watch him constantly? Could I make an automated way of tracking his basking time? And how could I waste as much time as possible doing this?
I don't know much about neural networks, but this seemed like the type of problem they would be good at solving. If they could do "hot dog" / "not hot dog" on Silicon Valley, then surely I could make "wet turtle" / "dry turtle", right? With the help of a webcam and a Raspberry Pi I was able to train a classifier that would generate a prediction of when the turtle was basking. Around every 20 seconds, the webcam would snap a picture of the turtle, and the Raspberry Pi running my trained model would determine what it thought the odds of the turtle being out of the water were. It would then update the site below so that I could track, throughout the day, at which times my model thought the turtle was out of the water. By counting all of the predictions in a day where the chance of the turtle being out of the water was greater than 50%, and multiplying that count by the median time difference between predictions, I created a rough estimate of how long the turtle spent drying off.
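Here is a minimal sketch of that daily estimate, assuming the predictions are kept as (timestamp, probability-out-of-water) pairs; the function and variable names are just illustrative:

```python
from datetime import datetime

def estimate_basking_seconds(predictions):
    # Sort by time so the gaps between consecutive snapshots make sense.
    predictions = sorted(predictions, key=lambda p: p[0])

    # Median gap between consecutive snapshots (roughly 20 seconds in practice).
    gaps = [
        (t2 - t1).total_seconds()
        for (t1, _), (t2, _) in zip(predictions, predictions[1:])
    ]
    gaps.sort()
    median_gap = gaps[len(gaps) // 2] if gaps else 0

    # Count every snapshot where the model thinks the turtle is out of the water.
    basking_count = sum(1 for _, prob in predictions if prob > 0.5)

    # Rough estimate: "out of the water" snapshots times the typical gap.
    return basking_count * median_gap

# Example: two snapshots 20 seconds apart, one basking, one swimming.
preds = [
    (datetime(2019, 6, 1, 12, 0, 0), 0.92),
    (datetime(2019, 6, 1, 12, 0, 20), 0.10),
]
print(estimate_basking_seconds(preds))  # -> 20.0
```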
Click here to see my real-time predictions
My main goal in doing this was to learn more about machine learning and neural networks. Before starting this I had no knowledge about the topic, and now I can confidently say I know extremely little about machine learning. I did learn a lot, but the lessons weren't exactly the ones I was expecting. I had three big takeaways.
Going into this project I was prepared to dust off my linear algebra textbook and learn the nitty-gritty of neural networks. I started reading up on them and watching YouTube videos. But I quickly discovered that the barrier to entry for creating a neural network to recognize images was much lower than I anticipated. The nitty-gritty math could mostly be abstracted away by using TensorFlow, and working with TensorFlow could be abstracted further by using Keras. This allowed me to do all of my training and interactions with my model in Python. The most difficult part was installing all these different resources and getting them to play well together. But for that, I had the help of a plethora of articles, blogs, and how-tos on machine learning. The amount of helpful material that was available was a pleasant surprise. The biggest eye-opener about how low the barrier to entry really is came when I simply copied a model from this site, trained it with my turtle images, and it accurately classified 97% of turtle pictures.
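To give a feel for how little code this takes, here is a minimal Keras sketch of a small convolutional classifier along those lines (the layer sizes and the 128x128 greyscale input are illustrative, not the exact model I copied):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # 1 = sun-basking, 0 = in the water
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# train_images: (N, 128, 128, 1) arrays, train_labels: 0/1 per picture
# model.fit(train_images, train_labels, epochs=10, validation_split=0.2)
```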
When the model I trained was able to accurately identify 97% of my input pictures, I thought there was not much more I needed to do to improve it. 97% is an A+, right? Well, it turns out I wasn't as done as I thought, which is when I learned another lesson: accuracy by itself is not a good metric. Since the turtle spends most of its time swimming around, a large majority of the pictures I had were of it in the water. My model discovered that it could get an A+ by classifying every single picture as being in the water, regardless of what the turtle was doing. To get a better idea of how my model was doing I started using confusion matrices. These still show accuracy, but break it down into four categories: falsely identified as in the water, correctly identified as in the water, falsely identified as sun-basking, and correctly identified as sun-basking. With this, I could spot at a glance that something like the 97% accuracy score was actually bad, because when every picture was classified as in the water there would be zero correctly identified sun-basking pictures.
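A tiny made-up example of what that looks like with scikit-learn's confusion matrix (the labels here are invented to show the failure mode, not my real data):

```python
from sklearn.metrics import confusion_matrix

# 0 = in the water, 1 = sun-basking.
# y_true are the hand-labelled pictures, y_pred is the model's thresholded output.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # model calls everything "in the water"

print(confusion_matrix(y_true, y_pred, labels=[0, 1]))
# [[8 0]    8 correctly identified in water, 0 falsely identified sun-basking
#  [2 0]]   2 falsely identified in water,   0 correctly identified sun-basking
```

The accuracy is still 80%, but the bottom-right zero makes it obvious the model never once spotted the turtle basking.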
To compensate for it always classifying the turtle as being in the water, I tried putting weights on correctly identified pictures of the turtle sun-basking. This would increase the reward the model got for identifying sun-basking relative to identifying it in the water, but I could never get the weights set to a point where the model performed at a desirable level. In the end, I had the most success by limiting the number of training pictures of the turtle in the water so that it matched the number of pictures I had of it sun-basking. Even though the total number of pictures I was using to train the model was much smaller, the results still ended up better.
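Rough sketches of both approaches, assuming the same Keras-style model as above (the weight values and helper name are illustrative):

```python
import random

# 1. Class weights: make a correct "sun-basking" prediction count for more
#    during training. Keras accepts a per-class weight dict in fit().
# model.fit(train_images, train_labels, epochs=10,
#           class_weight={0: 1.0, 1: 5.0})   # 0 = in water, 1 = sun-basking

# 2. Undersampling: throw away water pictures until the two classes match.
def balance(water_pics, basking_pics):
    water_pics = random.sample(water_pics, len(basking_pics))
    return water_pics + basking_pics
```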
Output of the first layer of my model, the first layer to interact with the raw image. This is what the model "sees" when looking at a picture of the turtle.
Another dial I twisted in my quest to make my model as accurate as possible was the raw input I fed into it. I tried different preprocessing on my training pictures, like shrinking them down and converting them to greyscale, as well as thresholding them into simple black and white. While this removed detail from the pictures, it kept the key features while making the images smaller. That meant less noise to distract the model and less data for it to process, saving computational resources. On my limited hardware, those saved resources meant I could add more layers to the model.

The number of neurons and layers I added also raised the dilemma of over- and under-training the model. When I added more layers, the model could more easily recognize the features in my training pictures that it used to identify the turtle as being in the water, but when it ran against pictures it wasn't trained on it would not do as well. After enough rounds of training, the model went from having a general sense of the relationship between the input pictures and whether the turtle was in the water to memorizing features of the individual pictures in the training set. The training pictures were classified with ever higher accuracy, but the validation pictures were not in the training set and the accuracy for them dropped. I combatted this by slightly reducing the number of layers and by using the technique of early stopping, which saves the state of the model after every round of training and then keeps the one that did the best at recognizing the validation pictures. This gave the best results and avoided the downward spiral of over-training.
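A sketch of both ideas, assuming OpenCV for the preprocessing and Keras for the early stopping (the 128x128 size, threshold value, and patience are illustrative):

```python
import cv2
from tensorflow.keras.callbacks import EarlyStopping

# Shrink, greyscale, and threshold a raw webcam frame into a small
# black-and-white image for the network.
def preprocess(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (128, 128))
    _, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
    return img / 255.0   # scale pixels to 0..1

# Early stopping: train until validation loss stops improving, then restore
# the weights from the best round instead of the over-trained ones.
early_stop = EarlyStopping(monitor="val_loss", patience=5,
                           restore_best_weights=True)
# model.fit(train_images, train_labels, validation_split=0.2,
#           epochs=100, callbacks=[early_stop])
```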
In the end, I had the most luck by dropping most of the image processing I was doing and instead using a pre-trained model. A pre-trained model has been trained on millions of pictures of an assortment of different objects, something I would never be able to do myself. Although it knows nothing about my turtle, it has a very good general sense of what an image is. It can look at the raw pixels of a picture and distinguish what is a feature and what is irrelevant noise, like static. So instead of connecting my model to the raw pixels of my turtle pictures, I could connect it to the output of the pre-trained model, which would only contain features worth analyzing. This changed my perception of neural networks. I never thought about how malleable they are, how they can be repurposed for different tasks, or how multiple networks can be chained together to reach a specific goal.
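A rough sketch of that chaining idea: a frozen, pre-trained network turns raw pixels into general-purpose features, and a small trainable head on top learns "wet turtle" vs "dry turtle". MobileNetV2 stands in here for whichever pre-trained model gets used, and the sizes are illustrative:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

base = MobileNetV2(input_shape=(160, 160, 3), include_top=False,
                   weights="imagenet")
base.trainable = False   # don't touch the pre-trained feature extractor

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),   # 1 = sun-basking, 0 = in the water
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```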
From looking at the theory of neural networks it seems obvious that data is important, since they rely so heavily on it for training. In the real world, the economic importance placed on data backs that up. I was still surprised to see that the vast majority of my time on this project was spent not tinkering with the neural network, but gathering and processing the data, which in my case meant pictures of the turtle. Whenever I thought about trying a new feature or running an experiment, I would find myself asking how much time I'd have to spend getting or processing the data to do it.

When I originally started, I would copy the pictures taken by the Raspberry Pi over to my laptop, open each picture, determine whether the turtle was in the water or not, then save the picture to a folder based on what I decided. Given the number of pictures I wanted, this quickly got out of hand. So, one by one, I started to automate the tasks that did not require my direct input. I reduced the number of times I had to log into the Raspberry Pi by creating a server on it that I could configure, receive images from, upload files to, and pull data from, all by sending requests from my laptop. Then I found that if I uploaded my pictures to Google Drive I could access them from my website, so I made a webpage specifically designed to classify these pictures. It reduced my input from opening each picture and saving it to a new location to simply pressing the name of the Drive folder I wanted to move it to; after each picture was moved it would automatically load the next one. I then automated things further by figuring out how to use the Google Drive API, so that each picture from the Raspberry Pi was forwarded up to Drive without my having to upload it myself. Eventually, the only thing I needed to do manually was download the images from Drive after they were classified. Having an infrastructure in place to automate my interactions with the data meant that once I had a working model, I could use the same server to copy the model over to the Raspberry Pi, and use the same interface to request its prediction of whether the turtle was in the water. To stay consistent with the theme of minimizing human input, I made my own API to upload the predictions to this site. Using the lessons learned from working with the Google Drive API, I added a simple REST API that lets me send predictions to the site so that it updates as they come in.
Diagram of how the data was processed
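As a sketch of the last step of that pipeline, the Raspberry Pi side could post each prediction to the site with a single request; the URL and JSON field names here are hypothetical, not the actual endpoint:

```python
import requests
from datetime import datetime, timezone

def send_prediction(probability_basking):
    # Package the latest model output with a timestamp and push it to the site.
    payload = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "probability_basking": float(probability_basking),
    }
    response = requests.post("https://example.com/api/predictions", json=payload)
    response.raise_for_status()
```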
Automation didn't just free up the time it took to do certain tasks: it freed up mental energy that could be applied to other issues, it freed me from worrying about increasing the amount of data I needed to process, and it freed me from constantly double-checking myself in case I committed some human error. Data is the fuel this machine learning project ran on, but it is easy to get overwhelmed by it. It was important to invest in the infrastructure to handle it early on, so that I was better able to adapt to changes in the project's goals and scale up if I needed to.
The predictions from the model were not 100% accurate; they could still be thrown off by things like a foreign object being placed on the tank or the webcam being slightly moved. But I did learn enough about the turtle's sun-basking habits to see that he probably won't be going moldy anytime soon, and I did learn a good bit about neural networks. Machine learning is a buzzword that's often thrown around, and doing a project based around it has helped demystify the concept for me.