A Deep Learning Quarantine — Introduction and Mistake 1

Ryan Harvey
Feb 9, 2021 · 9 min read

So I’ve been working in the UK for the last 12 or so months in a soul-crushing job answering phones for a finance company. I did the classic “go to the UK to find oneself” gap year experience, perhaps slightly belated at 25. As we’re all familiar, it’s all gone to custard with the global pandemic, and in an effort to escape another 12 months of sitting at home doing not a lot, I managed to get a much-prized spot on a repatriation flight to Aus.

Now, this is where things get slightly interesting. Getting back into Australia requires a two-week mandatory supervised quarantine. Essentially we all sit in corrugated steel huts in Darwin and eagerly await laundry day so we may walk a glorious 20 meters from our rooms. To be frank, it’s rather dull.

With that in mind, I decided to use my quarantine time effectively, learn some new skills and produce something of some merit.

So what’s the goal?

I thought it would be fun to have a crack at replicating a piece of work by the great and powerful Harrison Kinsley of Sentdex, wherein he used nothing more than labelled image data to train a convolutional neural network to drive a car in GTA (see below):

The inspiration and much of the source code for my project

I’ve chosen an arbitrary video from the series, but the whole thing is pretty important for what I’m trying to do here.

The gist of the project is this: use a convolutional neural network to label frames of the game according to the desired action within that frame.

Conceptually, the approach is pretty simple:

  1. play the game for a number of hours with a key-logger to record frames and the action taken by the player at each frame.
  2. train a convolutional neural network on those frames
  3. make the thing play the game

Simples!
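For a flavour of step 1, here’s a minimal sketch of a recording loop. It assumes the `mss` and `keyboard` Python packages and an 800×600 game window; the project’s actual capture code differs, so treat this as illustrative only.

```python
import time
import numpy as np
import mss
import keyboard  # global key-state polling; may need elevated permissions on some platforms

KEYS = ["w", "a", "s", "d"]
REGION = {"top": 40, "left": 0, "width": 800, "height": 600}  # assumed window position

def key_state():
    # which driving keys are held right now, as a multi-hot vector
    return [int(keyboard.is_pressed(k)) for k in KEYS]

def record(n_frames, out_path="training_data.npy"):
    data = []
    with mss.mss() as sct:
        for _ in range(n_frames):
            frame = np.array(sct.grab(REGION))[:, :, :3]  # BGRA screenshot, alpha dropped
            data.append([frame, key_state()])
            time.sleep(1 / 15)  # roughly 15 frames per second of play
    np.save(out_path, np.array(data, dtype=object), allow_pickle=True)

if __name__ == "__main__":
    record(1000)
```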

As with everything in programming, we should expect resistance, and that’s precisely what I encountered. As such, I will proceed with this blog series in the style of mistakes and discussions. Mistakes were plentiful and varied in this project, and I am more than happy to share my learnings from them.

Mistake 1: Setting up the project

Upon review of my first night’s video diary, it’s clear I had a certain lack of understanding when it came to the transition between supervised learning and reinforcement learning. My initial plan was to take a CNN, train it on image and action data, and hope that the actions chosen could somehow be retrofitted onto a Q matrix given appropriate scaling factors.

It was only after a week of developing that supervised CNN approach that I took the time to work out how Q learning works (a topic we shall address in a future blog). Critically, a Q learning network predicts the maximum expected future reward for each action given the current state. In this way, the prediction of a Q learning algorithm isn’t the next action but the expected value of each possible action. Now, assuming I played a perfect game and the rewards are scaled correctly, we can assume the two are one and the same.
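For reference, the standard tabular Q learning update (the thing that future post will unpack) is:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

where $s$ and $s'$ are the current and next states, $a$ is the action taken, $r$ is the reward received, $\alpha$ is the learning rate, and $\gamma$ discounts future rewards.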

The problem is, it would have been nice if I had contemplated the reward function and the transition matrix when I first started collecting data (again, see my discussion of Q learning for the DL on that). At present, I have a series of about 200,000 images labelled with the action taken. Unfortunately, about 170,000 of those images are too low-resolution (400x300) to derive the reward from.

To elaborate on the resolution problem: I would set up the game such that the agent has the goal of following the purple waypoint on the map. Said objective comes with a handy-dandy “distance to target” value on the map, which seems a pretty reasonable reward function if we simply take the distance at time t minus the distance at t + 1. Unfortunately, the numbers are simply illegible at that resolution, which effectively rules out that whole dataset for the purposes of Q learning. Also, this low-resolution dataset is balanced, so many of the frames do not have their neighbours, which is pretty important when you need to build a transition matrix of (current state, action, next state, reward). I can’t create that matrix from existing data if I don’t have the next-state data.
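To make the neighbour problem concrete: each consecutive pair of frames yields one transition tuple, so a balanced dataset that has discarded frames breaks the t + 1 lookup. A hypothetical sketch (field names illustrative):

```python
# Hypothetical transition assembly: each consecutive pair of frames yields
# one (state, action, reward, next_state) tuple.
def build_transitions(frames):
    transitions = []
    for t in range(len(frames) - 1):
        s, a = frames[t]["image"], frames[t]["action"]
        s_next = frames[t + 1]["image"]
        # reward: how much closer the waypoint got between frames
        r = frames[t]["dist_to_target"] - frames[t + 1]["dist_to_target"]
        transitions.append((s, a, r, s_next))
    return transitions
```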

The take-home from this is that if I had understood Q learning, or had even a vague inkling of how I would implement it later on, I could have collected data in a way which would have given me the transition vector at the time of collection (which would have been ideal), or at least allowed me to extract that information from the data I collected (less ideal, because I’d have to iterate through dozens of gigabytes of data, but still acceptable). Unfortunately, I have now created a very nice dataset which is pretty much useless for Q learning.

So that’s great.

Mistake 2: Collecting data, batching and randomization

Over the last several days, I have spent an inordinate amount of time iterating through collected data. I’ve balanced it, I’ve shuffled it, I’ve batched it, and all of these steps have been as time-consuming as they have been unnecessary.

As in all my life choices, I took the “act first, work out the details later” approach when collecting my data. While that approach works well for a spontaneous move to London, it doesn’t work so well for a computer science project. Specifically, it works quite poorly for computer vision datasets, as there are some serious considerations we need to take into account.

Balancing

Neglecting the DQN side of my project for the time being, neural networks trained for computer vision tasks really enjoy balanced datasets. For the uninitiated, that means they enjoy datasets which have the same number of instances in each category.

That’s great if your goal is to delineate cats and dogs and you have 15,000 of each. Not so good when you’re trying to drive and your dataset is 70% “go straight”, 10% each “accelerate and turn right” and “accelerate and turn left”, and the remaining 10% split between no key, brake, just turn left and just turn right.

Where I come from, that’s what we call “no bueno”.

You can guess what happens when we present a network with a dataset like this — it gets 70% accuracy for saying “go straight ahead” all the time, and it gets really happy about that and never learns anything else ever again.

No bueno.

Solution 1: Iterate and balance

Keeping in mind that you really don’t want to be balancing your data during the training loop (because you’d load and process the data n times, where n is the number of epochs), you probably want to get your data happy before you start training.

So, we have this dataset of images (maybe 30 GB, 30,000 frames). We’re going to get a 10–15% yield out of that if we decide to balance it properly (about 30% if we choose to balance it less properly), and the whole process will take around 2–3 hours. At the end we’ll have 3,000–6,000 frames. Later today I’ll do this exact thing, and we can see how the network performs on a super-balanced vs less-balanced dataset.
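For reference, “balancing properly” boils down to something like the following (a simplified stand-in for my actual script): group samples by label, then downsample every class to the size of the rarest one.

```python
import random
from collections import defaultdict

def balance(samples):
    # samples: list of (frame, action_label) pairs
    by_label = defaultdict(list)
    for frame, label in samples:
        by_label[label].append((frame, label))
    # the rarest class caps the yield, hence the 10-15% figure above
    n = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(random.sample(group, n))
    random.shuffle(balanced)
    return balanced
```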

Point is: if you collect unbalanced data, you’re going to store a lot of data relative to the amount you actually end up using, there will be several slow steps between collection and training, and you’ll waste a lot of time.

Solution 2: Collect balanced data

This is the approach I used for my image recognition network, and all in all it worked reasonably well. I wound up collecting one out of every eight “go straight forward” frames, and ended up with a dataset in the following proportions:

Forward: 1,500
Forward + Right: 1,000
Forward + Left: 1,000
Left: 500
Right: 500
No key: 250
Brake: 250
Brake + Left: 100
Brake + Right: 100

I trained on this data with both a full-screen and a radar-only approach, and to be honest they were both pretty awful (a maximum of 70% accuracy on the training set, and much worse on the testing set), so I think a couple of other solutions are in order.

There are two things which I will try this week on this one:

  1. train two neural networks — one which says accelerate or brake and one which says right or left.
  2. train one network on super balanced data

I think option 1 actually has a good basis, because it allows me to effectively double the usefulness of the data I’ve got and make two networks work together. It also means I’m guaranteed to have balanced data for the side-to-side action, which is the important part. Braking will be a little more challenging, but honestly, if I use a slow enough scooter or run the thing on the freeway, I shouldn’t need braking.
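To illustrate, here’s a hypothetical split of the nine labels above into two targets, one per network (the label strings are illustrative):

```python
# Hypothetical decomposition of the nine combined labels into a throttle
# target and a steering target, one per network.
def split_label(label):
    throttle = "brake" if "Brake" in label else ("forward" if "Forward" in label else "coast")
    steering = "left" if "Left" in label else ("right" if "Right" in label else "straight")
    return throttle, steering

# every frame now trains both networks, doubling the data's usefulness
print(split_label("Forward + Right"))  # ('forward', 'right')
print(split_label("Brake + Left"))     # ('brake', 'left')
print(split_label("No key"))           # ('coast', 'straight')
```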

The other approach has its merits, in that I like its non-parallel nature, but I would have to use a very small portion of my dataset.

So that’s the next thing I’ll try.

Shuffling and other issues

This section will be a bit more practical than the previous.

When we train a network using PyTorch (or any machine learning framework), we need batches of data. Now, in my naiveté starting out with PyTorch, I thought we were forced to use the DataLoader class to produce our batches for learning. We do not. All we need is a bunch of data with X and associated y values. How you produce those is entirely up to you.

DataLoader requires we give it a dataset which can be accessed by index i, a stipulation which is perfectly reasonable if the dataset can be loaded into memory, and pretty much completely impossible if it cannot. The Dataset we hand to DataLoader has a __getitem__ method which returns a sample given an index. Consider what happens when we ask DataLoader to produce a batch of 50 over a dataset divided into dozens of 1 GB files. If we cleverly index the whole dataset such that a call of i = n will go into the correct data file and produce the correct entry therefrom, the __getitem__ method will be forced to load a whole file into memory every single time we want to get an item. Wayyyyyyyy too slow.
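In code, the naive approach described above looks something like this sketch (don’t do this; the names are illustrative):

```python
# The naive map-style Dataset: every __getitem__ call loads an entire ~1 GB
# file just to return a single sample from it.
import numpy as np
from torch.utils.data import Dataset

class NaiveGTADataset(Dataset):
    def __init__(self, files, samples_per_file):
        self.files = files
        self.samples_per_file = samples_per_file

    def __len__(self):
        return len(self.files) * self.samples_per_file

    def __getitem__(self, i):
        # clever indexing into the right file... at the cost of a full
        # file load on every single sample
        array = np.load(self.files[i // self.samples_per_file], allow_pickle=True)
        return array[i % self.samples_per_file]
```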

So we come up with another approach: simply iterate over all the datafiles, and then iterate over the items in each datafile. Simple enough. The trick is in ensuring each datafile is large enough that, when we shuffle it as it is loaded, there is enough variety that each batch is quite different; we must also ensure each datafile is balanced happily.

This is — as far as I can tell — the best way to load in data for Pytorch from very large datasets.

I might write a class to do this at some point, but meh.

See my current code for joy and learning:
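Reconstructed here in simplified form, following the description below; the details are approximate.

```python
import os
import numpy as np

DATA_DIR = "training_data"  # every file in here is assumed to be training data
BATCH_SIZE = 50

def batches():
    mega_batch_list = os.listdir(DATA_DIR)
    for i in range(len(mega_batch_list)):
        # load one saved file: a numpy array of n samples
        mega_batch_array = np.load(
            os.path.join(DATA_DIR, mega_batch_list[i]), allow_pickle=True
        )
        # np.random.shuffle(mega_batch_array)  # hashed out here; see below
        for start in range(0, len(mega_batch_array), BATCH_SIZE):
            yield mega_batch_array[start:start + BATCH_SIZE]
```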

In this little bit of code I produce the mega_batch_list, which is just a list of all the file names in a given directory (note: if you have a file in the folder which is not training data meant for iteration, the algorithm will break). We then iterate through the length of that list and load each file in as mega_batch_array, which is just a numpy array of n samples, where n is the number of samples in the saved file.

I hashed out np.random.shuffle(mega_batch_array) in this snippet, but this is a very important line for ensuring the network actually trains. In this case, data files are 5,000 samples long. By calling shuffle before we start producing the batches, we produce something which I hope is sufficiently close to a randomized dataset for the purposes of training. For reference, 5,000 samples is about 6 km of driving in game, or about one waypoint trip, so there is certainly an argument to be made that there is insufficient randomness within megabatch files. However, I have made an effort to reduce variance by keeping the time of day identical throughout the training set, and so while each megabatch may have similar road conditions (city, freeway or dirt) and thus lack some randomness there, it will have a range of different situations within that condition which I hope will allow proper training. Certainly, this is a nice compromise which allows for good training in an efficient way.

As a note, I did consider another solution to this randomness problem, admittedly from when I wasn’t keeping the time of day consistent. In this solution I loaded in as many datafiles as my RAM could take, concatenated them into one massive array, shuffled it, and resaved it into batches or megabatches. It did take about 2 hrs to go through the dataset and do this, but whatever works, ya know?
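Sketched from memory, that pass looked roughly like this (file handling simplified):

```python
import numpy as np

def reshuffle(files, files_per_pass=4):
    # merge a RAM-sized group of megabatches, shuffle across them, resave
    for i in range(0, len(files), files_per_pass):
        group = files[i:i + files_per_pass]
        merged = np.concatenate([np.load(f, allow_pickle=True) for f in group])
        np.random.shuffle(merged)
        for f, piece in zip(group, np.array_split(merged, len(group))):
            np.save(f, piece)
```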

Current megabatches are around 2 GB, so this would octuple the pool from which shuffling could occur, but it was slow to execute and I didn’t see much benefit. It’s probably worth working with megabatches of size RAM/2 to maximize both the benefit of loading everything in at once and the benefit of shuffling, without impinging on the RAM needed by the training process.

