Generating the dataset from unlabeled image data
Since I had zero experience with generative adversarial networks, I thought I should document some problems I had to overcome.
Quoting Wikipedia: "A generative adversarial network (GAN) is a class of machine learning systems. Two neural networks contest with each other in a zero-sum game framework. This technique can generate photographs that look at least superficially authentic to human observers, having many realistic characteristics. It is a form of unsupervised learning."
I'm not going to give an introduction to how a GAN works, since there is plenty of material online with far better insights than I could offer. I actually think the original paper by Ian J. Goodfellow et al. explains it very well.
Ian Goodfellow recently appeared on Lex Fridman's MIT Artificial Intelligence podcast. If you want to know more about the history of GANs, you should watch it: https://youtu.be/Z6rxFNMGdn0
To make thiseyedoesnotexist I had to generate a dataset suitable for generative adversarial training. This meant finding a way to sort thousands of unlabeled images with an automatic, unsupervised method. I didn't know that at the beginning; I discovered it along the way. This story is a recipe for that procedure.
This was my first plan:
- gather images about makeup related stuff
- train a DCGAN on the dataset
This adventure started with gathering 200k publicly available images related to makeup. There are multiple ways to do this, and I will leave that to your imagination.
Here is a sample of the images gathered:
I happily followed the DCGAN tutorial on the PyTorch website and used an NVIDIA 1080 Ti to train the network.
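For reference, the heart of that tutorial is the alternating two-player update. Below is a minimal sketch of one training step; `netG`, `netD`, the optimizers, and the latent size `nz` are assumed to be set up as in the PyTorch DCGAN tutorial.

```python
import torch
import torch.nn as nn

def train_step(netD, netG, real, optD, optG, nz=100, device="cpu"):
    """One DCGAN training step: update the discriminator on real and
    fake batches, then update the generator. Sketch following the
    structure of the PyTorch DCGAN tutorial."""
    criterion = nn.BCELoss()
    b = real.size(0)
    real_label = torch.ones(b, device=device)
    fake_label = torch.zeros(b, device=device)

    # (1) Update D: maximize log(D(x)) + log(1 - D(G(z)))
    optD.zero_grad()
    loss_real = criterion(netD(real).view(-1), real_label)
    noise = torch.randn(b, nz, 1, 1, device=device)
    fake = netG(noise)
    loss_fake = criterion(netD(fake.detach()).view(-1), fake_label)
    loss_d = loss_real + loss_fake
    loss_d.backward()
    optD.step()

    # (2) Update G: maximize log(D(G(z))), i.e. fool the discriminator
    optG.zero_grad()
    loss_g = criterion(netD(fake).view(-1), real_label)
    loss_g.backward()
    optG.step()
    return loss_d.item(), loss_g.item()
```

In practice you would call this once per batch inside the epoch loop, logging both losses, which is how I produced the loss curves shown below.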
When training finished, I was shocked by how bad the results were. I actually ran the thing for quite some time trying to understand what went wrong.
The fake images actually got worse with more epochs of training:
The generator and discriminator losses confirmed this:
At this point I realized I was lacking intuition about GANs and their failure modes, so I started reading a lot about them. Eventually I understood that the distribution I was trying to model was too rich, and that GANs suffer from something called mode collapse.
Since I had already trained a DCGAN, I had the following thought: what if I used the discriminator as a feature extractor to try to separate the images into classes? The discriminator must have built some sort of similarity measure in its last hidden layer. If I used a subset of images, all very similar to each other, I might achieve better results.
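The idea can be sketched in a few lines: run images through the discriminator but stop before its classification head, and keep the activations as feature vectors. This assumes a discriminator built as an `nn.Sequential` like the tutorial's, where the last two modules are the final convolution and the sigmoid; the `drop_last` parameter is my own illustrative knob, not something from the tutorial.

```python
import torch
import torch.nn as nn

def extract_features(netD, images, drop_last=2):
    """Run images through all discriminator layers except the
    classification head, and flatten the activations into one vector
    per image. Assumes `netD` is an nn.Sequential (as in the PyTorch
    DCGAN tutorial, where the last two modules are the final conv and
    the sigmoid output)."""
    feature_net = nn.Sequential(*list(netD.children())[:-drop_last])
    feature_net.eval()
    with torch.no_grad():
        feats = feature_net(images)
    return feats.view(images.size(0), -1)  # (batch, feature_dim)
```

Running every image in the dataset through this function yields the "last hidden layer discriminator representation" used in the steps below.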
Since I had already trained the DCGAN and I wanted to make the distribution easier to learn, I devised a second plan:
- Use the features generated by the last hidden layer of the DCGAN discriminator
- Reduce every image's feature representation to 50 dimensions using PCA
- Apply t-SNE on top of that (reducing it further to two dimensions)
- Use one of the newer GAN architectures such as ProGAN
After converting all the images to their "last hidden layer discriminator representation", reducing them to 50 components with PCA, and applying t-SNE to get a two-dimensional representation, this is the plot I got:
At this point I felt really excited. I quickly used K-means to understand what was in each of the clusters. Each of the image grids corresponds to images sampled from the annotated red cluster(s):
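Inspecting the clusters amounts to fitting K-means on the 2D embedding and pulling a few image paths per cluster label; a sketch (the cluster count and sampling scheme here are illustrative, not the exact values I used):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_sample(embedding, paths, n_clusters=8, per_cluster=4, seed=0):
    """Cluster the 2D t-SNE embedding with K-means and pick a few
    image paths from each cluster for visual inspection."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    labels = km.fit_predict(embedding)
    samples = {}
    for k in range(n_clusters):
        idx = np.where(labels == k)[0][:per_cluster]
        samples[k] = [paths[i] for i in idx]
    return labels, samples
```

Plotting the sampled images as grids, one grid per cluster, is what produced the annotated figures below.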
These two clusters were almost always eye makeup pictures!
This one was related to lips!
This cluster was mostly related to random makeup products.
While listening to Ian Goodfellow's talk, I discovered that this procedure is somewhat documented in academia: https://youtu.be/Z6rxFNMGdn0?t=2564
Now that I had a set of images really similar to each other, I decided to train a GAN on them:
- Choose a cluster of images (eye makeup)
- Train a ProGAN on it!
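Materializing the chosen cluster as a training set is just a filtered copy of the image files; a sketch, where the cluster index and directory names are placeholders:

```python
import shutil
from pathlib import Path

def export_cluster(paths, labels, target_label, out_dir):
    """Copy the images assigned to one K-means cluster (e.g. the
    eye-makeup one) into a separate folder, to be used as the ProGAN
    training set."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    kept = [p for p, lbl in zip(paths, labels) if lbl == target_label]
    for p in kept:
        shutil.copy(p, out / Path(p).name)
    return kept
```

Pointing the ProGAN training script at `out_dir` is then the only remaining step.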
In case you want to contact me: [email protected]