Generating the dataset from unlabeled image data

Since I had zero experience with generative adversarial networks, I thought I should document some problems I had to overcome.

Quoting Wikipedia: "A generative adversarial network (GAN) is a class of machine learning systems. Two neural networks contest with each other in a zero-sum game framework. This technique can generate photographs that look at least superficially authentic to human observers, having many realistic characteristics. It is a form of unsupervised learning."

I won't give an introduction to how GANs work, since there is plenty of material online with far better insights than I could offer. The original paper by Ian J. Goodfellow et al. actually does a very good job of explaining the idea.

Ian Goodfellow recently appeared on Lex Fridman's MIT Artificial Intelligence podcast. If you want to know more about the history of GANs, you should watch it.

To make thiseyedoesnotexist I had to generate a dataset suitable for generative adversarial training. This meant I had to find a way to sort thousands of unlabeled images using an automatic, unsupervised method. I didn't know that at the beginning; I discovered it along the way. This story is a recipe for that procedure.

Initial Plan

This was my first plan:

  • gather images about makeup related stuff
  • train a DCGAN on the dataset

This adventure started with the gathering of 200k publicly available images related to makeup. There are multiple ways to do this, which I will leave to your imagination.

Here is a sample of the gathered images:

Image samples from training data

I happily followed the DCGAN tutorial on the PyTorch website and used an NVIDIA 1080 Ti to train the network.
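The adversarial training loop from that tutorial can be sketched roughly as below. This is a toy-sized sketch with made-up layer sizes and random tensors standing in for real images, not the tutorial's actual 64×64 configuration:

```python
import torch
import torch.nn as nn

nz = 16  # latent vector size (toy value; the tutorial uses 100)

# Tiny stand-ins for the tutorial's generator and discriminator
netG = nn.Sequential(
    nn.ConvTranspose2d(nz, 32, 4, 1, 0, bias=False), nn.ReLU(True),  # 1x1 -> 4x4
    nn.ConvTranspose2d(32, 3, 4, 2, 1, bias=False), nn.Tanh(),       # 4x4 -> 8x8
)
netD = nn.Sequential(
    nn.Conv2d(3, 32, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, True),  # 8x8 -> 4x4
    nn.Conv2d(32, 1, 4, 1, 0, bias=False), nn.Sigmoid(),             # 4x4 -> 1x1
)

criterion = nn.BCELoss()
optD = torch.optim.Adam(netD.parameters(), lr=2e-4, betas=(0.5, 0.999))
optG = torch.optim.Adam(netG.parameters(), lr=2e-4, betas=(0.5, 0.999))

real = torch.rand(8, 3, 8, 8) * 2 - 1  # stand-in "real" batch in [-1, 1]

for step in range(3):
    # Update D: push D(real) toward 1 and D(G(z)) toward 0
    optD.zero_grad()
    out_real = netD(real).view(-1)
    loss_real = criterion(out_real, torch.ones_like(out_real))
    noise = torch.randn(8, nz, 1, 1)
    fake = netG(noise)
    out_fake = netD(fake.detach()).view(-1)
    loss_fake = criterion(out_fake, torch.zeros_like(out_fake))
    (loss_real + loss_fake).backward()
    optD.step()

    # Update G: push D(G(z)) toward 1
    optG.zero_grad()
    out = netD(fake).view(-1)
    loss_g = criterion(out, torch.ones_like(out))
    loss_g.backward()
    optG.step()
```

The real run follows the same two-step structure, just with the tutorial's full architecture and a DataLoader over the scraped images.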

When training finished, I was shocked by how bad the results were. I actually ran the thing for quite some time trying to understand what went wrong.


The fake images actually got worse with more epochs of training:

Mode Collapse

The generator and discriminator losses confirmed this:

At this point I realized I was lacking some intuition about GANs and their failure modes, so I started reading a lot about them. Eventually I understood that the distribution I was trying to model was too rich, and that GANs suffer from something called mode collapse, where the generator latches onto a few outputs that fool the discriminator instead of covering the whole data distribution.

Since I had already trained a DCGAN I had the following thought:

What if I use the discriminator as a feature extractor to try to separate the images into classes? The discriminator must have built some sort of similarity measure in its last hidden layer. If I use a subset of images, all very similar to each other, I might achieve better results.
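Reusing a trained discriminator as a feature extractor can be done with a forward hook in PyTorch. Here is a minimal sketch, with a hypothetical toy discriminator standing in for the trained DCGAN one (the layer sizes are assumptions, not the real network's):

```python
import torch
import torch.nn as nn

# Hypothetical toy discriminator; the real one comes from the trained DCGAN
netD = nn.Sequential(
    nn.Conv2d(3, 32, 4, 2, 1), nn.LeakyReLU(0.2, True),   # 16x16 -> 8x8
    nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),  # 8x8 -> 4x4 (last hidden layer)
    nn.Conv2d(64, 1, 4, 1, 0), nn.Sigmoid(),              # 4x4 -> 1x1 (classifier head)
)

features = {}

def save_features(module, inputs, output):
    # Flatten the last hidden activation into one feature vector per image
    features["last_hidden"] = output.flatten(start_dim=1).detach()

# Hook the activation right before the final classifier conv
netD[3].register_forward_hook(save_features)

batch = torch.randn(5, 3, 16, 16)  # stand-in for a batch of images
with torch.no_grad():
    netD(batch)

vecs = features["last_hidden"]     # one 64*4*4 vector per image
```

The hook fires on every forward pass, so iterating over the whole dataset and stacking `features["last_hidden"]` yields one feature vector per image.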

I devised a second plan:

Second Plan

Since I already had trained the DCGAN and I wanted to make the distribution easier to learn I made the following plan:

  • Use the features generated by the last hidden layer of the DCGAN discriminator
  • Reduce every image feature representation to 50 dimensions using PCA
  • Run t-SNE on top of that (reducing it further to two dimensions)
  • Use one of the newer GAN architectures such as ProGAN
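The dimensionality-reduction steps of this plan can be sketched with scikit-learn; here random vectors stand in for the discriminator features:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 1024))  # stand-in for the discriminator features

# PCA down to 50 components first: t-SNE on raw high-dimensional
# features is slow and noisy, so this is the usual preprocessing step
feats50 = PCA(n_components=50, random_state=0).fit_transform(feats)

# t-SNE down to 2 dimensions for plotting
emb2d = TSNE(n_components=2, init="pca", random_state=0).fit_transform(feats50)
```

Plotting `emb2d` as a scatter plot gives the kind of cluster map shown below.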

After converting all the images to their "last hidden layer" discriminator representation, reducing them to 50 components using PCA, and applying t-SNE to get a 2-dimensional representation, the plot I got was the following:

At this point I felt really excited. I quickly used k-means to understand what was in each of the clusters. Each of the image grids below corresponds to images sampled from the annotated red cluster(s):

Clusters related to eyes

These two clusters were almost always eye makeup pictures!

Clusters related to lips

This one was related to lips!

Random makeup products

This cluster was mostly related to random makeup products.
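The clustering and sampling above might look like this: k-means on the 2-D embedding, then a handful of images drawn from one cluster to build an inspection grid (random points stand in for the real embedding, and the cluster count is an assumption you would tune by eyeballing the t-SNE plot):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
emb2d = rng.normal(size=(300, 2))  # stand-in for the t-SNE embedding

# Assumed number of clusters; pick it by looking at the t-SNE scatter plot
km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(emb2d)
labels = km.labels_

# Sample a few image indices from one cluster to render as an image grid
cluster_id = 0
idx = np.flatnonzero(labels == cluster_id)
sample = rng.choice(idx, size=min(9, idx.size), replace=False)
```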

While listening to Ian Goodfellow's talk, I discovered that this procedure is somewhat documented in academia:

Now that I had a set of images really similar to each other I decided to train a GAN on them.

Final Plan

  • Choose a cluster of images (eye makeup)
  • Train a ProGAN on it!
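Picking the cluster reduces to a simple filter over the file list using the k-means labels; a minimal sketch with hypothetical file names and labels:

```python
import numpy as np

# Hypothetical file list and per-image k-means labels from the clustering step
paths = np.array([f"img_{i:05d}.jpg" for i in range(10)])
labels = np.array([0, 2, 2, 1, 0, 2, 1, 2, 0, 2])

eye_cluster = 2  # assumed id of the eye-makeup cluster
eye_paths = paths[labels == eye_cluster]  # the training set handed to ProGAN
```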

In case you want to contact me:

[email protected]