
Generating Artstyles with Generative Adversarial Networks

Student: Jimenez Guajardo Diego Enrique

Supervisor: Attila Kertesz-Farkas

Faculty: Faculty of Computer Science

Educational Programme: Data Science (Master)

Year of Graduation: 2021

In the computer vision field, many models have been proposed for the style-transfer task. The emergence of deep learning made this research field more attractive as increasingly better results appeared. One of the most emblematic models was the one proposed by Gatys et al. in 2015, a 19-layer CNN also known as VGG-19, which gave impressive results after training. A caveat of this approach was that the model had to be retrained every time the input pictures changed. This was later addressed by other approaches; with methods based on Generative Adversarial Networks, the style-transfer task generalized into a broader field called image-to-image translation. Here, style transfer is understood as translating an image from one domain into another, for example from day to night or from winter to summer.

In this context, the main objective of this thesis is to find a model that can generate different art styles using Generative Adversarial Networks. The assumption behind this objective is that the generator can be constructed as a Variational Auto-Encoder that encodes different art styles in a latent space from which they can later be retrieved. Two main architectures fit this goal: the DRIT model proposed by the University of California and the MUNIT model proposed by NVIDIA. They share similar ideas; however, in a one-to-one comparison on the Cat2Dog dataset, all metrics showed that the MUNIT architecture is more reliable. Training was therefore carried out with a dataset containing images from four different artists, each of whom worked in different art styles over their careers.

After 400,000 steps, the model appeared to encode all the art styles, but the latent style vector did not seem completely disentangled. The MUNIT authors stated that loss functions such as the Cycle-Consistency Loss used in CycleGAN can push the model towards uni-modality, which proved true; however, the Cycle-Consistency Loss also improves the quality of the image-to-image translation, so there is a trade-off between style-transfer quality and the modality of the model. We found that a Cycle-Consistency Loss weight of 0.5-1.0 helps the style transfer while maintaining a decent disentanglement of the latent space. The results show that the styles themselves were not directly encoded in the latent space, but the attributes that compose each art style were successfully encoded. The styles of many iconic paintings can be retrieved from the latent style vector, though doing so requires some familiarity with it and some patience.
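As a rough illustration of the trade-off described above (a sketch using the standard CycleGAN formulation, not necessarily the exact notation of the thesis), the cycle-consistency term and its weight in a MUNIT-style objective can be written as:

\[
\mathcal{L}_{\text{cyc}} = \mathbb{E}_{x \sim p(X)}\big[\lVert G_{Y \to X}(G_{X \to Y}(x)) - x \rVert_1\big]
\]
\[
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{GAN}} + \lambda_{\text{rec}} \mathcal{L}_{\text{rec}} + \lambda_{\text{cyc}} \mathcal{L}_{\text{cyc}}, \qquad \lambda_{\text{cyc}} \in [0.5, 1.0]
\]

Here $G_{X \to Y}$ and $G_{Y \to X}$ denote the translation mappings between the two image domains, and $\lambda_{\text{rec}}$, $\lambda_{\text{cyc}}$ are loss weights (the symbol names are illustrative). A larger $\lambda_{\text{cyc}}$ enforces a more faithful round-trip reconstruction, improving translation quality, but also pulls the mapping towards a single mode, which is the disentanglement trade-off reported above.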

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
