Style Transferring in TensorFlow

Neural Style Transfer (NST) refers to a class of software algorithms that manipulate digital images or videos so that they adopt the appearance or visual style of another image. When we implement the algorithm, we define two distances: one for the content (Dc) and another for the style (Ds).

In this topic, we will implement an artificial system based on a deep neural network that creates images of high perceptual quality. The system uses neural representations to separate and recombine the content and style of images: it takes a content image and a style image as input and returns the content image as if it were painted in the artistic style of the style image.

Neural style transfer is an optimization technique that takes two images, a content image and a style reference image, and blends them. The output image is optimized to match the content statistics of the content image and the style statistics of the style reference image, so it looks like the content image rendered in the style of the reference. These statistics are extracted from the images using a convolutional network.

Working of the neural style transfer algorithm

When we implement the algorithm, we define two distances: one for the style (Ds) and the other for the content (Dc). Dc measures how different the content is between two images, and Ds measures how different the style is between two images. We then take a third image, the input, and transform it to minimize both its content distance from the content image and its style distance from the style image.
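
Written as a single objective, this amounts to the following, where c is the content image, s is the style image, and g is the image being transformed. The weighting factors α and β are assumed here for illustration; they play the role of c_weight and s_weight in the code later in this topic.

```latex
g^{*} = \arg\min_{g} \; \bigl[ \alpha \, D_c(g, c) + \beta \, D_s(g, s) \bigr]
```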

Libraries Required

import os  # used to build the path to the weights file
from collections import OrderedDict  # used to keep layer outputs in order

import numpy as np
import tensorflow as tf  # the code in this topic uses the TensorFlow 1.x graph API
from PIL import Image  # used to load and save images
import matplotlib.pyplot as plt  # used to display images

VGG-19 model

The VGG-19 model is similar to the VGG-16 model; both were introduced by Simonyan and Zisserman. VGG-19 is trained on more than a million images from the ImageNet database. The network has 19 weight layers and can classify images into 1000 object categories.

High-level architecture

Neural style transfer uses a pretrained convolutional neural network. We then define a loss function that blends two images seamlessly to create visually appealing art. NST defines the following inputs:

A content image (c) - The image we want to transfer a style to
A style image (s) - The image we want to transfer the style from
An input image (g) - The generated image, which contains the final result

The architecture of the model, as well as how the loss is computed, is shown below. We do not need a deep understanding of everything in the image yet, as we will look at each component in detail in the following sections. The idea is to give a high-level understanding of the workflow that takes place during style transfer.

Downloading and loading the pretrained VGG-16

We will borrow the VGG-16 weights from this webpage. We need to download the vgg16_weights.npz file and place it in a folder called vgg in our project home directory. We only need the convolution and pooling layers. Specifically, we will load the first seven convolutional layers to be used as the NST network. We can do this using the load_weights(...) function given in the notebook.

Note: We can try loading more layers, but beware of the memory limitations of our CPU and GPU.

# This function takes in a file path to the file containing weights
# and an integer that denotes how many layers to be loaded.
vgg_layers=load_weights(os.path.join('vgg','vgg16_weights.npz'),7)
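
The body of load_weights(...) is not listed in this topic. As a rough idea of what such a loader might look like, the sketch below reads an .npz archive with NumPy and returns one dictionary per layer with 'weights' and 'bias' entries, which is the shape the later code reads from vgg_layers. The key naming ('<layer>_W' and '<layer>_b') and the function body are assumptions for illustration, not the notebook's actual code.

```python
from collections import OrderedDict

import numpy as np


def load_weights(weights_file, end_layer):
    """Hypothetical sketch: load the first `end_layer` conv layers from an .npz file.

    Assumes each conv layer is stored under the keys '<name>_W' and '<name>_b'
    (for example 'conv1_1_W' and 'conv1_1_b'); a real archive may differ.
    """
    layers = OrderedDict()
    with np.load(weights_file) as archive:
        # Sorting puts conv1_1 before conv1_2, conv2_1, and so on.
        conv_names = sorted(key[:-2] for key in archive.files
                            if key.startswith('conv') and key.endswith('_W'))
        for name in conv_names[:end_layer]:
            layers[name] = {'weights': archive[name + '_W'],
                            'bias': archive[name + '_b']}
    return layers
```

Fully connected layers are skipped by the 'conv' prefix check, which matches the statement above that only the convolutional part of the network is needed.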

Define the functions to build the style transfer network

We define several functions that will help us later to fully define the computational graph of the CNN given an input.

Creating TensorFlow variables

We load the NumPy arrays into TensorFlow variables. We create the following variables:

content image (tf.placeholder)
style image (tf.placeholder)
generated image (tf.Variable and trainable=True)
Pretrained weights and biases (tf.Variable and trainable=False)

Make sure we leave the generated image trainable while keeping the pretrained weights and biases frozen. The two functions below define the inputs and the neural network weights.

def define_inputs(input_shape):
    """
    This function defines the inputs (placeholders) and the image to be generated (variable)
    """
    content = tf.placeholder(name='content', shape=input_shape, dtype=tf.float32)
    style = tf.placeholder(name='style', shape=input_shape, dtype=tf.float32)
    # The generated image starts as random noise and is the only trainable variable
    generated = tf.get_variable(name='generated', initializer=tf.random_normal_initializer(),
                                shape=input_shape, dtype=tf.float32, trainable=True)
    return {'content': content, 'style': style, 'generated': generated}


def define_tf_weights():
    """
    This function defines the TensorFlow variables for the VGG weights and biases
    """
    for k, w_dict in vgg_layers.items():
        w, b = w_dict['weights'], w_dict['bias']
        with tf.variable_scope(k):
            # trainable=False keeps the pretrained parameters frozen
            tf.get_variable(name='weights', initializer=tf.constant(w, dtype=tf.float32), trainable=False)
            tf.get_variable(name='bias', initializer=tf.constant(b, dtype=tf.float32), trainable=False)

Computing the VGG net output

def build_vggnet(inp, layer_ids, pool_inds, on_cpu=False):
    """This function computes the output of the full VGG net"""
    outputs = OrderedDict()

    out = inp

    for lid in layer_ids:
        # Reuse the frozen variables created by define_tf_weights()
        with tf.variable_scope(lid, reuse=tf.AUTO_REUSE):
            print('Computing outputs for the layer {}'.format(lid))
            w, b = tf.get_variable('weights'), tf.get_variable('bias')
            out = tf.nn.conv2d(filter=w, input=out, strides=[1, 1, 1, 1], padding='SAME')
            out = tf.nn.relu(tf.nn.bias_add(value=out, bias=b))
            outputs[lid] = out

        # Apply 2x2 average pooling after the layers listed in pool_inds
        if lid in pool_inds:
            with tf.name_scope(lid.replace('conv', 'pool')):
                out = tf.nn.avg_pool(input=out, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
                outputs[lid.replace('conv', 'pool')] = out

    return outputs
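
To make the pooling step in build_vggnet more concrete, here is a small NumPy sketch of 2x2 average pooling with stride 2 on a (batch, height, width, channels) array. The helper name avg_pool_2x2 is hypothetical, and the sketch assumes even height and width so that no padding is involved; it is meant only to show how each pooling step halves the spatial resolution.

```python
import numpy as np


def avg_pool_2x2(feature_map):
    """Average-pool an array of shape (batch, height, width, channels).

    Assumes height and width are even, so no padding is needed.
    """
    b, h, w, c = feature_map.shape
    # Split each spatial axis into (blocks, 2) and average over every 2x2 block.
    blocks = feature_map.reshape(b, h // 2, 2, w // 2, 2, c)
    return blocks.mean(axis=(2, 4))


x = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)
pooled = avg_pool_2x2(x)
print(pooled.shape)        # (1, 2, 2, 1)
print(pooled[0, :, :, 0])  # [[ 2.5  4.5] [10.5 12.5]]
```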

Loss functions

In this section, we define two loss functions: the style loss function and the content loss function. The content loss function ensures that the activations of the higher layers are similar between the generated image and the content image.

Content cost function

The content cost function makes sure that the content present in the content image is captured in the generated image. It has been found that a CNN captures information about content in its higher layers, whereas the lower layers focus more on individual pixel values.

Let A^l_{ij}(I) be the activation of the l-th layer, i-th feature map, and j-th position obtained using the image I. Then the content loss is defined as
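
A common way to write this loss, consistent with the 0.5 * (c - g)**2 term used in define_content_loss, is sketched here, with c the content image and g the generated image. Note that the code averages over positions and feature maps with tf.reduce_mean rather than summing, which only changes the scale of the loss.

```latex
L_{content}(c, g) = \frac{1}{2} \sum_{i,j} \left( A^{l}_{ij}(g) - A^{l}_{ij}(c) \right)^{2}
```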

The intuition behind the content loss

If we visualize what is learned by a neural network, there is evidence suggesting that different feature maps in the higher layers are activated in the presence of different objects. So if two images have the same content, they have similar activations in the higher layers.

We define the content cost as follows.

def define_content_loss(inputs, layer_ids, pool_inds, c_weight):
    c_outputs = build_vggnet(inputs["content"], layer_ids, pool_inds)
    g_outputs = build_vggnet(inputs["generated"], layer_ids, pool_inds)
    # Compare the activations of the last (highest) layer only
    content_loss = c_weight * tf.reduce_mean(0.5 * (list(c_outputs.values())[-1] - list(g_outputs.values())[-1])**2)
    return content_loss
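
The same arithmetic can be checked on small arrays without building a graph. The following NumPy sketch mirrors the expression in define_content_loss; the function name content_loss and the toy activation values are made up for illustration.

```python
import numpy as np


def content_loss(content_act, generated_act, c_weight=1.0):
    """Weighted mean of 0.5 * squared differences between two activation arrays."""
    diff = content_act - generated_act
    return c_weight * np.mean(0.5 * diff ** 2)


# Toy activations of shape (batch, height, width, channels); values are made up.
content_act = np.ones((1, 2, 2, 3))
generated_act = np.zeros((1, 2, 2, 3))

print(content_loss(content_act, content_act))    # 0.0
print(content_loss(content_act, generated_act))  # 0.5
```

Identical activations give a loss of zero, and the loss grows with the squared difference, which is what drives the generated image toward the content image.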

Style Loss function

Defining the style loss function requires more work. To derive the style information from the VGG network, we use all the layers of the CNN. Style information is measured as the amount of correlation present between the feature maps in a layer. Mathematically, the style loss is defined as,
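
One way to write this, using the activation notation A^l_{ij}(I) from the content loss, is sketched here. The symbols G for the style (Gram) matrix and w^l for the per-layer weights are assumptions for illustration; the code in this section uses equal layer weights by default and takes a mean rather than a sum over the matrix entries.

```latex
G^{l}_{ij}(I) = \sum_{k} A^{l}_{ik}(I) \, A^{l}_{jk}(I)
\qquad
L_{style}(s, g) = \sum_{l} w^{l} \sum_{i,j} \left( G^{l}_{ij}(s) - G^{l}_{ij}(g) \right)^{2}
```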

Intuition behind the style loss

Despite how the above equation looks, the idea is simple. The main goal is to compute a style matrix for the generated image and the style image.

Then, the style loss is defined as the mean squared difference between the two style matrices.

def define_style_matrix(layer_out):
    """
    This function computes the style matrix, which essentially computes
    how correlated the activations of a given filter are to all the other filters.
    Therefore, if there are C channels, the matrix will be of size C x C
    """
    n_channels = layer_out.get_shape().as_list()[-1]
    # Flatten all spatial positions so each row holds the C channel activations
    unwrapped_out = tf.reshape(layer_out, [-1, n_channels])
    style_matrix = tf.matmul(unwrapped_out, unwrapped_out, transpose_a=True)
    return style_matrix
	
def define_style_loss(inputs, layer_ids, pool_inds, s_weight, layer_weights=None):
    """
    This function computes the style loss using the style matrix computed for
    the style image and the generated image
    """
    c_outputs = build_vggnet(inputs["style"], layer_ids, pool_inds)
    g_outputs = build_vggnet(inputs["generated"], layer_ids, pool_inds)

    c_grams = [define_style_matrix(v) for v in list(c_outputs.values())]
    g_grams = [define_style_matrix(v) for v in list(g_outputs.values())]

    if layer_weights is None:
        # Weight every layer equally
        style_loss = s_weight * \
            tf.reduce_sum([(1.0/len(layer_ids)) * tf.reduce_mean((c - g)**2) for c, g in zip(c_grams, g_grams)])
    else:
        # The source listing is cut off after "s_weight * \"; the rest of this
        # branch is an assumed completion that applies one weight per layer.
        style_loss = s_weight * \
            tf.reduce_sum([w * tf.reduce_mean((c - g)**2) for w, c, g in zip(layer_weights, c_grams, g_grams)])

    return style_loss
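
To see why the style matrix captures style rather than layout, the NumPy sketch below mirrors define_style_matrix and the equal-weight branch of define_style_loss on toy data. The function names and the random activations are made up for illustration, and the layer average is taken over the supplied activations, which is a simplification of the code above.

```python
import numpy as np


def style_matrix(layer_out):
    """Gram matrix of shape (C, C) for activations of shape (..., C)."""
    n_channels = layer_out.shape[-1]
    unwrapped = layer_out.reshape(-1, n_channels)  # (positions, C)
    return unwrapped.T @ unwrapped


def style_loss(style_acts, generated_acts, s_weight=1.0):
    """Equal-weight mean squared Gram difference, averaged over layers."""
    per_layer = [np.mean((style_matrix(s) - style_matrix(g)) ** 2)
                 for s, g in zip(style_acts, generated_acts)]
    return s_weight * sum(per_layer) / len(per_layer)


rng = np.random.default_rng(0)
style_act = rng.normal(size=(1, 4, 4, 3))

# Shuffling spatial positions changes the layout but not the Gram matrix.
flat = style_act.reshape(-1, 3)
shuffled = flat[rng.permutation(flat.shape[0])].reshape(style_act.shape)

print(style_matrix(style_act).shape)                         # (3, 3)
print(np.isclose(style_loss([style_act], [shuffled]), 0.0))  # True
```

Because the matrix sums channel products over all positions, rearranging the positions leaves it unchanged, so the style loss ignores where features occur and responds only to how strongly they co-occur.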
