
Why I should 1: Divide by the spectral norm [article in progress, don't take it seriously]
The idea of this series of very short articles (Why I should) is to explain some commonplaces in machine learning: common tricks, justified with simple mathematical arguments.
Let’s dive into the spectral norm!
Why the spectral norm is interesting
If you work on GANs or robustness, you have probably dealt with the spectral norm.
In GANs, the spectral norm is applied to the discriminator and/or the generator, as in the papers Spectral Normalization for Generative Adversarial Networks and Semantic Image Synthesis with Spatially-Adaptive Normalization.
In robustness, it is also applied throughout the network to make the network robust to input perturbations.
In both cases, the network's weight matrices are divided by their spectral norm. The goal of this operation is to make the network 1-Lipschitz continuous.
Lipschitz continuous application
An application $f$ is said to be $L$-Lipschitz continuous if there exists a constant $L$ such that, for all $x$ and $y$:

$$\|f(x) - f(y)\| \le L \, \|x - y\|$$

In practice we restrict ourselves to linear applications $f(x) = Wx$, for which the condition becomes:

$$\|Wx - Wy\| = \|W(x - y)\| \le L \, \|x - y\|$$

As we will see, the smallest constant $L$ satisfying this condition is exactly the spectral norm of $W$. This equivalence is only true for linear applications, and it is natural to be interested only in them: in fact, it is only the weight matrices of the models that interest us here.
How should we interpret this property? We can see the norm of a vector as its energy. We can therefore rewrite the Lipschitz condition as:

$$\frac{\|f(x) - f(y)\|}{\|x - y\|} \le L$$

In a sense, the energy ratio between input and output is bounded. This property ensures that the energy does not explode, which makes our application more stable, more robust.
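As a quick numerical sanity check (a NumPy sketch with an arbitrary random matrix, purely illustrative), the energy ratio of a linear application never exceeds its largest singular value:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(40, 20))     # an arbitrary linear application f(x) = W @ x
L = np.linalg.norm(W, ord=2)      # its largest singular value

# the "energy ratio" ||f(x) - f(y)|| / ||x - y|| for many random pairs
xs = rng.normal(size=(1000, 20))
ys = rng.normal(size=(1000, 20))
ratios = np.linalg.norm((xs - ys) @ W.T, axis=1) / np.linalg.norm(xs - ys, axis=1)

assert ratios.max() <= L + 1e-9   # the ratio is bounded by L
```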
It is this property that is sought in GANs or to make its network robust.
Having Lipschitz continuous applications ensures that our model is robust: the Lipschitz constant $L$ bounds how much a perturbation of the input can be amplified at the output. We would like to have a stronger condition than a merely finite Lipschitz constant: we want $L = 1$, so that perturbations are never amplified. A solution to make our layer 1-Lipschitz is to use the spectral norm.
Spectral norm
As I said, the mathematical object that will make a network 1-Lipschitz continuous is the spectral norm.

Let $W \in \mathbb{R}^{m \times n}$. Its spectral norm is defined as:

$$\sigma(W) = \max_{x \neq 0} \frac{\|Wx\|_2}{\|x\|_2}$$

which is also the largest singular value of $W$.

To transform a linear application into a 1-Lipschitz continuous application, simply divide the matrix $W$ of the application by its spectral norm:

$$\hat{W} = \frac{W}{\sigma(W)}$$
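To make this concrete, here is a small NumPy sketch (random matrix, illustrative only) computing the spectral norm both from the SVD and by power iteration, the cheap iterative estimate that spectral normalization layers use in practice:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(40, 20))

# reference value: the largest singular value, from a full SVD
sigma_svd = np.linalg.svd(W, compute_uv=False)[0]

# power iteration on W^T W converges to the top right singular vector
x = rng.normal(size=20)
for _ in range(100):
    x = W.T @ (W @ x)
    x /= np.linalg.norm(x)
sigma_pi = np.linalg.norm(W @ x)   # amplification along that direction

assert abs(sigma_svd - sigma_pi) <= 1e-4 * sigma_svd
```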
Proof that $\hat{W}$ is 1-Lipschitz continuous:

$$\|\hat{W}x - \hat{W}y\| = \frac{\|W(x - y)\|}{\sigma(W)} \le \frac{\sigma(W)\,\|x - y\|}{\sigma(W)} = \|x - y\|$$

By definition, this means that $\hat{W}$ is 1-Lipschitz continuous. ∎
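The proof can also be checked numerically; in this NumPy sketch (arbitrary matrix, purely illustrative) the normalized matrix has spectral norm 1 and contracts every pair of points:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(40, 20))

W_hat = W / np.linalg.norm(W, ord=2)   # divide by the spectral norm

# the normalized application is 1-Lipschitz: its largest singular value is 1
assert np.isclose(np.linalg.norm(W_hat, ord=2), 1.0)

# and it never amplifies the distance between two points
x, y = rng.normal(size=20), rng.normal(size=20)
assert np.linalg.norm(W_hat @ x - W_hat @ y) <= np.linalg.norm(x - y)
```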
Apply the spectral norm to a neural network
In PyTorch, spectral normalization is available as a wrapper that renormalizes a layer's weight at every forward pass:
import torch.nn as nn

# wrap a layer so that its weight is divided by its (estimated) spectral norm
m = nn.utils.spectral_norm(nn.Linear(20, 40))
Before going any further, I will recall a few facts.

Let $f$ be an $L_1$-Lipschitz continuous application and $g$ an $L_2$-Lipschitz continuous application.

Proof that $f \circ g$ is $(L_1 L_2)$-Lipschitz continuous:

By definition, it means that:

$$\|f(g(x)) - f(g(y))\| \le L_1 \, \|g(x) - g(y)\| \le L_1 L_2 \, \|x - y\|$$

∎

This is the key fact: a neural network is a composition of layers, so if every layer is 1-Lipschitz continuous, the whole network is 1-Lipschitz continuous.
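This is why per-layer normalization controls the whole network: Lipschitz constants multiply under composition. A minimal NumPy sketch with arbitrary linear layers:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(30, 20))
B = rng.normal(size=(20, 10))

L1 = np.linalg.norm(A, ord=2)   # Lipschitz constant of x -> A @ x
L2 = np.linalg.norm(B, ord=2)   # Lipschitz constant of x -> B @ x

# the composition x -> A @ (B @ x) is (L1 * L2)-Lipschitz:
# its spectral norm is at most the product of the two constants
assert np.linalg.norm(A @ B, ord=2) <= L1 * L2 + 1e-9
```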