
I worked with neural networks in Java a long time ago, and now I'm trying to learn TFLearn and Keras in Python.

I'm trying to build an autoencoder, but since I'm running into problems, the code I show here doesn't have the bottleneck characteristic (which should make the problem even easier).

In the following code I create the network and the dataset (two random variables), and after training it plots the correlation between each predicted variable and its input.

What the network should learn is to output the same input it receives.

import matplotlib.pyplot as plt
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

def buildMyNetwork(inputs, bottleNeck):
    # symmetric encoder/decoder stack of Dense layers; here the "bottleneck"
    # is as wide as the input, so no compression actually happens
    inputLayer = Input(shape=(inputs,))
    autoencoder = Dense(inputs*2, activation='relu')(inputLayer)
    autoencoder = Dense(inputs*2, activation='relu')(autoencoder)
    autoencoder = Dense(bottleNeck, activation='relu')(autoencoder)
    autoencoder = Dense(inputs*2, activation='relu')(autoencoder)
    autoencoder = Dense(inputs*2, activation='relu')(autoencoder)
    autoencoder = Dense(inputs, activation='sigmoid')(autoencoder)
    autoencoder = Model(inputs=inputLayer, outputs=autoencoder)
    autoencoder.compile(optimizer='adadelta', loss='mean_squared_error')
    return autoencoder


dataSize = 1000
variables = 2
data = np.zeros((dataSize,variables))
data[:, 0] = np.random.uniform(0, 0.8, size=dataSize)
data[:, 1] = np.random.uniform(0, 0.1, size=dataSize)

trainData, testData = data[:900], data[900:]

model = buildMyNetwork(variables, 2)
model.fit(trainData, trainData, epochs=2000)
predictions = model.predict(testData)

for x in range(variables):
    plt.scatter(testData[:, x], predictions[:, x])
    plt.show()
    plt.close()

Even though sometimes the result is acceptable, many other times it isn't. I know neural networks have random weight initialization and may therefore converge to different solutions, but I think this is too much and there may be some mistake in my code.

Sometimes the correlation is acceptable.

Other times it is quite lost.

**UPDATE:**

Thanks Marcin Możejko!

Indeed, that was the problem. My original question arose because I was trying to build an autoencoder, so, to be consistent with the title, here is an example of an autoencoder (just using a more complex dataset and changing the activation functions):

import matplotlib.pyplot as plt
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

def buildMyNetwork(inputs, bottleNeck):
    # same symmetric stack as before, but now the bottleneck (2 neurons)
    # really is narrower than the 6-variable input
    inputLayer = Input(shape=(inputs,))
    autoencoder = Dense(inputs*2, activation='tanh')(inputLayer)
    autoencoder = Dense(inputs*2, activation='tanh')(autoencoder)
    autoencoder = Dense(bottleNeck, activation='tanh')(autoencoder)
    autoencoder = Dense(inputs*2, activation='tanh')(autoencoder)
    autoencoder = Dense(inputs*2, activation='tanh')(autoencoder)
    autoencoder = Dense(inputs, activation='tanh')(autoencoder)
    autoencoder = Model(inputs=inputLayer, outputs=autoencoder)
    autoencoder.compile(optimizer='adadelta', loss='mean_squared_error')
    return autoencoder


dataSize = 1000
variables = 6
data = np.zeros((dataSize,variables))
data[:, 0] = np.random.uniform(0, 0.5, size=dataSize)
data[:, 1] = np.random.uniform(0, 0.5, size=dataSize)
data[:, 2] = data[:, 0] + data[:, 1]
data[:, 3] = data[:, 0] * data[:, 1]
data[:, 4] = data[:, 0] / data[:, 1]
data[:, 5] = data[:, 0] ** data[:, 1]

trainData, testData = data[:900], data[900:]

model = buildMyNetwork(variables, 2)
model.fit(trainData, trainData, epochs=2000)
predictions = model.predict(testData)

for x in range(variables):
    plt.scatter(testData[:, x], predictions[:, x])
    plt.show()
    plt.close()

For this example I used the tanh activation function, but I tried with others and they worked as well. The dataset now has 6 variables, but the autoencoder has a bottleneck of 2 neurons; since variables 2 to 5 are formed by combining variables 0 and 1, the autoencoder only needs to pass the information of those two through the bottleneck and learn the functions that generate the other variables in the decoding phase. The example above shows that all the functions are learnt except one, the division... I don't yet know why.
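As a quick, hedged check of my own (not something from the original post): with data[:, 1] drawn from Uniform(0, 0.5), the ratio data[:, 0] / data[:, 1] can take arbitrarily large values when the denominator is close to 0, far outside the (-1, 1) range of the final tanh layer, so that one variable may be impossible to reproduce regardless of what the bottleneck learns. A small sketch to inspect the ranges of the derived variables:

import numpy as np

# Reproduce the derived variables of the dataset above and print their ranges.
rng = np.random.default_rng(0)
a = rng.uniform(0, 0.5, size=1000)
b = rng.uniform(0, 0.5, size=1000)
derived = {'sum': a + b, 'product': a * b, 'division': a / b, 'power': a ** b}
for name, col in derived.items():
    print(f"{name:8s} min={col.min():.3f} max={col.max():.3f}")

On a typical run the division column spans several orders of magnitude, while the other variables stay comfortably inside tanh's output range.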


3 Answers


I think that in your case it is relatively easy to explain why your network might fail to learn an identity function. Let's go through your example:

  1. Your input comes from a 2D space, and it doesn't lie on a 1D or 0D submanifold, because of the uniform distribution. From this it's easy to see that, in order to get an identity function from your autoencoder, every layer should be able to represent a function whose range is at least two-dimensional, because the output of your last layer should also lie on a 2D manifold.
  2. Let's go through your network and check whether it satisfies this condition:

    inputLayer = Input(shape=(2,))
    autoencoder = Dense(4, activation='relu')(inputLayer)
    autoencoder = Dense(4, activation='relu')(autoencoder)
    autoencoder = Dense(2, activation='relu')(autoencoder) # Possible problems here
    

    You may see that the bottleneck might cause a problem: for this layer it might be hard to satisfy the condition from the first point. For this layer, in order to get a 2-dimensional output range, you need weights that keep the examples from falling into the saturation region of relu (in that case the samples are squashed to 0 in one of the units, which makes it impossible for the range to be "fully" 2D). So basically, the probability that this will not happen is relatively small. Moreover, the probability that backpropagation will push this unit into that region cannot be neglected either. A minimal numeric sketch of this collapse follows below.
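To make the collapse concrete, here is a small numpy sketch of my own (the weights are hypothetical, chosen only for illustration) showing how a 2-unit relu layer can squash 2D inputs onto a 1D set once one unit saturates:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 0.8, size=(1000, 2))   # 2D inputs, all non-negative

# Hypothetical weights: the second unit receives only negative weights and a
# negative bias, so its pre-activation is negative for every sample.
W = np.array([[1.0, -1.0],
              [0.5, -0.5]])
b = np.array([0.0, -0.1])

h = np.maximum(0.0, x @ W + b)            # relu
print(np.unique(h[:, 1]))                 # [0.] -- the unit outputs 0 for every sample
print(np.linalg.matrix_rank(h))           # 1 -- the hidden representation is no longer 2D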

UPDATE:

In a comment it was asked why the optimizer fails to prevent or undo the saturation. This is an example of one of the important downsides of relu: once an example falls into the relu saturation region, that example no longer takes a direct part in the learning of the given unit. It can still affect it by influencing previous units, but because of the 0 derivative this influence is not direct. So basically, unsaturating an example comes as a side effect, not from the direct action of the optimizer.
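A tiny TensorFlow sketch of my own (again just an illustration, not part of the original answer) shows the zero gradient directly: for an example sitting in the relu saturation region, the loss gradient with respect to that unit's weight is exactly 0, so the optimizer gets no signal from it:

import tensorflow as tf

w = tf.Variable([[-1.0]])           # hypothetical weight that saturates the unit
x = tf.constant([[0.5]])            # positive input -> pre-activation is negative
target = tf.constant([[0.5]])

with tf.GradientTape() as tape:
    out = tf.nn.relu(tf.matmul(x, w))               # relu(-0.5) = 0
    loss = tf.reduce_mean(tf.square(out - target))  # MSE against the target

print(tape.gradient(loss, w).numpy())   # [[0.]] -- no direct learning signal from this example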


2 Comments

Cool explanation! But I'm still not quite sure I understand why the network couldn't learn that the weights should all be positive. This is basic backpropagation; the updates should go in the right direction. From a math point of view, I see a global minimum that can be achieved; there are even several of them due to the higher dimensions of the first layers. We can even find the weights to achieve this minimum manually. How come the optimizer can't find it? Maybe we should use another optimizer?
Or just linear activations? As I mentioned in the answer below, the nonlinearities don't really make sense here, right? We don't need to activate or deactivate neurons here... no need for complex patterns, only to propagate the data without losing information.

This is a really cool use case for seeing and studying the difficulties of training a neural net.

Indeed, I see multiple possibilities:

1) If there are very different results between two different runs, it can come from the initialization. But it can also come from your dataset, which isn't the same at every run.

2) Something that will make it difficult for the network to learn the correlation is your activations, more precisely the sigmoid. I would change that nonlinearity to another 'relu'. There is no reason to use a sigmoid here; I know it looks linear around 0, but it isn't really linear. To produce a 0.7, the raw output has to be noticeably above zero (see the small check below). The "easy" relation that you have in mind for this network isn't so easy once you constrain it this way.

3) If it is sometimes not good, maybe it just needs more epochs to converge?

4) Maybe you'd need a bigger dataset? In theory you're fine, since there are 900 samples for fewer than 200 parameters in your network, but who knows...

I can't try all of this since I'm on my phone right now, but feel free to experiment and troubleshoot with the hints I gave you :) I hope this helps.
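Regarding point 2), here is a small check of my own (not part of the original answer) of the raw pre-activation the sigmoid needs in order to reach the output values used in the question; the mapping is clearly not linear, and the further the target is from 0.5, the larger the required pre-activation magnitude:

import numpy as np

def logit(p):
    # inverse of the sigmoid: the pre-activation required to output p
    return np.log(p / (1 - p))

for p in [0.1, 0.5, 0.7, 0.8]:
    print(f"sigmoid output {p} needs pre-activation {logit(p):+.3f}")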

EDIT:

After some trials, as Marcin Możejko was saying, the issue comes from the activations. You can read his answer for more info on what is going wrong. The way to fix it is to change your activations. If you are a fan of 'relu', you can use a special version of this activation: use a LeakyReLU layer and do not set an activation on the previous layer, like this:

    from keras.layers import LeakyReLU

    autoencoder = Dense(inputs*2)(inputLayer)        # note: no activation here
    autoencoder = LeakyReLU(alpha=0.3)(autoencoder)  # leaky relu applied as its own layer

This will solve the case where you get stuck in a non-optimal solution. Otherwise, as I said above, you can try not using any non-linearities at all. Your loss will go down much faster and won't get stuck.

Keep in mind that non-linearities were introduced to let networks find more complex patterns in the data. In your case you have the simplest, linear pattern.
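For reference, a minimal sketch of my own (not the answer's exact code, same data assumptions as the question) of the fully linear variant: with no activations at all, a stack of Dense layers can represent the identity map directly, and the loss drops quickly:

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

variables = 2
inputLayer = Input(shape=(variables,))
x = Dense(variables * 2)(inputLayer)      # linear activation by default
x = Dense(variables)(x)                   # linear output
linearAutoencoder = Model(inputs=inputLayer, outputs=x)
linearAutoencoder.compile(optimizer='adam', loss='mean_squared_error')

data = np.random.uniform(0, 0.8, size=(1000, variables))
linearAutoencoder.fit(data, data, epochs=200, verbose=0)
print(linearAutoencoder.evaluate(data, data, verbose=0))  # typically very close to 0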



To visualize the reconstruction, one could use MNIST digits and train an autoencoder with a linear 2-dimensional code (this time using sigmoid activations in the hidden layers):

from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Flatten, Reshape
encoded_dim = 2
encoder = Sequential([
                      Flatten(input_shape=(28,28)),
                      Dense(256, activation='sigmoid'),
                      Dense(64, activation='sigmoid'),
                      Dense(encoded_dim)
])
decoder = Sequential([
                      Dense(64, activation='sigmoid', input_shape=(encoded_dim,)),
                      Dense(256, activation='sigmoid'),
                      Dense(28*28, activation='sigmoid'),
                      Reshape((28,28))
])
autoencoder = Model(inputs=encoder.inputs, outputs=decoder(encoder.outputs))
autoencoder.summary()
#Model: "model"
#_________________________________________________________________
#Layer (type)                 Output Shape              Param #   
#=================================================================
#flatten_2_input (InputLayer) [(None, 28, 28)]          0         
#_________________________________________________________________
#flatten_2 (Flatten)          (None, 784)               0         
#_________________________________________________________________
#dense_12 (Dense)             (None, 256)               200960    
#_________________________________________________________________
#dense_13 (Dense)             (None, 64)                16448     
#_________________________________________________________________
#dense_14 (Dense)             (None, 2)                 130       
#_________________________________________________________________
#sequential_5 (Sequential)    (None, 28, 28)            218320    
#=================================================================
#Total params: 435,858
#Trainable params: 435,858
#Non-trainable params: 0

Load MNIST data:

import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype('float32')/255.
x_test = x_test.astype('float32')/255.

Now let's train the model on the MNIST dataset:

autoencoder.compile(loss='binary_crossentropy')
autoencoder.fit(x=x_train, y=x_train, epochs=10, batch_size=32)
# Epoch 1/10
# 1875/1875 [==============================] - 8s 4ms/step - loss: 0.2311
# Epoch 2/10
# 1875/1875 [==============================] - 7s 4ms/step - loss: 0.2009
# Epoch 3/10
# 1875/1875 [==============================] - 7s 4ms/step - loss: 0.1911
# Epoch 4/10
# 1875/1875 [==============================] - 7s 4ms/step - loss: 0.1864
# Epoch 5/10
# 1875/1875 [==============================] - 7s 4ms/step - loss: 0.1832
# Epoch 6/10
# 1875/1875 [==============================] - 7s 4ms/step - loss: 0.1807
# Epoch 7/10
# 1875/1875 [==============================] - 7s 4ms/step - loss: 0.1786
# Epoch 8/10
# 1875/1875 [==============================] - 7s 4ms/step - loss: 0.1771
# Epoch 9/10
# 1875/1875 [==============================] - 7s 4ms/step - loss: 0.1757
# Epoch 10/10
# 1875/1875 [==============================] - 7s 4ms/step - loss: 0.1746

The following animation shows the reconstruction of a few randomly selected images by the autoencoder at different epochs; as we can see, the reconstruction error decreases as the model is trained for more and more epochs:

[animation: reconstructions of test digits at different training epochs]
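To look at individual reconstructions without the animation, a short matplotlib snippet of my own (reusing the autoencoder and x_test defined above) can plot a few test digits next to their reconstructions:

import matplotlib.pyplot as plt
import numpy as np

n = 5
idx = np.random.choice(len(x_test), n, replace=False)
reconstructions = autoencoder.predict(x_test[idx])

fig, axes = plt.subplots(2, n, figsize=(2 * n, 4))
for i in range(n):
    axes[0, i].imshow(x_test[idx[i]], cmap='gray')      # original digit
    axes[1, i].imshow(reconstructions[i], cmap='gray')  # reconstruction
    axes[0, i].axis('off')
    axes[1, i].axis('off')
plt.show()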

