I want to build a sequence-to-sequence autoencoder in Keras. The purpose is to get a fixed-size vector for a whole document ("doc2vec"-style).
On the Keras blog I found this example: https://blog.keras.io/building-autoencoders-in-keras.html
from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model
inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)                          # encode the sequence into a single vector
decoded = RepeatVector(timesteps)(encoded)                  # repeat that vector for every timestep
decoded = LSTM(input_dim, return_sequences=True)(decoded)   # decode back into a sequence
sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)
What if I need to add an embedding layer to this? If we are dealing with a paragraph of text, I suppose we should first tokenize the text and then embed it with pre-trained word vectors, right?
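Here is roughly what I have in mind for the input side. This is only a sketch under my own assumptions: max_len, vocab_size, embedding_dim and latent_dim are placeholder sizes, and embedding_matrix is a random stand-in for real pre-trained vectors (word2vec/GloVe):

import numpy as np
from keras.layers import Input, Embedding, LSTM
from keras.models import Model
max_len = 100        # padded sequence length (placeholder)
vocab_size = 20000   # tokenizer vocabulary size (placeholder)
embedding_dim = 300  # size of the pre-trained word vectors (placeholder)
latent_dim = 256     # size of the document vector (placeholder)
embedding_matrix = np.random.rand(vocab_size, embedding_dim)  # stand-in for pre-trained vectors
inputs = Input(shape=(max_len,), dtype='int32')               # integer word indices from the tokenizer
embedded = Embedding(vocab_size, embedding_dim,
                     weights=[embedding_matrix],              # initialise with pre-trained vectors
                     trainable=False)(inputs)
encoded = LSTM(latent_dim)(embedded)                          # fixed-size "doc vector"
encoder = Model(inputs, encoded)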
Do I need a Dense or TimeDistributed(Dense) layer in the decoder? And do I need to reverse the order of the input sequence?
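For the decoder, this is what I would try, continuing from the encoder sketch above. My assumption is to reconstruct the embedded vectors with an MSE loss via TimeDistributed(Dense), rather than predicting word indices with a softmax:

from keras.layers import RepeatVector, Dense, TimeDistributed, LSTM
from keras.models import Model
decoded = RepeatVector(max_len)(encoded)                      # repeat the doc vector for every timestep
decoded = LSTM(latent_dim, return_sequences=True)(decoded)    # unroll it back into a sequence
decoded = TimeDistributed(Dense(embedding_dim))(decoded)      # project each timestep back to embedding space
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='mse')             # target would be the embedded input sequence

The alternative I can think of is TimeDistributed(Dense(vocab_size, activation='softmax')) with a categorical loss, but I am not sure which is appropriate, or whether reversing the target sequence actually helps here.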
Thanks in advance.