I have been following the tutorial for feature extraction using pytorch audio here: https://pytorch.org/audio/0.10.0/pipelines.html#wav2vec-2-0-hubert-representation-learning

It says the result is a list of 12 tensors, where each entry is the output of one transformer layer. So the first tensor in the list has a shape of something like (1, 2341, 768).

This seems to be correct, as I get this result for most audio files.

However, for some files I still get a list of 12 tensors, but bizarrely the entries have a batch size greater than 1, so the shape is (2, 2341, 768). I am baffled as to why this happens.

Any clues would be great.

1 Answer

This is most likely because your incoming audio is multi-channel (stereo, for example). Check the shape of your input tensor: if it is (2, L), with L being the length of the audio in samples, the model treats the two channels as a batch of two separate signals. Each layer of the model then gives you a representation of shape (2, L', D), where L' is the length of the output sequence and D is the number of features of the model.
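The shape check described above could look something like the sketch below. It assumes the waveform is a 2-D tensor laid out as (channels, L); the `looks_multichannel` helper and the random stand-in tensors are hypothetical and only there for illustration, not something from the tutorial.

```python
import torch


def looks_multichannel(waveform: torch.Tensor) -> bool:
    """Return True if a 2-D (channels, L) waveform has more than one channel."""
    if waveform.dim() != 2:
        raise ValueError(f"expected a 2-D tensor, got shape {tuple(waveform.shape)}")
    return waveform.shape[0] > 1


# Stand-ins for loaded audio (assumption: one second at 16 kHz, random noise).
mono = torch.randn(1, 16000)
stereo = torch.randn(2, 16000)

print(tuple(mono.shape), looks_multichannel(mono))
print(tuple(stereo.shape), looks_multichannel(stereo))
```

If the check returns True, the leading dimension of size 2 in the extracted features is the channel count being read as a batch size.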

2 Comments

Oh, thanks for this awesome insight. How would you deal with this? At the moment I have a script running that computes the mean over the two batch entries. Or should I just take one channel or the other?
I think it depends on what you want to do and what audio you have exactly. For example, if the difference between the two channels is small, you can just pick the first one and ignore the other. If you want to use both channels but end up with only one representation, you can merge the two channels into a single one with sox, or average the representations at the end. If the two channels are very different, it could be interesting to keep both, depending on what you are doing with these representations.
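The three options mentioned in this comment can be sketched roughly as follows. This is only an illustration under assumptions: the helper names are hypothetical, the downmix is a plain average in PyTorch standing in for the sox merge, and the feature list is random data shaped like the output described in the question rather than real model output.

```python
import torch


def first_channel(waveform: torch.Tensor) -> torch.Tensor:
    """Keep only the first channel; (channels, L) -> (1, L)."""
    return waveform[:1]


def downmix_to_mono(waveform: torch.Tensor) -> torch.Tensor:
    """Average all channels into one; (channels, L) -> (1, L)."""
    return waveform.mean(dim=0, keepdim=True)


def average_representations(features: list) -> list:
    """Average each layer's output over the channel/batch dimension."""
    return [layer.mean(dim=0, keepdim=True) for layer in features]


# Stand-in stereo input (assumption: one second at 16 kHz, random noise).
stereo = torch.randn(2, 16000)

# Stand-in for the 12 per-layer outputs on stereo input, each (2, L', D).
features = [torch.randn(2, 49, 768) for _ in range(12)]

print(tuple(first_channel(stereo).shape))
print(tuple(downmix_to_mono(stereo).shape))
print(tuple(average_representations(features)[0].shape))
```

The first two options change the input before it reaches the model, so only one signal is processed. The third runs the model on both channels and merges afterwards, which costs twice the compute and is not guaranteed to give the same result as downmixing first, since the model is not linear.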
