
I want to concatenate arrays of different lengths to feed them to my neural network, whose first layer will be AdaptiveAvgPool1d. I have a dataset composed of several signals (1D arrays), each with a different length. For example:

array1 = np.random.randn(1200,1)
array2 = np.random.randn(950,1)
array3 = np.random.randn(1000,1)

I want to obtain a tensor in which I concatenate these three signals to obtain a 2D tensor. However, if I try to do

tensor = torch.Tensor([array1, array2, array3])

It gives me this error:

ValueError: expected sequence of length 1200 at dim 2 (got 950)

Is there a way to obtain such a thing?

EDIT More information about the dataset:

  • Each signal window represents a heartbeat from an ECG recording, taken from several patients and sampled at 1000 Hz
  • The beats can have different lengths, since the length depends on the patient's heart rate
  • For each beat I need to predict the length of the QRS interval (the target of the network), which I already have, expressed in milliseconds
  • I have already thought of interpolating the shortest samples to the length of the longest ones, but then I would also have to change the length of the QRS interval in the labels, is that right?

I have read about the AdaptiveAvgPool1d layer, which would allow me to feed the network samples of different sizes. But my problem is: how do I feed the network a dataset in which each sample has a different length? How do I group them without padding with NaNs or zeros? I hope I explained myself.
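One way to sidestep the batching problem entirely is to run the adaptive pooling per sample before batching: since AdaptiveAvgPool1d maps any input length to a fixed output length, each variable-length signal can be pooled individually (batch size 1) and the equal-length results stacked afterwards. A minimal sketch (the target length 512 is an arbitrary choice, not from the question):

```python
import numpy as np
import torch
import torch.nn as nn

# three signals with different lengths, as in the question
signals = [np.random.randn(1200, 1), np.random.randn(950, 1), np.random.randn(1000, 1)]

pool = nn.AdaptiveAvgPool1d(512)  # 512 is an arbitrary target length

pooled = []
for sig in signals:
    # reshape (length, 1) -> (batch=1, channels=1, length), as 1D layers expect
    t = torch.from_numpy(sig).float().T.unsqueeze(0)
    pooled.append(pool(t))

# now every sample has the same length, so they stack into one batch
batch = torch.cat(pooled, dim=0)
print(batch.shape)  # torch.Size([3, 1, 512])
```

Whether pooling before the first learned layer is acceptable depends on the model; the alternative is to keep batch size 1 and let the pooling layer sit inside the network.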

1 Answer


This violates the definition of a tensor and is impossible: if a tensor has shape (N x M x 1), all N matrices must be of size (M x 1).

There are still ways to get all your arrays to the same length. Look at where your data is coming from and what its structure is, and figure out which of the following solutions would work. Note that some of these may change the signal's derivative in a way you don't like.

  • Cropping arrays to the same size (i.e. cutting the start/end off), or zero-padding the shorter ones to the length of the longest one (I really dislike this one, and it would only work for very specific applications)
  • 'Stretching' the arrays to the same size by using interpolation
  • Shortening the arrays to the same size by subsampling
  • For some applications, maybe even passing the coefficients of a Fourier series of the signals
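The 'stretching' option above can be sketched with linear interpolation via np.interp (the target length 1024 is illustrative, not prescribed by the answer):

```python
import numpy as np
import torch

# three 1D signals with different lengths
signals = [np.random.randn(1200), np.random.randn(950), np.random.randn(1000)]
target_len = 1024  # arbitrary common length

def resample(sig, n):
    # map the original sample positions onto n evenly spaced points in [0, 1]
    old_x = np.linspace(0.0, 1.0, num=len(sig))
    new_x = np.linspace(0.0, 1.0, num=n)
    return np.interp(new_x, old_x, sig)

stacked = np.stack([resample(s, target_len) for s in signals])
tensor = torch.from_numpy(stacked).float()
print(tensor.shape)  # torch.Size([3, 1024])
```

As the question's edit notes, stretching a beat in time also stretches its QRS interval, so the millisecond labels would need to be rescaled by the same factor (or the original sampling grid kept track of).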

EDIT For heart-rate data, which should be a roughly periodic signal, I'd definitely crop the signals, which should work quite well. Passing FFT(equally cropped signals) or Fourier coefficients may also yield interesting results, but in my experience with neural spike data, training on the FFT of a signal like this doesn't perform any better once you have enough data to train on.
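The cropping suggestion might look like the sketch below: trim every beat to the shortest length in the set. Cutting from the end is the simplest choice; for ECG beats, cropping symmetrically around the R peak may be more appropriate, but that depends on how the windows were extracted.

```python
import numpy as np
import torch

# three beats with different lengths
signals = [np.random.randn(1200), np.random.randn(950), np.random.randn(1000)]

n = min(len(s) for s in signals)  # shortest length in the dataset (here 950)

# crop every beat to n samples by cutting the end off
cropped = np.stack([s[:n] for s in signals])
tensor = torch.from_numpy(cropped).float()
print(tensor.shape)  # torch.Size([3, 950])
```

Unlike stretching, cropping does not change the time axis, so the QRS labels in milliseconds stay valid as long as the QRS complex itself is not cut away.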

Also, if you're using a fully connected network, using 1D convolutions is a good alternative to try.
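A hedged sketch of that alternative: a small Conv1d stack followed by AdaptiveAvgPool1d, so the fully connected head always sees a fixed size regardless of the input length. All layer sizes here are illustrative assumptions, and variable-length inputs are processed one at a time (batch size 1):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(64),   # fixed output length regardless of input length
    nn.Flatten(),
    nn.Linear(32 * 64, 1),      # regress the QRS duration (in ms) as a scalar
)

# two beats of different lengths, fed one at a time (batch size 1)
for length in (950, 1200):
    x = torch.randn(1, 1, length)   # (batch, channels, length)
    print(model(x).shape)           # torch.Size([1, 1]) for both lengths
```

The adaptive pooling is what makes the convolutional front end length-agnostic; without it, the Linear layer's input size would depend on the signal length.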


2 Comments

Could you edit your question to indicate what the data represents? That makes it a lot easier to answer.
I edited the question with all the information you required. I hope it is useful, thank you for your help.
