Efficient way to remove sections of Numpy array

Question

I am working with a numpy array of features in the following format:

[[feat1_channel1,feat2_channel1...feat6_channel1,feat1_channel2,feat2_channel2...]] (so each channel has 6 features and the array shape is 1 x (number channels*features_per_channel) or 1 x total_features)

I am trying to remove specified channels from the feature array. For example, removing channel 1 would mean removing features 1-6 associated with channel 1.

My current method is shown below:

reshaped_features = current_feature.reshape((-1,num_feats))
desired_channels = np.delete(reshaped_features,excluded_channels,axis=0)
current_feature = desired_channels.reshape((1,-1))

Here I reshape the array to number_of_channels x number_of_features, remove the rows corresponding to the channels I want to exclude, and then reshape the remaining data back into the original 1 x total_features format.

The problem is that this process runs thousands of times and slows my code down tremendously. Are there any suggestions on how to speed this up, or alternative approaches?

As an example, given the following array of features:

[[0,1,2,3,4,5,6,7,8,9,10,11...48,49,50,51,52,53]]

I reshape it to:

[[0,1,2,3,4,5],
 [6,7,8,9,10,11],
 [12,13,14,15,16,17],
 .
 .
 .
 [48,49,50,51,52,53]]

and, as an example, if I want to remove the first two channels then the resulting output should be:

[[12,13,14,15,16,17],
 .
 .
 .
 [48,49,50,51,52,53]]

and finally:

[[12,13,14,15,16,17...48,49,50,51,52,53]]
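For reference, the worked example above can be reproduced end to end with the np.delete approach from the question. This is a minimal sketch, assuming 9 channels of 6 features each (54 values, matching the example) and zero-based channel indices, so "the first two channels" are rows 0 and 1.

```python
import numpy as np

num_feats = 6                 # features per channel, as in the example
excluded_channels = [0, 1]    # assumption: zero-based rows, i.e. the first two channels

# One frame of 9 channels x 6 features, stored flat with shape (1, 54).
current_feature = np.arange(54).reshape((1, -1))

# Reshape so that each row holds one channel.
reshaped_features = current_feature.reshape((-1, num_feats))                 # (9, 6)
# Drop the excluded rows, then flatten back to a single row.
desired_channels = np.delete(reshaped_features, excluded_channels, axis=0)   # (7, 6)
result = desired_channels.reshape((1, -1))                                   # (1, 42)

print(result.shape)
print(result[0, :6])
```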

Are you always excluding the same channels? You could create one numpy array and then write subsets of your arrays (excluding some channels) to that numpy array.
– jkr, Commented Oct 21, 2020 at 22:57
It would be helpful to show how the data is going to be used. There might be better ways to solve this problem than deleting data from the input.
– GZ0, Commented Oct 22, 2020 at 9:26
The same channels will not always be excluded. The 'excluded channels' list will be specified by the user. The example I showed above is the feature data from one frame of data and there are thousands of frames of data. So each frame is appended to a list and the list is then used as the training data for a classifier model. I figured removing the channels before appending to the list would be best because slicing the data would just get more complicated as more is added.
– Kunal Shah, Commented Oct 22, 2020 at 14:27
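Since the excluded channels are fixed for a given run but there are thousands of frames, one possible direction is to work out the columns to keep once and reuse them for every frame. The sketch below is not from the thread: the names keep_mask, keep_cols and frames are hypothetical, and it assumes the channel-major layout described in the question, zero-based channel indices, and that the channel count is known up front.

```python
import numpy as np

num_channels = 9            # assumed for illustration
num_feats = 6               # features per channel
excluded_channels = [0, 1]  # user-specified, zero-based (assumption)

# Build the flat column indices to keep once, outside the per-frame loop.
keep_mask = np.ones(num_channels, dtype=bool)
keep_mask[excluded_channels] = False
# Repeat each channel's flag once per feature, then take the True positions.
keep_cols = np.flatnonzero(np.repeat(keep_mask, num_feats))

# Hypothetical frames, each with shape (1, num_channels * num_feats).
frames = [np.arange(num_channels * num_feats).reshape(1, -1) + 100 * k
          for k in range(3)]

# Option A: filter each frame as it arrives, before appending to the list.
training_rows = [frame[:, keep_cols] for frame in frames]

# Option B: stack all frames first and filter them in one indexing call.
stacked = np.vstack(frames)        # (3, 54)
filtered = stacked[:, keep_cols]   # (3, 42)
```

Option B avoids per-frame work entirely, at the cost of holding the unfiltered frames in memory until they are stacked.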

Kunal Shah · Accepted Answer · 2021-03-11 18:08:57Z

1

Building on the answer from msi_gerva, I found a solution that avoids np.delete(), which was the main cause of the slowdown.

I found the channels I wanted to keep using a list comprehension:

import numpy as np

all_chans = [1,2,3,4,5,6,7,8,9,10]

features_per_channel = 5
my_data = np.arange(len(all_chans)*features_per_channel)

# zero-based row indices of the channels to drop
chan_to_exclude = [1,3,5]

channels_to_keep = [i for i in range(len(all_chans)) if i not in chan_to_exclude]

Then I reshaped the array so that each row is one channel:

reshaped = my_data.reshape((-1,features_per_channel))

Then I selected the channels I wanted to keep:

desired_data = reshaped[channels_to_keep]

Finally, I reshaped the result to the desired 1 x total_features shape:

final_data = desired_data.reshape((1,-1))

These changes made the code ~2x faster than the original method.
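To check the speed difference on your own machine, something like the following sketch could be used. It wraps both approaches in hypothetical helper functions (with_delete and with_indexing are illustrative names, not from the answer), confirms they agree, and times them with the standard-library timeit module. No timings are quoted here because they depend on array size, hardware, and NumPy version.

```python
import timeit
import numpy as np

features_per_channel = 5
num_channels = 10
chan_to_exclude = [1, 3, 5]   # zero-based rows to drop
channels_to_keep = [i for i in range(num_channels) if i not in chan_to_exclude]
my_data = np.arange(num_channels * features_per_channel)

def with_delete(data):
    """Original approach: reshape, np.delete the rows, flatten back."""
    rows = data.reshape((-1, features_per_channel))
    return np.delete(rows, chan_to_exclude, axis=0).reshape((1, -1))

def with_indexing(data):
    """Approach from this answer: reshape, index the rows to keep, flatten back."""
    rows = data.reshape((-1, features_per_channel))
    return rows[channels_to_keep].reshape((1, -1))

# Both approaches should produce identical output.
same = np.array_equal(with_delete(my_data), with_indexing(my_data))

# Time a few thousand calls of each; the absolute numbers will vary by machine.
t_delete = timeit.timeit(lambda: with_delete(my_data), number=2000)
t_index = timeit.timeit(lambda: with_indexing(my_data), number=2000)
print(f"identical: {same}, np.delete: {t_delete:.4f} s, indexing: {t_index:.4f} s")
```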

answered Mar 11, 2021 at 18:08

Kunal Shah


msi_gerva · Accepted Answer · 2020-10-23 08:15:29Z

0

With the numerical examples you provided, I would go with:

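import numpy as np

# 9 channels x 6 features, one row per channel
arrays = np.arange(54).reshape((54 // 6, 6))
newarrays = arrays.copy()

remove = [1, 3, 5]           # rows (channels) to drop
take = [0, 2, 4, 6, 7, 8]    # the complementary rows to keep

# Option 1: delete the unwanted rows
arrays = np.delete(arrays, remove, axis=0)
# Option 2: index the rows to keep
newarrays = newarrays[take]

# Both options give the same flat list
arrays = list(arrays.flatten())
newarrays = list(newarrays.flatten())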

edited Oct 23, 2020 at 8:15

answered Oct 22, 2020 at 7:01

msi_gerva


4 Comments

Kunal Shah Over a year ago

This would work for the examples I provided, but what if I wanted to remove the 1st, 4th, 5th, and 7th channels? How could I generally write this so the channels could be specified?

msi_gerva Over a year ago

I have edited the example as you asked. I added a list of rows to be removed, and np.delete now removes rows 1, 3 and 5.

Kunal Shah Over a year ago

I think your solution is very close to the one I outlined in my question. The only difference is using flatten at the end rather than reshape, but np.delete is the part that takes up the most time: it takes ~100x as long as the reshape/flatten.

msi_gerva Over a year ago

If delete takes too much time, then instead of removing elements you could take the elements you want from the original data, using a list of the channels to keep. It's up to you. I can imagine that working with real data takes much more time and effort than these simple tests.
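One way to make the "list of channels to keep" general, so that any set of channels can be specified, is to derive it from the removal list. The sketch below uses np.setdiff1d for this and takes the case raised in the comments (removing the 1st, 4th, 5th and 7th channels), assuming those map to zero-based rows 0, 3, 4 and 6 in the 9 x 6 example array.

```python
import numpy as np

num_channels = 9
features_per_channel = 6
arrays = np.arange(num_channels * features_per_channel).reshape(
    num_channels, features_per_channel
)

# 1st, 4th, 5th and 7th channels, written as zero-based row indices (assumption).
remove = [0, 3, 4, 6]

# Rows to keep: every row index that is not in `remove` (returned sorted).
take = np.setdiff1d(np.arange(num_channels), remove)

kept = arrays[take]           # shape (5, 6)
flat = kept.reshape(1, -1)    # shape (1, 30)

print(take)
print(flat.shape)
```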
