I am working with a NumPy array of features in the following format:

[[feat1_channel1,feat2_channel1...feat6_channel1,feat1_channel2,feat2_channel2...]]

Each channel has 6 features, so the array shape is 1 x (number_of_channels * features_per_channel), i.e. 1 x total_features.

I am trying to remove specified channels from the feature array. For example, removing channel 1 would mean removing the 6 features associated with channel 1.

My current method is shown below:

reshaped_features = current_feature.reshape((-1,num_feats))
desired_channels = np.delete(reshaped_features,excluded_channels,axis=0)
current_feature = desired_channels.reshape((1,-1))

Here I reshape the array to number_of_channels x features_per_channel, remove the rows corresponding to the channels I want to exclude, and then reshape the result back into the original 1 x total_features format.
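For reference, here is a self-contained sketch of the three lines above. The values of num_feats, num_channels and excluded_channels are assumptions chosen to match the 54-feature example that follows; they are not taken from the real data.

```python
import numpy as np

# Assumed values for illustration only (match the 54-feature example)
num_feats = 6                 # features per channel
num_channels = 9              # 9 channels * 6 features = 54 features
excluded_channels = [0, 1]    # zero-based row indices of channels to drop

# One frame of features with shape 1 x total_features
current_feature = np.arange(num_channels * num_feats).reshape(1, -1)

# Reshape to one row per channel, drop the excluded rows, flatten back
reshaped_features = current_feature.reshape(-1, num_feats)
desired_channels = np.delete(reshaped_features, excluded_channels, axis=0)
current_feature = desired_channels.reshape(1, -1)

print(current_feature.shape)  # (1, 42)
```

Note that np.delete with axis=0 treats excluded_channels as zero-based row indices, so "channel 1" in the prose corresponds to index 0 here.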

The problem with this method is that it slows my code down tremendously, because the process is repeated thousands of times. Are there any suggestions for speeding it up, or any alternative approaches?

As an example, given the following array of features:

[[0,1,2,3,4,5,6,7,8,9,10,11...48,49,50,51,52,53]]

I reshape it to:

[[0,1,2,3,4,5],
 [6,7,8,9,10,11],
 [12,13,14,15,16,17],
 .
 .
 .
 [48,49,50,51,52,53]]

If, for example, I want to remove the first two channels, the result should be:

[[12,13,14,15,16,17],
 .
 .
 .
 [48,49,50,51,52,53]]

and finally:

[[12,13,14,15,16,17...48,49,50,51,52,53]]
  • Are you always excluding the same channels? You could create one numpy array and then write subsets of your arrays (excluding some channels) to that numpy array. Commented Oct 21, 2020 at 22:57
  • It would be helpful to show how the data is going to be used. There might be better ways to solve this problem than deleting data from the input. Commented Oct 22, 2020 at 9:26
  • Meanwhile, how large is the data? Commented Oct 22, 2020 at 9:28
  • The same channels will not always be excluded. The 'excluded channels' list will be specified by the user. The example I showed above is the feature data from one frame of data and there are thousands of frames of data. So each frame is appended to a list and the list is then used as the training data for a classifier model. I figured removing the channels before appending to the list would be best because slicing the data would just get more complicated as more is added. Commented Oct 22, 2020 at 14:27

2 Answers

Building on the answer from msi_gerva, I found a solution that does not use np.delete(), which was the main cause of the slowdown.

First, I found the channels I wanted to keep using a list comprehension:

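import numpy as np

all_chans = [1,2,3,4,5,6,7,8,9,10]

features_per_channel = 5
my_data = np.arange(len(all_chans)*features_per_channel)

# Zero-based positions in all_chans (not channel labels) to drop
chan_to_exclude = [1,3,5]

# Positions of the channels that remain
channels_to_keep = [i for i in range(len(all_chans)) if i not in chan_to_exclude]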

Then I reshaped the array so that each row holds one channel:

reshaped = my_data.reshape((-1,features_per_channel))

Then I selected the channels I wanted to keep:

desired_data = reshaped[channels_to_keep]

And finally I reshaped the result to the desired shape:

final_data = desired_data.reshape((1,-1))

These changes made the code ~2x faster than the original method.
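Since the same exclusion list applies to every frame, one possible further step is to compute the flat column indices once and reuse them. The sketch below is an untimed variation, not something measured in this answer. It also assumes the frames can be stacked into a 2-D array, and the names keep_mask, keep_cols and frames are hypothetical.

```python
import numpy as np

features_per_channel = 5
num_channels = 10
chan_to_exclude = [1, 3, 5]   # zero-based channel positions to drop

# Per-channel boolean mask, built once
keep_mask = np.ones(num_channels, dtype=bool)
keep_mask[chan_to_exclude] = False

# Expand to a per-feature mask and convert to flat column indices
keep_cols = np.flatnonzero(np.repeat(keep_mask, features_per_channel))

# Hypothetical stack of 3 frames, one frame per row (assumption)
frames = np.arange(3 * num_channels * features_per_channel).reshape(3, -1)

# A single indexing operation filters every frame; no reshape needed
filtered = frames[:, keep_cols]
print(filtered.shape)  # (3, 35)
```

For a single 1 x total_features frame, current_feature[:, keep_cols] should give the same result as the reshape, select, reshape sequence above.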



With the numerical examples you provided, I would go with:

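import numpy as np

# Build the 9 x 6 example: one row per channel, six features per row
arrays = np.arange(54).reshape(54 // 6, 6)
newarrays = arrays.copy()

remove = [1, 3, 5]          # rows (channels) to drop
take = [0, 2, 4, 6, 7, 8]   # rows (channels) to keep

# Option 1: delete the unwanted rows
arrays = np.delete(arrays, remove, axis=0)
# Option 2: index the rows to keep
newarrays = newarrays[take]

# Flatten both results back to plain lists
arrays = list(arrays.flatten())
newarrays = list(newarrays.flatten())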
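The hard-coded take list above could also be derived from remove, so that only the channels to drop need to be specified. A minimal sketch, assuming the removal list holds zero-based row indices and using np.setdiff1d (num_channels is an illustrative name, not from the code above):

```python
import numpy as np

num_channels = 9
remove = [1, 3, 5]  # zero-based rows to drop, e.g. supplied by the user

# Sorted indices present in 0..num_channels-1 but absent from `remove`
take = np.setdiff1d(np.arange(num_channels), remove)

arrays = np.arange(54).reshape(num_channels, 6)
newarrays = arrays[take].flatten()

print(take)  # [0 2 4 6 7 8]
```

The same pattern works for any other removal list; only remove changes.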

4 Comments

This would work for the examples I provided, but what if I wanted to remove the 1st, 4th, 5th, and 7th channels? How could I generally write this so the channels could be specified?
I will edit the example based on your wish! I added an additional list with the rows to be removed, and with np.delete we remove rows 1, 3 and 5.
I think your solution is very close to the solution I outlined in my question. The only difference is using flatten at the end rather than reshape, but np.delete is the part that takes up the most time. np.delete takes ~100x as long as the reshape/flatten.
If delete takes too much time, you could, instead of removing elements, just take elements from the original data using a list of arrays/channels to keep... It's up to you. I can imagine that working with real data takes much more time and effort than these simple tests.
