
I'm using np.concatenate to join a non-sequential column with some sequential columns in a large dataset, and I realized my method would look rather cumbersome if I wanted to do this with multiple non-sequential columns. Would I just chain-concatenate all of the individual columns? I'm looking for a broad answer, not a solution for, say, columns 2, 5, and 7.

import numpy as np
rand_data = np.random.rand(156,26)
new_array = np.concatenate((rand_data[:,22].reshape(-1,1),rand_data[:, 24:27]), axis = 1)
  • Not much data to work with, but generally in this type of situation people write simple wrapper functions which take parameters (either individual column numbers or a list/array of numbers), concatenate the columns, and return the new array. I am assuming you just want to save some manual typing here? Commented Feb 28, 2018 at 19:14
  • Yes, that's right. That's a good way to do it, thanks. I wasn't sure if there was a faster way, like new_array = rand_data[:, (1, 4, 6)]. Commented Feb 28, 2018 at 19:30
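As the comment above suggests, a small wrapper function can hide the bookkeeping. A minimal sketch (the name select_columns and its signature are illustrative, not from the original post):

```python
import numpy as np

def select_columns(arr, cols):
    """Return the given columns of a 2-D array, in order.

    `cols` may mix individual indices and range objects,
    e.g. select_columns(a, [2, range(5, 8)]).
    """
    idx = []
    for c in cols:
        if isinstance(c, range):
            idx.extend(c)   # expand a range into its indices
        else:
            idx.append(c)   # a single column index
    return arr[:, idx]      # one fancy-indexing call, no concatenate

rand_data = np.random.rand(156, 26)
new_array = select_columns(rand_data, [22, range(24, 26)])
print(new_array.shape)  # (156, 3)
```

Because this builds a single index list and fancy-indexes once, it avoids chaining concatenate calls entirely.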

1 Answer


An alternative to indexing and then concatenating is to concatenate the indices first.

np.r_ is a handy way of doing this (though not the fastest):

In [40]: np.r_[22,24:27]
Out[40]: array([22, 24, 25, 26])

Testing with your array:

In [29]: rand_data = np.random.rand(156,26)

In [31]: new_array = np.concatenate((rand_data[:,[22]],rand_data[:, 24:27]), axis = 1)
In [32]: new_array.shape
Out[32]: (156, 3)

With r_:

In [41]: arr = rand_data[:,np.r_[22,24:27]]
....
IndexError: index 26 is out of bounds for axis 1 with size 26

Oops; with advanced indexing, out-of-bounds values are not allowed (in contrast to slice indexing, which silently clips to the valid range).
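The contrast is easy to see on a toy array (a minimal illustration, not part of the timings below):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # 3 rows, columns 0..3

# A slice running past the last column is silently clipped.
print(a[:, 2:10].shape)  # (3, 2)

# The same indices as an explicit integer array raise an error.
try:
    a[:, np.r_[2:10]]
except IndexError as err:
    print(err)
```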

In [42]: arr = rand_data[:,np.r_[22,24:26]]
In [43]: arr.shape
Out[43]: (156, 3)

Compare the times:

In [44]: timeit new_array = np.concatenate((rand_data[:,[22]],rand_data[:, 24:27]), axis = 1)
15 µs ± 20.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [45]: timeit arr = rand_data[:,np.r_[22,24:26]]
29.7 µs ± 111 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The r_ approach is more compact, but actually a bit slower.
