
I'm using np.concatenate to join a non-sequential column with some sequential columns in a large dataset, and I realized my method would look rather cumbersome if I wanted to do this with multiple non-sequential columns. Would I just chain-concatenate all of the individual columns? I'm looking for a broad answer, not a solution for, say, columns 2, 5, and 7.

import numpy as np
rand_data = np.random.rand(156,26)
new_array = np.concatenate((rand_data[:,22].reshape(-1,1),rand_data[:, 24:27]), axis = 1)
  • Not much data to work with, but generally in this type of situation people write simple wrapper functions which take parameters (either individual column numbers or a list/array of numbers), concatenate the columns, and return the new array. I am assuming you just want to save some manual typing here? Commented Feb 28, 2018 at 19:14
  • Yes, that's right. That's a good way to do it, thanks. I wasn't sure if there was a faster way, like new_array = rand_data[:, (1, 4, 6)]. Commented Feb 28, 2018 at 19:30
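As the comment above suggests, a small wrapper function can hide the bookkeeping. A minimal sketch (the name select_columns and its signature are illustrative, not from the original post):

```python
import numpy as np

def select_columns(arr, cols):
    """Return the given columns of a 2-D array, in order.

    `cols` may mix individual indices and range objects,
    e.g. select_columns(a, [2, range(5, 8)]).
    """
    idx = []
    for c in cols:
        if isinstance(c, range):
            idx.extend(c)   # expand a range into its indices
        else:
            idx.append(c)   # a single column index
    return arr[:, idx]      # one fancy-indexing call, no concatenate

rand_data = np.random.rand(156, 26)
new_array = select_columns(rand_data, [22, range(24, 26)])
print(new_array.shape)  # (156, 3)
```

Because this builds a single index list and fancy-indexes once, it avoids chaining concatenate calls entirely.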

1 Answer


An alternative to indexing and then concatenating is to concatenate the indices first.

np.r_ is a handy way of doing this (though not the fastest):

In [40]: np.r_[22,24:27]
Out[40]: array([22, 24, 25, 26])

Testing with your array:

In [29]: rand_data = np.random.rand(156,26)

In [31]: new_array = np.concatenate((rand_data[:,[22]],rand_data[:, 24:27]), axis = 1)
In [32]: new_array.shape
Out[32]: (156, 3)

With r_:

In [41]: arr = rand_data[:,np.r_[22,24:27]]
....
IndexError: index 26 is out of bounds for axis 1 with size 26

Oops; with advanced indexing, out-of-bounds values are not allowed (in contrast to slice indexing, which silently clips to the valid range).
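The contrast is easy to see on a toy array (a minimal illustration, not part of the timings below):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # 3 rows, columns 0..3

# A slice running past the last column is silently clipped.
print(a[:, 2:10].shape)  # (3, 2)

# The same indices as an explicit integer array raise an error.
try:
    a[:, np.r_[2:10]]
except IndexError as err:
    print(err)
```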

In [42]: arr = rand_data[:,np.r_[22,24:26]]
In [43]: arr.shape
Out[43]: (156, 3)

Compare the times:

In [44]: timeit new_array = np.concatenate((rand_data[:,[22]],rand_data[:, 24:27]), axis = 1)
15 µs ± 20.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [45]: timeit arr = rand_data[:,np.r_[22,24:26]]
29.7 µs ± 111 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The r_ approach is more compact, but actually a bit slower.
