numpy returns 1d array and 2d array for same code

Question

I am not really aware of what rules numpy follows, when performing some 2d array operations, to decide whether the result is returned as a 1d or a 2d array. Let us consider the following piece of code:

idx_cls_samples = sample_data[:, -1] == c
v_feature = sample_data[idx_cls_samples, f]

f_values = sample_data[[sample_data[:, -1] == c], f]

Note that the last line is simply the first two lines combined into one.

The result of the first two lines is a numpy vector of the form array([1, 2, 3, ...]), and the result of the last line is array([[1, 2, 3, ...]]). I believe the result should have been array([[1], [2], [3], ...]) in both cases. How can I figure out beforehand what format numpy will choose for the result?

The last line is not quite the same: sample_data[sample_data[:, -1] == c, f] would be the same (I dropped an extra set of brackets). (Comment by Tadhg McDonald-Jensen, Mar 31, 2016 at 4:38)

user2357112 · Accepted Answer · 2016-03-31 04:38:51Z


Note that the last line is simply the first two lines combined into one.

No it's not. You stuck an extra pair of brackets in there:

f_values = sample_data[[sample_data[:, -1] == c], f]
#                      ^                       ^

Take them out.
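A minimal check of that fix, on hypothetical toy data (the contents of sample_data and the values of c and f are made up for illustration): with the extra brackets removed, the one-liner returns exactly what the two-line version returns.

```python
import numpy as np

# Hypothetical toy data: two feature columns plus a class label in the last column
sample_data = np.array([
    [1, 10, 0],
    [2, 20, 1],
    [3, 30, 0],
    [4, 40, 0],
])
c = 0  # hypothetical class label
f = 0  # hypothetical feature column index

# Two-line version from the question
idx_cls_samples = sample_data[:, -1] == c
v_feature = sample_data[idx_cls_samples, f]

# One-liner with the extra pair of brackets removed
f_values = sample_data[sample_data[:, -1] == c, f]

print(v_feature)                            # [1 3 4]
print(f_values)                             # [1 3 4]
print(np.array_equal(v_feature, f_values))  # True
```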

As for the indexing rules, those are in the NumPy indexing documentation. They're pretty long.

answered Mar 31, 2016 at 4:38


hpaulj · Accepted Answer · 2016-03-31 06:54:45Z

sample_data is 2d. sample_data[:,-1] is 1d, the last column. Indexing with a scalar removes a dimension.
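To see the "scalar removes a dimension" rule in isolation, here is a small sketch on a made-up 4x3 array: a scalar index on axis 1 drops that axis, while a slice covering the same single column keeps it with length 1.

```python
import numpy as np

sample_data = np.arange(12).reshape(4, 3)  # hypothetical toy 4x3 array

last_col = sample_data[:, -1]      # scalar index on axis 1: that axis is dropped
last_col_2d = sample_data[:, -1:]  # slice on axis 1: the axis is kept, length 1

print(last_col.shape)     # (4,)
print(last_col_2d.shape)  # (4, 1)
```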

The == c comparison produces a boolean array of the same dimension (1d).

sample_data[:, f] is also 1d: the f-th column.

Indexing that with a boolean array returns a result with the same number of dimensions as the boolean array, containing only the selected subset of the values.

sample_data[idx, f] is 1d, sample_data[[idx], f] is 2d (due to the added []).
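A sketch of the two points above on hypothetical toy data. The second half switches from the boolean mask to the equivalent integer row indices from np.flatnonzero (an illustrative substitution, not part of the original code) so that the example does not depend on how a particular NumPy version treats a list wrapping a boolean array; with integer indices, the extra [] clearly gives the index, and therefore the result, a leading axis of length 1.

```python
import numpy as np

# Hypothetical toy data: class label in the last column
sample_data = np.array([
    [1, 10, 0],
    [2, 20, 1],
    [3, 30, 0],
    [4, 40, 0],
])
c, f = 0, 0  # hypothetical class label and feature column

idx = sample_data[:, -1] == c  # 1d boolean mask, shape (4,)
column = sample_data[:, f]     # 1d, the f-th column, shape (4,)
subset = column[idx]           # boolean indexing keeps the mask's 1d shape
print(subset, subset.shape)    # [1 3 4] (3,)

# Same selection with integer row indices
rows = np.flatnonzero(idx)        # positions of the True entries: [0 2 3]
flat = sample_data[rows, f]       # index shape (3,)   -> result shape (3,)
wrapped = sample_data[[rows], f]  # index shape (1, 3) -> result shape (1, 3)
print(flat.shape)     # (3,)
print(wrapped.shape)  # (1, 3)
```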

You probably wanted sample_data[(sample_data[:, -1] == c), f], where the () just groups the expression, sometimes for operator precedence, sometimes just to make it more readable. (But beware of (...,), which makes a tuple.)
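A quick check of the grouping remark, again on made-up data: wrapping the mask in parentheses changes nothing, whereas adding a trailing comma turns it into a one-element tuple.

```python
import numpy as np

sample_data = np.array([[1, 10, 0], [2, 20, 1], [3, 30, 0], [4, 40, 0]])  # hypothetical data
c, f = 0, 0  # hypothetical class label and feature column

plain = sample_data[sample_data[:, -1] == c, f]
grouped = sample_data[(sample_data[:, -1] == c), f]  # parentheses only group
print(np.array_equal(plain, grouped))  # True

mask = sample_data[:, -1] == c
print(type((mask)).__name__)   # ndarray: parentheses alone don't create a tuple
print(type((mask,)).__name__)  # tuple: the trailing comma does
```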

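sample_data[idx][:, [f]] (or sample_data[idx, f:f+1]) would have given you the column 'vector', 2d with 1 column. Note that sample_data[idx, [f]] is not enough: idx and [f] are both array indices, so they are broadcast together and the result is still 1d.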
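A shape check on hypothetical toy data for getting the (k, 1) column the question expected. Selecting the rows first and then indexing axis 1 with [f], or keeping axis 1 as a length-1 slice, both give a 2d column. Putting idx and [f] into the same index does not: the two advanced indices are broadcast together, so that form comes out 1d.

```python
import numpy as np

sample_data = np.array([[1, 10, 0], [2, 20, 1], [3, 30, 0], [4, 40, 0]])  # hypothetical data
c, f = 0, 0  # hypothetical class label and feature column
idx = sample_data[:, -1] == c

col_a = sample_data[idx][:, [f]]  # rows first, then a list index on axis 1
col_b = sample_data[idx, f:f+1]   # boolean rows plus a slice that keeps axis 1
paired = sample_data[idx, [f]]    # idx and [f] broadcast together: stays 1d

print(col_a.shape, col_b.shape)  # (3, 1) (3, 1)
print(col_a.tolist())            # [[1], [3], [4]]
print(paired.shape)              # (3,)
```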

Another way to look at sample_data[idx,f] is: idx selects a subset of rows, f selects a column from that 2d.
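That two-step reading can be checked directly. In this sketch (toy data made up for illustration), selecting the rows first and then taking column f gives the same 1d result as the combined index.

```python
import numpy as np

sample_data = np.array([[1, 10, 0], [2, 20, 1], [3, 30, 0], [4, 40, 0]])  # hypothetical data
c, f = 0, 0  # hypothetical class label and feature column
idx = sample_data[:, -1] == c

rows = sample_data[idx]         # 2d: the selected rows
step_by_step = rows[:, f]       # then column f of that 2d array: 1d
combined = sample_data[idx, f]  # both steps in one index

print(rows.shape)                              # (3, 3)
print(np.array_equal(step_by_step, combined))  # True
```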

Often 2d (or higher nd) indexing can be studied axis by axis; that's especially true when an index is a scalar or a slice. It's more complicated if an index is a list or array, or worse, if two or more indices are.
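A small illustration of why two list indices are the harder case, on a made-up 4x3 array. A slice plus a scalar reads axis by axis, but two lists are paired element-wise rather than selecting a sub-block; np.ix_ (an addition here, not mentioned in the answer) is the usual way to get the sub-block instead.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)  # hypothetical toy 4x3 array

# Slice and scalar: read axis by axis (rows 1 and 2, then column 0)
slice_then_col = a[1:3, 0]
print(slice_then_col.shape)  # (2,)

# Two lists: broadcast together, giving elements (0, 0) and (2, 1)
paired = a[[0, 2], [0, 1]]
print(paired)  # [0 7]

# np.ix_ builds an open mesh, so the same lists select a 2x2 sub-block
block = a[np.ix_([0, 2], [0, 1])]
print(block.tolist())  # [[0, 1], [6, 7]]
```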
