
I am not sure what rules NumPy follows when deciding whether a 2d array operation returns a 1d or a 2d array. Consider the following piece of code:

idx_cls_samples = sample_data[:, -1] == c
v_feature = sample_data[idx_cls_samples, f]

f_values = sample_data[[sample_data[:, -1] == c], f]

Note that the last line is simply the first two lines combined into one.

The result of the first two lines is a NumPy vector of the form array([1, 2, 3, ...]), while the result of the last line is array([[1, 2, 3, ...]]). I believed the result should have been array([[1], [2], [3], ...]) in both cases. How can I figure out beforehand what shape NumPy will return?

  • The last line is not quite the same; sample_data[sample_data[:, -1] == c, f] would be the same (note the dropped extra set of brackets). Commented Mar 31, 2016 at 4:38
  • Thanks for pointing that out. Commented Mar 31, 2016 at 4:48

2 Answers

2

Note that the last line is simply the first two lines combined into one.

No, it's not. You stuck an extra pair of brackets in there:

f_values = sample_data[[sample_data[:, -1] == c], f]
#                      ^                       ^

Take them out.
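As a quick check, here is a minimal sketch with the brackets removed. The contents of sample_data and the values of c and f are made up for illustration; only the indexing expressions come from the question.

```python
import numpy as np

# Hypothetical data: the last column holds the class label.
sample_data = np.array([[1, 10, 0],
                        [2, 20, 1],
                        [3, 30, 0],
                        [4, 40, 1]])
c, f = 0, 0  # illustrative class label and feature column

# Two-step version from the question.
idx_cls_samples = sample_data[:, -1] == c
v_feature = sample_data[idx_cls_samples, f]

# One-line version without the extra pair of brackets.
f_values = sample_data[sample_data[:, -1] == c, f]

print(v_feature.shape, f_values.shape)   # both are 1d
print(np.array_equal(v_feature, f_values))
```

With the brackets gone, both forms select the same rows and return the same 1d array.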

As for the indexing rules, those are in the documentation. They're pretty long.


0

sample_data is 2d. sample_data[:,-1] is 1d, the last column. Indexing with a scalar removes a dimension.

The ... == c comparison produces a boolean array of the same dimension (1d).

sample_data[:, f] is also 1d, the f-th column.

Indexing that with a boolean array returns a result with the same number of dimensions as the boolean array, but containing just a subset of the values.
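The steps so far can be traced by printing the shape at each stage. This is only a sketch: the array contents and the values of c and f are invented for the example.

```python
import numpy as np

# Hypothetical 4x3 array whose last column is a class label.
sample_data = np.arange(12).reshape(4, 3)
sample_data[:, -1] = [0, 1, 0, 1]
c, f = 1, 0  # illustrative class label and feature column

last_col = sample_data[:, -1]  # scalar index removes an axis: shape (4,)
mask = last_col == c           # boolean array, same shape: (4,)
column = sample_data[:, f]     # the f-th column: shape (4,)
subset = column[mask]          # still 1d, only the selected values

for name, arr in [("last_col", last_col), ("mask", mask),
                  ("column", column), ("subset", subset)]:
    print(name, arr.shape)
```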

sample_data[idx, f] is 1d, while sample_data[[idx], f] is 2d (due to the added []).

You probably wanted sample_data[(sample_data[:, -1] == c), f], where the () just groups the expression, sometimes for operator precedence, sometimes just to make it more readable. (But beware of (...,), which makes a tuple.)

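sample_data[idx, [f]] is still 1d, because the boolean mask and the list [f] are both advanced indices and get broadcast together. To get the column 'vector', 2d with 1 column, use a slice for the column, sample_data[idx, f:f+1], or add an axis afterwards, sample_data[idx, f][:, None].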
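A quick way to settle which index form returns a column is to print the shapes. The data below is hypothetical; the index expressions are the ones under discussion, plus two common ways of keeping the column axis.

```python
import numpy as np

# Hypothetical data: the last column holds the class label.
sample_data = np.array([[1, 10, 0],
                        [2, 20, 1],
                        [3, 30, 0]])
idx = sample_data[:, -1] == 0  # boolean row mask
f = 1                          # illustrative feature column

flat = sample_data[idx, f]                   # scalar column index: 1d
listed = sample_data[idx, [f]]               # mask and list broadcast together: 1d
sliced = sample_data[idx, f:f + 1]           # slice keeps the column axis: 2d
added = sample_data[idx, f][:, np.newaxis]   # add the axis back explicitly: 2d

for name, arr in [("flat", flat), ("listed", listed),
                  ("sliced", sliced), ("added", added)]:
    print(name, arr.shape)
```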

Another way to look at sample_data[idx, f] is: idx selects a subset of rows, and f selects a column from that 2d subset.

Often 2d (or higher nd) indexing can be studied axis by axis; that's especially true when an index is a scalar or a slice. It's more complicated if an index is a list or array, or worse, if 2 or more of them are.
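To illustrate the difference, here is a small sketch on a made-up 3x4 array. It compares a slice-plus-scalar index, which can be read one axis at a time, with two index lists, which NumPy pairs up element by element. np.ix_ is shown as one way to get the row-by-column block instead.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # hypothetical 3x4 array

# Slice and scalar: read axis by axis (rows 1..2, then column 0).
by_axis = a[1:3, 0]

# Two lists: paired element-wise, i.e. a[0, 1] and a[2, 3].
paired = a[[0, 2], [1, 3]]

# np.ix_ builds an open mesh, selecting rows 0 and 2 crossed with columns 1 and 3.
block = a[np.ix_([0, 2], [1, 3])]

print(by_axis.shape, paired.shape, block.shape)
```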
