I have two NumPy arrays called data and labels, and I want to randomly reduce their size. To do so, I am doing the following:

np.random.seed(0)
ind = np.random.randint(len(data), size=(50000,))
reduced_data = data[ind, :]
reduced_labels = labels[ind]

I randomly pick 50000 rows from both labels and data. How can I store the rest of the data, i.e. how can I find the indices of the remaining rows in the initial arrays?

1 Answer

If you want to reduce the size "randomly", I would advise against fixing the seed, since that makes the selection deterministic (the same rows on every run).

Apart from that, use boolean masking:

mask = np.ones(len(data), dtype=bool)
mask[ind] = False
reduced_data = data[~mask]  # the picked rows, each once and in original order (data[ind] repeats rows if ind has duplicates)
rest_data = data[mask]
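
As a minimal sketch of the masking idea on toy arrays (the array shapes, the sample size of 4, and the names rest_labels, rest_ind and n_picked are assumptions for illustration, not from the question): the same mask can be applied to labels so both arrays stay aligned, and the leftover indices can be read off the mask.

```python
import numpy as np

# Toy stand-ins for the question's arrays (shapes are assumptions).
data = np.arange(20).reshape(10, 2)
labels = np.arange(10)

# Same sampling call as in the question; randint may return repeated indices.
ind = np.random.randint(len(data), size=(4,))

mask = np.ones(len(data), dtype=bool)
mask[ind] = False                  # False marks rows that were picked

rest_data = data[mask]             # rows that were not picked
rest_labels = labels[mask]         # same mask keeps labels aligned with data
rest_ind = np.flatnonzero(mask)    # indices of the rows that were not picked

# Number of distinct rows picked; can be smaller than 4 because of repeats.
n_picked = len(np.unique(ind))
```

Because of possible repeats, len(rest_data) equals len(data) - n_picked, which is not necessarily len(data) - 4.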

If you want to reduce the data to an exact number of rows, without repetitions, I can think of the following:

ind = np.arange(len(data))
np.random.shuffle(ind)
ind = ind[:50000] #Or whatever the size is of what you want to reduce
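
A small sketch of how the shuffle approach also yields the remaining indices, assuming toy arrays and a hypothetical n_keep in place of 50000: the first part of the shuffled index array is the reduced set and the tail is the rest. Here np.random.permutation is used as a one-call equivalent of arange plus shuffle.

```python
import numpy as np

# Toy stand-ins for the question's arrays (sizes are assumptions).
data = np.random.rand(1000, 3)
labels = np.random.randint(0, 2, size=1000)
n_keep = 200  # hypothetical target size; the question uses 50000

# Shuffled copy of 0..len(data)-1, so every index appears exactly once.
perm = np.random.permutation(len(data))
ind = perm[:n_keep]        # indices of the reduced set (no repetitions)
rest_ind = perm[n_keep:]   # indices of everything else

reduced_data, reduced_labels = data[ind], labels[ind]
rest_data, rest_labels = data[rest_ind], labels[rest_ind]
```

Since ind and rest_ind come from one permutation, they are disjoint and together cover every row exactly once.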
4 Comments

How can I determine the size of the mask?
Given the way you specify ind, the size is already fixed at 50000 in this case, right? Actually, that is not completely true, since ind can contain repetitions... if you want to reduce without repetitions, let me think for a bit.
You are; all I said is that you are making the result deterministic by doing so, and if you want to pseudo-randomly reduce, it is probably not the way to go ;)
As a matter of fact, I have data from two classes, and in the end I want to be sure that the reduced arrays contain approximately the same amount of data from both classes.
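
One possible way to address the class-balance requirement in the last comment, sketched under assumptions (toy arrays, integer class labels, and a hypothetical n_per_class that must not exceed the size of the smallest class): sample the same number of indices from each class without replacement, then use a boolean mask for the rest. This is not taken from the answer above, just one option.

```python
import numpy as np

# Toy stand-ins: 70 rows of class 0 and 30 rows of class 1 (assumption).
labels = np.array([0] * 70 + [1] * 30)
data = np.arange(200).reshape(100, 2)
n_per_class = 10  # hypothetical; must be <= size of the smallest class

picked = []
for cls in np.unique(labels):
    cls_ind = np.flatnonzero(labels == cls)  # indices belonging to this class
    # Draw n_per_class distinct indices from this class.
    picked.append(np.random.choice(cls_ind, size=n_per_class, replace=False))
ind = np.concatenate(picked)

# Everything not picked forms the remainder.
mask = np.ones(len(data), dtype=bool)
mask[ind] = False

reduced_data, reduced_labels = data[ind], labels[ind]
rest_data, rest_labels = data[mask], labels[mask]
```

The reduced arrays come out grouped by class; shuffling ind before indexing would mix them if the order matters.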
