replace blanks in numpy array

Question

The third column in my numpy array is Age. In this column about 75% of the entries are valid and 25% are blank. Column 2 is Gender and using some manipulation I have calculated the average age of the men in my dataset to be 30. The average age of women in my dataset is 28.

I want to replace all blank Age values for men to be 30 and all blank age values for women to be 28.

However I can't seem to do this. Anyone have a suggestion or know what I am doing wrong?

Here is my code:

# my entire data set is stored in a numpy array defined as x

ismale = x[::,1]=='male'
maleAgeBlank = x[ismale][::,2]==''
x[ismale][maleAgeBlank][::,2] = 30

For whatever reason when I'm done with the above code, I type x to display the data set and the blanks still exist even though I set them to 30. Note that I cannot do x[maleAgeBlank] because that list will include some female data points since the female data points are not yet excluded.

Is there any way to get what I want? For some reason, if I do x[ismale][::,1] = 1 (setting the column with 'male' equal to 1), that works, but x[ismale][maleAgeBlank][::,2] = 30 does not work.

sample of array:

#output from typing x
array([['3', '1', '22', ..., '0', '7.25', '2'],
   ['1', '0', '38', ..., '0', '71.2833', '0'],
   ['3', '0', '26', ..., '0', '7.925', '2'],
   ..., 
   ['3', '0', '', ..., '2', '23.45', '2'],
   ['1', '1', '26', ..., '0', '30', '0'],
   ['3', '1', '32', ..., '0', '7.75', '1']], 
  dtype='<U82')

#output from typing x[0]

array(['3', '1', '22', '1', '0', '7.25', '2'], 
  dtype='<U82')

Note that I have changed column 2 to be 0 for female and 1 for male already in the above output

can you post a sample of the array?

– user1301404, Commented Nov 10, 2013 at 0:40

Akavall · Accepted Answer · 2013-11-10 01:04:30Z

3

How about this:

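import numpy as np

my_data = np.array([['3', '1', '22', '0', '7.25', '2'],
                    ['1', '0', '38', '0', '71.2833', '0'],
                    ['3', '0', '26', '0', '7.925', '2'],
                    ['3', '0', '', '2', '23.45', '2'],
                    ['1', '1', '26', '0', '30', '0'],
                    ['3', '1', '32', '0', '7.75', '1']],
                   dtype='<U82')

# Note: the question encodes male as '1'; '0' is used here so that the
# one blank row in this sample is matched. Adjust the literal for real data.
ismale = my_data[:, 1] == '0'
missing_age = my_data[:, 2] == ''
# Combine both boolean masks, then index rows and column in a single step
maleAgeBlank = missing_age & ismale
my_data[maleAgeBlank, 2] = '30'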

Result:

>>> my_data
array([[u'3', u'1', u'22', u'0', u'7.25', u'2'],
       [u'1', u'0', u'38', u'0', u'71.2833', u'0'],
       [u'3', u'0', u'26', u'0', u'7.925', u'2'],
       [u'3', u'0', u'30', u'2', u'23.45', u'2'], 
       [u'1', u'1', u'26', u'0', u'30', u'0'],
       [u'3', u'1', u'32', u'0', u'7.75', u'1']], 
      dtype='<U82')
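
As for why the original attempt had no visible effect: indexing with a boolean mask (`x[ismale]`) returns a copy, not a view, so a chained assignment writes into a temporary array that is then discarded. A minimal sketch of the difference, using a made-up three-column sample (the data and the names `chained` and `direct` are only for illustration):

```python
import numpy as np

# Hypothetical sample: column 1 is gender ('1' = male), column 2 is age
x = np.array([['3', '1', ''],
              ['1', '0', '38'],
              ['3', '1', '']], dtype='<U82')

ismale = x[:, 1] == '1'

# Chained indexing: chained[ismale] builds a new array (a copy), so the
# assignment lands on that temporary copy and `chained` is unchanged.
chained = x.copy()
chained[ismale][:, 2] = '30'

# Single indexing step: the row mask and the column index go into one
# assignment, which writes into the array itself.
direct = x.copy()
direct[ismale & (direct[:, 2] == ''), 2] = '30'

print(chained[:, 2])  # blanks are still there
print(direct[:, 2])   # blanks for male rows are filled
```

This is why combining the masks with `&` and assigning through one index expression works where the chained version does not.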

answered Nov 10, 2013 at 1:04

Akavall


1 Comment

Terence Chow Over a year ago

Perfect! Thank you, very clean and understandable. Didn't even think of the & operation.

user1301404 · Accepted Answer · 2013-11-10 01:09:09Z

2

You can use the where function:

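import numpy as np

arr = np.array([['3', '1', '22', '1', '0', '7.25', '2'],
                ['3', '', '22', '1', '0', '7.25', '2']],
               dtype='<U82')

# np.where returns the (row, column) indices of every blank cell
blank = np.where(arr == '')

# the integer 20 is stored as the string '20' in a string array
arr[blank] = 20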

array([[u'3', u'1', u'22', u'1', u'0', u'7.25', u'2'],
       [u'3', u'20', u'22', u'1', u'0', u'7.25', u'2']], 
      dtype='<U82')

If you want to change a specific column, you can do the following:

male = np.where(arr[:, 1] == '')[0]  # rows where column 1 is blank
arr[male, 1] = 30                    # fill only column 1 of those rows

female = np.where(arr[:, 2] == '')[0]  # rows where column 2 is blank
arr[female, 2] = 28                    # fill only column 2 of those rows
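
For a gender-aware variant of the `where` approach, the three-argument form of `np.where` can pick a fill value per row and apply it only to blank ages. This is a rough sketch, assuming the encoding described in the question (column 1 is gender with '1' for male and '0' for female, column 2 is age); the sample rows and the names `gender`, `age_blank`, and `fill` are made up for illustration:

```python
import numpy as np

# Hypothetical sample rows in the question's layout
arr = np.array([['3', '1', '22', '0', '7.25', '2'],
                ['3', '0', '', '2', '23.45', '2'],
                ['1', '1', '', '0', '30', '0']],
               dtype='<U82')

gender = arr[:, 1]            # '1' = male, '0' = female (per the question)
age_blank = arr[:, 2] == ''   # True where the age is missing

# Per-row fill value chosen by gender: '30' for men, '28' for women
fill = np.where(gender == '1', '30', '28')

# Keep existing ages; substitute the fill value only where age is blank
arr[:, 2] = np.where(age_blank, fill, arr[:, 2])

print(arr[:, 2])
```

Only the age column is touched, and the non-blank ages are left as they were.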

edited Nov 10, 2013 at 1:09

answered Nov 10, 2013 at 0:52

user1301404

2 Comments

ASGM Over a year ago

where is efficient, but the current solution doesn't check the row's gender value and changes all blanks, not just those in the age column.

user1301404 Over a year ago

Doesn't he want to change the blank age values to the average? The age columns are only 1 and 2, for male and female, so he only needs two where calls, one per column.

ASGM · Accepted Answer · 2013-11-10 00:53:55Z

0

You could try iterating through the array in a simpler way. It's not the most efficient solution, but it should get the job done.

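# iterating over a 2-D array yields row views, so assignments modify x in place
for row in x:
    if row[2] == '':          # blank age
        if row[1] == '1':     # '1' = male (values are stored as strings)
            row[2] = '30'
        else:
            row[2] = '28'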

answered Nov 10, 2013 at 0:53

ASGM


3 Comments

user1301404 Over a year ago

Using a for loop with a numpy array is called nonsense. You lose the advantages of numpy by iterating.

ASGM Over a year ago

@void That's fair. I'm not saying there aren't better solutions. But if all the OP cares about is getting this particular task solved quickly, hopefully this will help.

user1301404 Over a year ago

Using where is more efficient. Check my answer.
