4

The third column in my numpy array is Age. In this column about 75% of the entries are valid and 25% are blank. Column 2 is Gender and using some manipulation I have calculated the average age of the men in my dataset to be 30. The average age of women in my dataset is 28.

I want to replace all blank Age values for men with 30 and all blank Age values for women with 28.

However I can't seem to do this. Anyone have a suggestion or know what I am doing wrong?

Here is my code:

# my entire data set is stored in a numpy array defined as x

ismale = x[::,1]=='male'
maleAgeBlank = x[ismale][::,2]==''
x[ismale][maleAgeBlank][::,2] = 30 

For whatever reason when I'm done with the above code, I type x to display the data set and the blanks still exist even though I set them to 30. Note that I cannot do x[maleAgeBlank] because that list will include some female data points since the female data points are not yet excluded.

Is there any way to get what I want? For some reason, if I do x[ismale][::,1] = 1 (setting the column with 'male' equal to 1), that works, but x[ismale][maleAgeBlank][::,2] = 30 does not work.

sample of array:

#output from typing x
array([['3', '1', '22', ..., '0', '7.25', '2'],
   ['1', '0', '38', ..., '0', '71.2833', '0'],
   ['3', '0', '26', ..., '0', '7.925', '2'],
   ..., 
   ['3', '0', '', ..., '2', '23.45', '2'],
   ['1', '1', '26', ..., '0', '30', '0'],
   ['3', '1', '32', ..., '0', '7.75', '1']], 
  dtype='<U82')

#output from typing x[0]

array(['3', '1', '22', '1', '0', '7.25', '2'], 
  dtype='<U82')

Note that I have changed column 2 to be 0 for female and 1 for male already in the above output

1
  • can you post a sample of the array? Commented Nov 10, 2013 at 0:40

3 Answers

3

How about this:

import numpy as np

my_data =  np.array([['3', '1', '22', '0', '7.25', '2'],
                     ['1', '0', '38', '0', '71.2833', '0'],
                     ['3', '0', '26', '0', '7.925', '2'],
                     ['3', '0', '', '2', '23.45', '2'],
                     ['1', '1', '26', '0', '30', '0'],
                     ['3', '1', '32', '0', '7.75', '1']], 
                     dtype='<U82')

# Column 1 is gender ('1' = male, '0' = female); column 2 is age.
ismale = my_data[:, 1] == '1'
missing_age = my_data[:, 2] == ''

# Combine the boolean masks with & and assign in a single indexing step.
maleAgeBlank = missing_age & ismale
femaleAgeBlank = missing_age & ~ismale
my_data[maleAgeBlank, 2] = '30'
my_data[femaleAgeBlank, 2] = '28'

Result (the only blank age in this sample belongs to a woman, so it becomes 28):

>>> my_data
array([[u'3', u'1', u'22', u'0', u'7.25', u'2'],
       [u'1', u'0', u'38', u'0', u'71.2833', u'0'],
       [u'3', u'0', u'26', u'0', u'7.925', u'2'],
       [u'3', u'0', u'28', u'2', u'23.45', u'2'], 
       [u'1', u'1', u'26', u'0', u'30', u'0'],
       [u'3', u'1', u'32', u'0', u'7.75', u'1']], 
      dtype='<U82')
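
As for why the attempt in the question appears to do nothing: indexing with a boolean mask returns a copy, so x[ismale][maleAgeBlank][::,2] = 30 writes into a temporary array that is then discarded. A minimal sketch of that behaviour, using a made-up three-row array (demo and its values are assumptions for illustration, not the asker's data):

```python
import numpy as np

# Hypothetical toy data: columns are class, gender ('1' = male), age.
demo = np.array([['3', '1', ''],
                 ['1', '0', ''],
                 ['3', '1', '22']], dtype='<U82')

ismale = demo[:, 1] == '1'
blank_age = demo[:, 2] == ''

# Chained indexing: demo[ismale] is a new array (a copy), so this
# assignment changes the copy and leaves demo untouched.
demo[ismale][:, 2] = '30'
unchanged = demo[0, 2] == ''

# Single indexing step: the assignment goes straight into demo.
demo[ismale & blank_age, 2] = '30'
changed = demo[0, 2] == '30'

print(unchanged, changed)
```

Keeping the row mask and the column index inside one pair of brackets is what makes the assignment reach the original array.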

1 Comment

Perfect! Thank you, very clean and understandable. Didn't even think of the & operation.
2

You can use the where function:

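import numpy as np

arr = np.array([['3', '1', '22', '1', '0', '7.25', '2'], 
                ['3', '', '22', '1', '0', '7.25', '2']], 
               dtype='<U82')

# Row and column indices of every blank cell in the array
blank = np.where(arr=='')

# Stored as the string '20' because arr has a string dtype
arr[blank] = 20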

array([[u'3', u'1', u'22', u'1', u'0', u'7.25', u'2'],
       [u'3', u'20', u'22', u'1', u'0', u'7.25', u'2']], 
      dtype='<U82')

If you want to change a specific column you can do the following:

male = np.where(arr[:, 1]=='')[0] # rows where column 1 is blank
arr[male, 1] = 30 # fill only column 1, not the whole row

female = np.where(arr[:, 2]=='')[0] # rows where column 2 is blank
arr[female, 2] = 28
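
If the goal is to touch only the Age column and choose the fill value by gender, np.where also has a three-argument form that picks element-wise between two values. A rough sketch, assuming the question's layout (column 1 is gender with '1' for male, column 2 is age); the array people is made up for illustration:

```python
import numpy as np

# Hypothetical toy data: columns are class, gender ('1' = male), age.
people = np.array([['3', '1', ''],
                   ['1', '0', ''],
                   ['3', '0', '26']], dtype='<U82')

is_male = people[:, 1] == '1'
blank_age = people[:, 2] == ''

# Fill value per row: '30' for men, '28' for women.
fill = np.where(is_male, '30', '28')

# Keep existing ages; use the fill value only where the age is blank.
people[:, 2] = np.where(blank_age, fill, people[:, 2])
```

This leaves every other column, and every non-blank age, as it was.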

2 Comments

where is efficient, but the current solution doesn't check the row's gender value and changes all blanks, not just those in the age column.
Doesn't he want to change the blank values of age to the average? The age columns are only 1 and 2, for male and female, so he only needs two where calls, one per column.
0

You could try iterating through the array in a simpler way. It's not the most efficient solution, but it should get the job done.

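# Iterating over a 2-D array yields one row (a view) at a time,
# so assigning to row[2] updates x in place.
for row in x:
    if row[2] == '':          # blank age
        if row[1] == '1':     # gender is stored as a string; '1' = male
            row[2] = '30'
        else:
            row[2] = '28'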

3 Comments

Using a for loop with a numpy array is nonsense. You lose the advantages of numpy by iterating.
@void That's fair. I'm not saying there aren't better solutions. But if all the OP cares about is getting this particular task solved quickly, hopefully this will help.
Using where is more efficient. Check my answer.
