I have a loop in pandas that is really slow (more than ten minutes). I am trying to replace it with a vectorized operation, but I can't think of what to use. Multiple records have different household numbers but the same relationship group number. If a record's household number equals its relationship group number, I want to use that record's officer number and name for all records with that relationship group number (including those whose household number differs). See the code below:

        import numpy as np  # pd.np was removed from pandas; use numpy directly

        rg['RG Officer Number'] = np.nan
        rg['RG Officer Name'] = np.nan
        for index, row in rg.iterrows():
            # A row "heads" its group when its household number equals the group number
            if row['Relationship Group'] == row['Household Number']:
                mask = rg['Relationship Group'] == row['Relationship Group']
                rg.loc[mask, 'RG Officer Number'] = row['Household Primary Officer Number']
                rg.loc[mask, 'RG Officer Name'] = row['Household Primary Officer Name']

I tried the code below, but I got an error (cannot use a single bool to index into setitem). I think I am completely off track. Maybe this cannot be vectorized, but it seems like it should be possible.

        mask = row['Relationship Group'] == row['Household Number']
        rg.loc[mask, 'RG Officer Number'] = rg.loc['Household Primary Officer Number']

Any help you can offer would be appreciated.

  • Could you provide us with a sample of data to work with? A few rows of your DataFrame should suffice. Commented Oct 9, 2020 at 21:51

1 Answer

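A filter followed by a merge would work: select the rows where the household number equals the relationship number, then left-merge them back onto the full frame on the relationship number.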

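import numpy as np
import pandas as pd

# Sample data: random officer numbers and names, so the values differ on each run
df = pd.DataFrame({'Household Number':[str(i) for i in range(10)],
                   'Relationship Number':[str(i) for i in range(5)]*2,
                   'RG Officer Number':np.random.randint(1,100,10),
                   'RG Officer Name':['name'+str(i) for i in np.random.randint(1,100,10)]})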

df
#  Household Number Relationship Number  RG Officer Number RG Officer Name
#0                0                   0                 28          name87
#1                1                   1                 18          name71
#2                2                   2                 69           name8
#3                3                   3                 83          name64
#4                4                   4                 88          name36
#5                5                   0                 25          name89
#6                6                   1                 51          name76
#7                7                   2                 29          name80
#8                8                   3                 61          name27
#9                9                   4                  2          name95


df_filtered = df.loc[df['Household Number'] == df['Relationship Number']]
df_filtered
#  Household Number Relationship Number  RG Officer Number RG Officer Name
#0                0                   0                 28          name87
#1                1                   1                 18          name71
#2                2                   2                 69           name8
#3                3                   3                 83          name64
#4                4                   4                 88          name36

df_merged = pd.merge(left=df,right=df_filtered[['Relationship Number','RG Officer Number','RG Officer Name']],
                     how='left',
                     on='Relationship Number',suffixes=('_old','_new'))

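The merged data is in df_merged. The columns ending in _new hold the officer number and name taken from the row whose household number equals the relationship number; the columns ending in _old hold each row's original values.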
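For reference, the same filter-and-merge idea can be sketched with the column names from the question. The sample values here are made up for illustration, and `heads` is a hypothetical helper name. The `drop_duplicates(..., keep='last')` step is an assumption: it guards against a group having more than one matching row (which would otherwise duplicate rows in the merge) and mirrors the loop's behaviour, where the last matching row wins.

```python
import pandas as pd

# Hypothetical sample data using the question's column names.
rg = pd.DataFrame({
    'Household Number': ['100', '101', '102', '200', '201', '300'],
    'Relationship Group': ['100', '100', '100', '200', '200', '999'],
    'Household Primary Officer Number': [11, 12, 13, 21, 22, 31],
    'Household Primary Officer Name': ['Ann', 'Bob', 'Cy', 'Di', 'Ed', 'Flo'],
})

# Filter: keep only rows whose household number equals the relationship group.
heads = rg.loc[
    rg['Household Number'] == rg['Relationship Group'],
    ['Relationship Group',
     'Household Primary Officer Number',
     'Household Primary Officer Name'],
]

# Assumption: if a group has several matching rows, keep the last one,
# as the original loop would.
heads = heads.drop_duplicates('Relationship Group', keep='last')

# Rename so the merged columns get the names the question wants.
heads = heads.rename(columns={
    'Household Primary Officer Number': 'RG Officer Number',
    'Household Primary Officer Name': 'RG Officer Name',
})

# Left merge: every record keeps its row; groups with no matching
# household get NaN in the two new columns.
result = rg.merge(heads, how='left', on='Relationship Group')
print(result)
```

In this sample, group 999 has no household whose number matches it, so its RG Officer columns stay empty, just as they would in the loop.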

1 Comment

Thanks, this does the trick and takes only a second to run. This is being scheduled to run daily along with some other scripts, so speed is very important.
