Input df (example)

Country     SubregionA      SubregionB
BRA         State of Acre   Brasiléia
BRA         State of Acre   Cruzeiro do Sul
USA         AL              Bibb County
USA         AL              Blount County
USA         AL              Bullock County

Output df

Country     SubregionA      SubregionB
BRA         State of Acre   State of Acre - Brasiléia
BRA         State of Acre   State of Acre - Cruzeiro do Sul
USA         AL              AL Bibb County
USA         AL              AL Blount County
USA         AL              AL Bullock County

The code snippet below is fairly self-explanatory, but when executed it seems to run forever. What could be going wrong? (The dataframe 'data' is also quite large, around 250K+ rows.)

for row in data.itertuples():
     region = data['Country']

     if region == 'ARG' :
          data['SubregionB'] = data[['SubregionA', 'SubregionB']].apply(lambda row: '-'.join(row.values.astype(str)), axis=1)
     elif region == 'BRA' :
          data['SubregionB'] = data[['SubregionA', 'SubregionB']].apply(lambda row: '-'.join(row.values.astype(str)), axis=1)
     elif region == 'USA':
          data['SubregionB'] = data[['SubregionA', 'SubregionB']].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
     else:
          pass

Explanation: I am trying to join the columns SubregionA and SubregionB based on the values in the column 'Country'. The separators differ by country, so I have written multiple if-else statements. This takes too long to execute; how can I make it faster?

  • How many different separators? Commented Oct 30, 2020 at 10:01
  • Can you add a data sample as a minimal, complete, and verifiable example? Commented Oct 30, 2020 at 10:01
  • @jezrael Actually, there are just two separators for now: '-' (hyphen) and ' ' (whitespace). Commented Oct 30, 2020 at 10:09
  • OK, is it possible to specify which regions use the separator '-' and which use ' '? Are some regions not processed? Commented Oct 30, 2020 at 10:10
  • @jezrael Added an example, let me know if you need more info. Separators only need to be specified for some of the regions, no transformations should be done on the rest of the regions. Transformation needed only for 'ARG', 'BRA', 'USA' Commented Oct 30, 2020 at 10:17

1 Answer

You can use numpy.select with Series.isin and join columns with +:

print (df)
  Country     SubregionA       SubregionB
0     BRA  State of Acre         Brasilia
1     BRA  State of Acre  Cruzeiro do Sul
2     USA             AL      Bibb County
3     USA             AL    Blount County
4     USA             AL   Bullock County
5     JAP            AAA             BBBB

import numpy as np

reg1 = ['ARG','BRA']
reg2 = ['USA']

a = np.select([df['Country'].isin(reg1), df['Country'].isin(reg2)], 
              [df['SubregionA'] + ' - ' + df['SubregionB'],
               df['SubregionA'] + ' ' + df['SubregionB']],
              default=df['SubregionB'])

df['SubregionB'] = a
print (df)
  Country     SubregionA                       SubregionB
0     BRA  State of Acre         State of Acre - Brasilia
1     BRA  State of Acre  State of Acre - Cruzeiro do Sul
2     USA             AL                   AL Bibb County
3     USA             AL                 AL Blount County
4     USA             AL                AL Bullock County
5     JAP            AAA                             BBBB
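
If the list of countries grows, the same idea can be driven by a lookup table instead of one condition per separator. The following is only a sketch under a few assumptions: the dictionary name `separators` is hypothetical, the sample frame mirrors the one printed above, and ' - ' is used for ARG and BRA to match the expected output in the question. Countries missing from the table map to NaN and are left untouched.

```python
import pandas as pd

# Sample frame mirroring the one printed above.
df = pd.DataFrame({
    "Country": ["BRA", "BRA", "USA", "USA", "USA", "JAP"],
    "SubregionA": ["State of Acre", "State of Acre", "AL", "AL", "AL", "AAA"],
    "SubregionB": ["Brasilia", "Cruzeiro do Sul", "Bibb County",
                   "Blount County", "Bullock County", "BBBB"],
})

# Hypothetical lookup table: country code -> separator (an assumption,
# chosen to match the expected output in the question).
separators = {"ARG": " - ", "BRA": " - ", "USA": " "}

# Per-row separator; NaN for countries that are not in the table.
sep = df["Country"].map(separators)
mask = sep.notna()

# Vectorized string concatenation, applied only to the rows that have a separator.
df.loc[mask, "SubregionB"] = (
    df.loc[mask, "SubregionA"] + sep[mask] + df.loc[mask, "SubregionB"]
)

print(df)
```

Like the numpy.select version, this works on whole columns at once, so it avoids re-running a row-wise apply over the entire frame inside a loop. Adding a new country should only require a new entry in the dictionary.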