Input df(example)
Country SubregionA SubregionB
BRA State of Acre Brasiléia
BRA State of Acre Cruzeiro do Sul
USA AL Bibb County
USA AL Blount County
USA AL Bullock County
Output df
Country SubregionA SubregionB
BRA State of Acre State of Acre - Brasiléia
BRA State of Acre State of Acre - Cruzeiro do Sul
USA AL AL Bibb County
USA AL AL Blount County
USA AL AL Bullock County
The code snippet is quite self explanatory, but when executed seems to run forever. What could be going wrong(Also the dataframe 'data' is quite large around 250K+ rows)
for row in data.itertuples():
region = data['Country']
if region == 'ARG' :
data['SubregionB'] = data[['SubregionA' 'SubregionB']].apply(lambda row: '-'.join(row.values.astype(str)), axis=1)
elif region == 'BRA' :
data['SubregionB'] = data[['SubregionA', 'SubregionB']].apply(lambda row: '-'.join(row.values.astype(str)), axis=1)
elif region == 'USA':
data['SubregionB'] = data[['SubregionA', 'SubregionB']].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
else:
pass
Explanation : Trying to join columns SubregionA and SubregionB based on values in the column name 'Country'. The separators are different and thus have written multiple if-else statements. Takes too long to execute, how can I make this faster?
-, which' '? Some regions are not processing?