Pandas: modify a column without using a loop

Question

I have a df:

DF
name1   name2    finalName
AB123   BB123    0
BB113   AB113    0
AB343   AB343    0
CC263   BB263    0
ED633   DD633    0

I need to modify finalName as follows: if name1 starts with AB and name2 starts with BB, finalName should be BB + number, so in the first row: BB123.

If name1 starts with BB and name2 starts with AB, finalName should be AB + number, so in the second row: AB113.

In all other rows finalName should stay 0.

I wrote this code:

for row in range(len(DF)):
    if(DF.name1.loc[row][0:2] == 'AB' and DF.name2.loc[row][0:2] == 'BB'):
         DF.finalName[row] = DF.name1[row].replace('AB','BB',1)
    if(DF.name1.loc[row][0:2] == 'BB' and DF.name2.loc[row][0:2] == 'AB'):
         DF.finalName[row] = DF.name1[row].replace('BB','AB',1)

I got a KeyError because my index had gaps (...69, 70, 72...). I found that I needed to reindex my df; after doing that, the loop works. But I also read that I shouldn't loop over a DataFrame. So my question is:

How can I do this the pandas way, i.e. without a loop?

PS. The final df should look like this:

DF
name1   name2    finalName
AB123   BB123    BB123   
BB113   AB113    AB113
AB343   AB343    0
CC263   BB263    0
ED633   DD633    0
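
The KeyError described in the question can be reproduced with a small frame whose index has a gap. This is only a sketch: the three-row frame and its index are made up for illustration, and reset_index(drop=True) is assumed to be the kind of "reindexing" that was applied.

```python
import pandas as pd

# Hypothetical frame with a gap in the index (label 1 is missing),
# mimicking the "...69, 70, 72..." situation from the question.
df = pd.DataFrame(
    {"name1": ["AB123", "BB113", "AB343"],
     "name2": ["BB123", "AB113", "AB343"]},
    index=[0, 2, 3],
)

# range(len(df)) yields positions 0, 1, 2, but .loc looks up labels.
missing = [pos for pos in range(len(df)) if pos not in df.index]
print(missing)  # [1]

try:
    df.name1.loc[1]
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# reset_index(drop=True) renumbers the rows 0..n-1,
# so labels and positions match again.
fixed = df.reset_index(drop=True)
first_two = [fixed.name1.loc[pos][:2] for pos in range(len(fixed))]
print(first_two)  # ['AB', 'BB', 'AB']
```

The vectorized approaches in the answers do not depend on the index labels at all, so they avoid this problem without resetting the index.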

Should the value come from the other column? That is, not DF.finalName[row] = DF.name1[row].replace('AB','BB',1) but DF.finalName[row] = DF.name2[row]?
– jezrael, Commented Sep 6, 2019 at 10:51
If yes, please edit the question, because anky's and Jaroslav's answers take the value from name2, while my answer uses ['BB' + c, 'AB' + c], where c is the value of name1 without its first 2 letters. Or is something missing?
– jezrael, Commented Sep 6, 2019 at 10:56
@jezrael Often it is as you said, but sometimes finalName has a different number. So anky's answer is OK for me in that case.
– martin, Commented Sep 6, 2019 at 12:25
So does that mean name1 and name2 are always the same within a row, differing only in the first 2 letters?
– jezrael, Commented Sep 6, 2019 at 12:28
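
The distinction raised in these comments only matters when the numeric parts of name1 and name2 differ. A minimal sketch, using a made-up row (AB123 / BB999) that does not appear in the question:

```python
import pandas as pd

# Hypothetical row where the numeric parts differ between the two columns.
df = pd.DataFrame({"name1": ["AB123"], "name2": ["BB999"]})

# Option 1: take the value from name2 unchanged.
from_name2 = df["name2"].iloc[0]

# Option 2: keep the digits of name1 and swap only its two-letter prefix.
from_name1 = "BB" + df["name1"].str[2:].iloc[0]

print(from_name2)  # BB999
print(from_name1)  # BB123
```

On the sample data in the question both options give the same result, because the digits are identical in each row.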

anky · Accepted Answer · 2019-09-06 10:25:53Z

1

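Here's one way using Series.str.startswith():

import numpy as np

# c1/c2 are boolean masks for the two prefix combinations
c1 = df.name1.str.startswith('AB') & df.name2.str.startswith('BB')
c2 = df.name1.str.startswith('BB') & df.name2.str.startswith('AB')

# take name2 where either mask is True, otherwise keep finalName
df['finalName'] = np.where(c1 | c2, df.name2, df.finalName)
print(df)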

   name1  name2 finalName
0  AB123  BB123     BB123
1  BB113  AB113     AB113
2  AB343  AB343         0
3  CC263  BB263         0
4  ED633  DD633         0
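
As a possible alternative to np.where, the same masks can be used for a boolean-mask assignment with .loc. This is a sketch rather than part of the answer: the frame is rebuilt from the question's sample data, and the cast of finalName to object dtype is an assumption made so that strings can be written into a column that starts out as integers.

```python
import pandas as pd

df = pd.DataFrame({
    "name1": ["AB123", "BB113", "AB343", "CC263", "ED633"],
    "name2": ["BB123", "AB113", "AB343", "BB263", "DD633"],
    "finalName": [0, 0, 0, 0, 0],
})

# True for rows where the prefixes are AB/BB or BB/AB.
mask = (
    (df["name1"].str.startswith("AB") & df["name2"].str.startswith("BB"))
    | (df["name1"].str.startswith("BB") & df["name2"].str.startswith("AB"))
)

# Assumption: finalName holds integer 0s, so cast to object before mixing in strings.
df["finalName"] = df["finalName"].astype(object)

# Only the masked rows are overwritten; all other rows keep their 0.
df.loc[mask, "finalName"] = df.loc[mask, "name2"]

print(df["finalName"].tolist())  # ['BB123', 'AB113', 0, 0, 0]
```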

1 Comment

anky Over a year ago

@jezrael Ah, okay, I see what you mean. I assumed the numbers are the same in both columns. Let's wait for the OP's comment then :)

Jaroslav Bezděk · Answer · 2019-09-06 10:28:58Z

1

You can use the .apply() method like this:

def make_finalName(row):
    if row['name1'].startswith('AB') and row['name2'].startswith('BB'):
        return row['name2']
    if row['name1'].startswith('BB') and row['name2'].startswith('AB'):
        return row['name2']
    return row['finalName']

# axis=1 passes each row to make_finalName; no lambda wrapper is needed
df['finalName'] = df.apply(make_finalName, axis=1)

The output would be the following:

>>> print(df)
   name1  name2 finalName
0  AB123  BB123     BB123
1  BB113  AB113     AB113
2  AB343  AB343         0
3  CC263  BB263         0
4  ED633  DD633         0

jezrael · Answer · 2019-09-06 11:09:06Z

Instead of replace, you can prepend BB or AB to the values of a Series c (name1 without its first 2 letters) using numpy.select:

import numpy as np

a = DF.name1.str[:2]   # prefix of name1
b = DF.name2.str[:2]   # prefix of name2
c = DF.name1.str[2:]   # name1 without its first 2 letters
m1 = (a == 'AB') & (b == 'BB')
m2 = (a == 'BB') & (b == 'AB')

Or:

c = DF.name1.str[2:]
m1 = DF.name1.str.startswith('AB') & DF.name2.str.startswith('BB')
m2 = DF.name1.str.startswith('BB') & DF.name2.str.startswith('AB')

DF['finalName'] = np.select([m1, m2], ['BB' + c, 'AB' + c], DF.finalName)
print(DF)
   name1  name2 finalName
0  AB123  BB123     BB123
1  BB113  AB113     AB113
2  AB343  AB343         0
3  CC263  BB263         0
4  ED633  DD633         0

Another solution:

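# n=1 replaces only the first occurrence; regex=False treats the pattern literally
DF['finalName'] = np.select([m1, m2], [DF.name1.str.replace('AB', 'BB', n=1, regex=False),
                                       DF.name1.str.replace('BB', 'AB', n=1, regex=False)], DF.finalName)
print(DF)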
   name1  name2 finalName
0  AB123  BB123     BB123
1  BB113  AB113     AB113
2  AB343  AB343         0
3  CC263  BB263         0
4  ED633  DD633         0
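
For reference, a self-contained sketch of the numpy.select variant, with the imports and the sample frame from the question included so it can be run as is. The variable names result and final_names are added here for illustration only.

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame({
    "name1": ["AB123", "BB113", "AB343", "CC263", "ED633"],
    "name2": ["BB123", "AB113", "AB343", "BB263", "DD633"],
    "finalName": [0, 0, 0, 0, 0],
})

c = DF["name1"].str[2:]  # name1 without its two-letter prefix
m1 = DF["name1"].str.startswith("AB") & DF["name2"].str.startswith("BB")
m2 = DF["name1"].str.startswith("BB") & DF["name2"].str.startswith("AB")

# np.select checks the conditions in order: rows matching m1 get "BB" + c,
# rows matching m2 get "AB" + c, and all remaining rows get the default.
result = np.select([m1, m2], ["BB" + c, "AB" + c], default=DF["finalName"])
DF["finalName"] = result

final_names = DF["finalName"].tolist()
print(final_names)  # ['BB123', 'AB113', 0, 0, 0]
```

Unlike np.where, np.select takes a separate replacement per condition, which is why it fits the case where each prefix combination needs a different prefix on the result.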
