Replacing values in DataFrame column based on values in another column

Question

As a test case, I have:

test = pd.DataFrame([[1,'A', 'B', 'A B r'], [0,'A', 'B', 'A A A'], [2,'B', 'C', 'B a c'], [1,'A', 'B', 's A B'], [1,'A', 'B', 'A'], [0,'B', 'C', 'x']])
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]

I would like to replace tokens in the last column with 0, but only if they appear in the sublist of replace whose index is the number in the first column of that row.

For example, looking at the first three rows:

Since 'r' is in replace[1], the first cell becomes A B 0. 'A' is not in replace[0], so the second cell stays as A A A. 'a' and 'c' are both in replace[2], so the third cell becomes B 0 0, and so on.

I tried something like

test[3] = test[3].apply(lambda x: ' '.join([n if n not in replace[test[0]] else 0 for n in test.split()]))

but it's not changing anything.

This is unclear to me...

– BENY, Commented Jan 20, 2019 at 19:08

rafaelc · Accepted Answer · 2019-01-20 19:15:58Z

3

IIUC, use zip and a list comprehension to accomplish this.

I've simplified and created a custom replace_ function, but feel free to use regex to perform the replacement if needed.

def replace_(st, reps):
    for old,new in reps:
        st = st.replace(old,new)
    return st

test['new'] = [replace_(b, zip(replace[a], ['0']*3)) for a,b in zip(test[0], test[3])]

Outputs

    0   1   2   3       new
0   1   A   B   A B r   A B 0
1   0   A   B   A A A   A A A
2   2   B   C   B a c   B 0 0
3   1   A   B   s A B   0 A B
4   1   A   B   A       A
5   0   B   C   x       0

answered Jan 20, 2019 at 19:15


jezrael · Accepted Answer · 2019-01-20 19:25:50Z

2

Use a list comprehension with set lookups:

test[3] = [' '.join('0' if i in set(replace[a]) else i for i in b.split()) 
                     for a,b in zip(test[0], test[3])]
print (test)
   0  1  2      3
0  1  A  B  A B 0
1  0  A  B  A A A
2  2  B  C  B 0 0
3  1  A  B  0 A B
4  1  A  B      A
5  0  B  C      0

Or convert the lists to sets beforehand to improve performance:

r = [set(x) for x in replace]
test[3]=[' '.join('0' if i in r[a] else i for i in b.split()) for a,b in zip(test[0], test[3])]
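For comparison, the attempt in the question calls apply on column 3 alone, so the lambda never sees the value in column 0 of the same row; it also indexes replace with the whole test[0] Series and calls split on the DataFrame. A row-wise variant might look like the sketch below. The names `lookup` and `zero_out` are hypothetical, and apply with axis=1 is usually slower than the list comprehensions above.

```python
import pandas as pd

test = pd.DataFrame([[1, 'A', 'B', 'A B r'], [0, 'A', 'B', 'A A A'],
                     [2, 'B', 'C', 'B a c'], [1, 'A', 'B', 's A B'],
                     [1, 'A', 'B', 'A'], [0, 'B', 'C', 'x']])
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]
lookup = [set(x) for x in replace]   # one set per sublist for fast membership tests

def zero_out(row):
    # row[0] picks the sublist for this row; row[3] is the text to clean
    return ' '.join('0' if tok in lookup[row[0]] else tok
                    for tok in row[3].split())

# axis=1 passes each row (as a Series) to zero_out
test[3] = test.apply(zero_out, axis=1)
print(test[3].tolist())  # ['A B 0', 'A A A', 'B 0 0', '0 A B', 'A', '0']
```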

edited Jan 20, 2019 at 19:25

answered Jan 20, 2019 at 19:20


BENY · Accepted Answer · 2019-01-20 19:40:58Z

2

Finally I know what you need

s=pd.Series(replace).reindex(test[0])

[ "".join([dict.fromkeys(y,'0').get(c, c) for c in x]) for x,y in zip(test[3],s)]
['A B 0', 'A A A', 'B 0 0', '0 A B', 'A', '0']
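Note that this approach works character by character, so it assumes every entry in replace is a single character. Under that assumption, roughly the same result can be sketched with translation tables built by str.maketrans. The names `tables`, `keys` and `texts` are hypothetical; the two lists mirror columns 0 and 3 of the question's frame.

```python
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]
keys = [1, 0, 2, 1, 1, 0]                                 # column 0 of the question's frame
texts = ['A B r', 'A A A', 'B a c', 's A B', 'A', 'x']    # column 3 of the question's frame

# One translation table per sublist: each listed character maps to '0'
tables = [str.maketrans(dict.fromkeys(chars, '0')) for chars in replace]

# Pick the table by the row's key and translate the row's text
result = [b.translate(tables[a]) for a, b in zip(keys, texts)]
print(result)  # ['A B 0', 'A A A', 'B 0 0', '0 A B', 'A', '0']
```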

answered Jan 20, 2019 at 19:40
