As a test case, I have:

test = pd.DataFrame([[1,'A', 'B', 'A B r'], [0,'A', 'B', 'A A A'], [2,'B', 'C', 'B a c'], [1,'A', 'B', 's A B'], [1,'A', 'B', 'A'], [0,'B', 'C', 'x']])
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]

I would like to replace space-separated parts of the values in the last column with 0, but only if they appear in the sublist of replace whose position matches the number in the first column of that row.

For example, looking at the first three rows: since 'r' is in replace[1], the first cell becomes A B 0. 'A' is not in replace[0], so the second cell stays as A A A. 'a' and 'c' are both in replace[2], so the third cell becomes B 0 0, and so on.

I tried something like

test[3] = test[3].apply(lambda x: ' '.join([n if n not in replace[test[0]] else 0 for n in test.split()]))

but it's not changing anything.

Comment (Jan 20, 2019 at 19:08): This is unclear to me...

3 Answers

IIUC, use zip and a list comprehension to accomplish this.

I've simplified and created a custom replace_ function, but feel free to use regex to perform the replacement if needed.

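def replace_(st, reps):
    # Apply each (old, new) substring replacement in turn.
    for old, new in reps:
        st = st.replace(old, new)
    return st

# Pair each row's key (column 0) with its text (column 3).
test['new'] = [replace_(b, zip(replace[a], ['0'] * len(replace[a]))) for a, b in zip(test[0], test[3])]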

Outputs

    0   1   2   3       new
0   1   A   B   A B r   A B 0
1   0   A   B   A A A   A A A
2   2   B   C   B a c   B 0 0
3   1   A   B   s A B   0 A B
4   1   A   B   A       A
5   0   B   C   x       0
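
Since regex is mentioned as an alternative, here is a minimal sketch of what that might look like. Note that str.replace matches substrings, so a token such as 's' would also be replaced inside a longer word; the version below uses word boundaries to avoid that. The helper name replace_tokens is hypothetical, and the sketch assumes the tokens consist of word characters so that \b behaves as expected.

```python
import re
import pandas as pd

test = pd.DataFrame([[1, 'A', 'B', 'A B r'], [0, 'A', 'B', 'A A A'],
                     [2, 'B', 'C', 'B a c'], [1, 'A', 'B', 's A B'],
                     [1, 'A', 'B', 'A'], [0, 'B', 'C', 'x']])
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]

def replace_tokens(text, tokens):
    # Hypothetical helper: replace whole-word occurrences of any token with '0'.
    if not tokens:
        return text
    pattern = r'\b(?:' + '|'.join(map(re.escape, tokens)) + r')\b'
    return re.sub(pattern, '0', text)

# Column 0 selects which sublist of `replace` applies to the text in column 3.
test['new'] = [replace_tokens(b, replace[a]) for a, b in zip(test[0], test[3])]
print(test['new'].tolist())
# ['A B 0', 'A A A', 'B 0 0', '0 A B', 'A', '0']
```

With this pattern, replace_tokens('star', ['s', 't']) leaves the string unchanged, whereas chained str.replace calls would turn it into '00ar'.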

Use list comprehension with lookup in sets:

test[3] = [' '.join('0' if i in set(replace[a]) else i for i in b.split()) 
                     for a,b in zip(test[0], test[3])]
print (test)
   0  1  2      3
0  1  A  B  A B 0
1  0  A  B  A A A
2  2  B  C  B 0 0
3  1  A  B  0 A B
4  1  A  B      A
5  0  B  C      0

Or convert the lists to sets beforehand to improve performance:

r = [set(x) for x in replace]
test[3]=[' '.join('0' if i in r[a] else i for i in b.split()) for a,b in zip(test[0], test[3])]
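
For comparison, the same set lookup can probably also be written with a row-wise apply, which is closer in shape to the attempt in the question. This is only a sketch: the names lookup and zero_out are made up for illustration, and row-wise apply is generally slower than the list comprehension on large frames.

```python
import pandas as pd

test = pd.DataFrame([[1, 'A', 'B', 'A B r'], [0, 'A', 'B', 'A A A'],
                     [2, 'B', 'C', 'B a c'], [1, 'A', 'B', 's A B'],
                     [1, 'A', 'B', 'A'], [0, 'B', 'C', 'x']])
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]

# Build the sets once so each membership test is O(1).
lookup = [set(tokens) for tokens in replace]

def zero_out(row):
    # Hypothetical helper: column 0 picks the set, column 3 holds the text.
    targets = lookup[row.loc[0]]
    return ' '.join('0' if tok in targets else tok for tok in row.loc[3].split())

# axis=1 passes each row to zero_out as a Series.
result = test.apply(zero_out, axis=1)
print(result.tolist())
# ['A B 0', 'A A A', 'B 0 0', '0 A B', 'A', '0']
```

The key difference from the attempt in the question is that the function receives the whole row, so it can read both the key in column 0 and the text in column 3.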

Finally I know what you need

s=pd.Series(replace).reindex(test[0])

[ "".join([dict.fromkeys(y,'0').get(c, c) for c in x]) for x,y in zip(test[3],s)]
['A B 0', 'A A A', 'B 0 0', '0 A B', 'A', '0']
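
Note that the join above works character by character, so it relies on every entry in replace being a single character. Under that same assumption, a roughly equivalent sketch can use str.maketrans and str.translate; the variable names tables and result are illustrative only.

```python
import pandas as pd

test = pd.DataFrame([[1, 'A', 'B', 'A B r'], [0, 'A', 'B', 'A A A'],
                     [2, 'B', 'C', 'B a c'], [1, 'A', 'B', 's A B'],
                     [1, 'A', 'B', 'A'], [0, 'B', 'C', 'x']])
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]

# One translation table per sublist; assumes every entry is a single character.
tables = [str.maketrans({ch: '0' for ch in chars}) for chars in replace]

# Column 0 selects the table, column 3 holds the text to translate.
result = [text.translate(tables[key]) for key, text in zip(test[0], test[3])]
print(result)
# ['A B 0', 'A A A', 'B 0 0', '0 A B', 'A', '0']
```

If the entries in replace could be longer than one character, a token-based approach (splitting on whitespace) would be the safer choice.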
