dfF:

    Sample  AlmostFinal  
    1          KOPLA234        
    1          KOPLA234
    2          RWPLB253
    3          MMPLA415
    3          MMPLA415 

I need to replace KOPL, RWP and MM with KOLPOL, and the last letter (A/B) should stay. So the result should be:

    Sample  AlmostFinal  Final
    1          KOPLA234  KOLPOLA234      
    1          KOPLA234  KOLPOLA234
    2          RWPLB253  KOLPOLB253
    3          MMPLA415  KOLPOLA415
    3          MMPLA415  KOLPOLA415

I tried to do it with replace:

    dfF['Final'] = (dfF['AlmostFinal'].replace({'KOPL':'KOLPOL'}, regex = True))
    dfF['Final'] = (dfF['AlmostFinal'].replace({'RWP':'KOLPOL'}, regex = True))
    dfF['Final'] = (dfF['AlmostFinal'].replace({'MMPL':'KOLPOL'}, regex = True))

If I comment out the 2nd and 3rd lines, the replacement for KOPL works.

If I comment out the 1st and 3rd lines, the replacement for RWP works.

But when I uncomment all three lines and run them together, only the last one takes effect. Why? In another script I have similar code, and there every line works.

  • How does replacing 'MM' in 'MMPLA415' with 'KOLPOL' make it 'KOLPOLA415'? Commented Jun 28, 2019 at 6:17
  • Edited. MMPLA -> KOLPOLA Commented Jun 28, 2019 at 6:19
  • The reason your code does not work is that the last line overwrites the results of the first two lines. Can you please explain whether you're trying to replace all strings beginning with MM up to the last letter, or specifically MMPL, or something else? Commented Jun 28, 2019 at 6:20
  • Still wrong. Replacing RWP with KOLPOL in RWPLB253 makes it KOLPOLLB253, not KOLPOLB253 Commented Jun 28, 2019 at 6:21

3 Answers

You can use a single replace call with regex=True:

    df['Final'] = df['AlmostFinal'].replace(
        [r'KOPL', r'RWP.*?(?=A|B)', r'MM.*(?=A|B)'], 'KOLPOL', regex=True)
    df

       Sample AlmostFinal       Final
    0       1    KOPLA234  KOLPOLA234
    1       1    KOPLA234  KOLPOLA234
    2       2    RWPLB253  KOLPOLB253
    3       3    MMPLA415  KOLPOLA415
    4       3    MMPLA415  KOLPOLA415

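We want to be able to handle a varying number of characters between the substring and the trailing letter (A or B), so a regex with a lookahead is useful here: the lookahead checks that the letter follows but leaves it out of the match, so it is not replaced.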
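To see what the lookahead buys you, here is a minimal sketch using only the standard `re` module on one value from the question. The variable names are made up for illustration; the second pattern is a deliberately naive variant for contrast, not something the answer recommends.

```python
import re

value = "RWPLB253"

# With a lookahead: (?=A|B) only checks that A or B comes next.
# The letter is not part of the match, so it survives the substitution.
with_lookahead = re.sub(r"RWP.*?(?=A|B)", "KOLPOL", value)

# Without a lookahead: [AB] is part of the match, so the letter is replaced too.
without_lookahead = re.sub(r"RWP.*?[AB]", "KOLPOL", value)

print(with_lookahead)
print(without_lookahead)
```

The first call keeps the `B`, the second one swallows it, which is why the patterns above end in `(?=A|B)` rather than `[AB]`.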


Further generalisation is possible. Just define your substrings, then append a lookahead to each one with a list comprehension.

    pat = ['KOPL', 'RWP', 'MM']
    df['Final'] = df['AlmostFinal'].replace(
        [rf'{p}.*(?=A|B)' for p in pat], 'KOLPOL', regex=True)  # need python3.6+
    df

       Sample AlmostFinal       Final
    0       1    KOPLA234  KOLPOLA234
    1       1    KOPLA234  KOLPOLA234
    2       2    RWPLB253  KOLPOLB253
    3       3    MMPLA415  KOLPOLA415
    4       3    MMPLA415  KOLPOLA415

If you want to replace specific substrings, the solution is a little simpler.

    pat = ['KOPL', 'RWPL', 'MMPL']
    df['AlmostFinal'].replace(pat, 'KOLPOL', regex=True)

    0    KOLPOLA234
    1    KOLPOLA234
    2    KOLPOLB253
    3    KOLPOLA415
    4    KOLPOLA415
    Name: AlmostFinal, dtype: object

No other modifications required. For more general replacements, see above.

1 Comment

Thank you very much for the examples and explanation. That's very useful! :)
1

If I comment out the 2nd and 3rd lines, the replacement for KOPL works. If I comment out the 1st and 3rd lines, the replacement for RWP works. But when I uncomment all three lines and run them together, only the last one takes effect. Why?

Because replace returns a new Series instead of modifying the column in place, and since every one of your lines runs the replacement on the original AlmostFinal column and assigns to Final, each assignment throws away the result of the previous one.
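The overwriting is easy to reproduce. A minimal sketch, assuming the data from the question (the frame is rebuilt here so the snippet runs on its own):

```python
import pandas as pd

dfF = pd.DataFrame({
    "Sample": [1, 1, 2, 3, 3],
    "AlmostFinal": ["KOPLA234", "KOPLA234", "RWPLB253", "MMPLA415", "MMPLA415"],
})

# Both lines read the untouched 'AlmostFinal' column,
# so the second assignment discards what the first one wrote to 'Final'.
dfF["Final"] = dfF["AlmostFinal"].replace({"KOPL": "KOLPOL"}, regex=True)
dfF["Final"] = dfF["AlmostFinal"].replace({"MMPL": "KOLPOL"}, regex=True)

# Only the MMPL rows are changed; the KOPL rows are back to their original values.
print(dfF["Final"].tolist())
```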

Either do all replacements simultaneously, e.g. with a regex or, I guess, a single dict with multiple entries (not sure why you'd use a dict for a single value here, really):

    {
        'KOPL':'KOLPOL',
        'RWP':'KOLPOL',
        'MMP':'KOLPOL',
    }

or perform each replace on the result of the previous one (either chain the replace calls, or have the second and third operate on df['Final']).
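Both options could look roughly like this. This is only a sketch: it assumes the full four-letter prefixes `KOPL`, `RWPL` and `MMPL` are what should be replaced (the question is ambiguous on that point), and the name `chained` is made up for the example.

```python
import pandas as pd

dfF = pd.DataFrame({"AlmostFinal": ["KOPLA234", "RWPLB253", "MMPLA415"]})

# Option 1: chain the calls, so each replace works on the previous result.
chained = (
    dfF["AlmostFinal"]
    .replace({"KOPL": "KOLPOL"}, regex=True)
    .replace({"RWPL": "KOLPOL"}, regex=True)
    .replace({"MMPL": "KOLPOL"}, regex=True)
)

# Option 2: a single call with one dict entry per prefix (assumed prefixes).
dfF["Final"] = dfF["AlmostFinal"].replace(
    {"KOPL": "KOLPOL", "RWPL": "KOLPOL", "MMPL": "KOLPOL"}, regex=True
)

print(chained.tolist())
print(dfF["Final"].tolist())
```

Either way there is only one assignment to `Final`, so nothing gets overwritten.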

2 Comments

Does not work for the same reason as mentioned here. It is not guaranteed what follows the substrings listed.
@cs95 There is an inconsistency in the OP. The description of the operation does not match the expected results.
1

You should execute one assignment, not three. Otherwise, each subsequent assignment overwrites the result of the previous one.

    dfF['Final'] = dfF['AlmostFinal']\
                   .replace({'KOP|RWP|MMP': 'KOLPO'}, regex = True)
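A variation on the same single-assignment idea, in case the prefixes should only be replaced at the start of the value: anchor the alternation with `^` and use `Series.str.replace`. This is a sketch under assumptions, not part of the answer above. It assumes the full prefixes `KOPL`, `RWPL` and `MMPL`, and the last row (`XXKOPLA1`) is a hypothetical value added only to show the effect of the anchor.

```python
import pandas as pd

dfF = pd.DataFrame(
    {"AlmostFinal": ["KOPLA234", "RWPLB253", "MMPLA415", "XXKOPLA1"]}  # last value is hypothetical
)

# ^ anchors the match to the start of the string, so a prefix that
# appears in the middle of a value is left alone.
dfF["Final"] = dfF["AlmostFinal"].str.replace(
    r"^(?:KOPL|RWPL|MMPL)", "KOLPOL", regex=True
)

print(dfF["Final"].tolist())
```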

2 Comments

@cs95 It does produce the expected output after the OP's edits.
It's confusing, but I guess we'll have to wait for them to say :)
