2

I have a pandas dataframe df with ids as strings: I am trying to create the new_claim and new_description columns

example df

Closest SO I found was Efficiently replace part of value from one column with value from another column in pandas using regex? but this uses split part, and since the description changes I was unable to generalize.

I can run a one off

date_reg = re.compile(r'\b'+df['old_id'][1]+r'\b')

df['new_claim'] = df['claim'].replace(to_replace=date_reg, value=df['external_id'], inplace=False)

But if I have

date_reg = re.compile(r'\b'+df['claim']+r'\b')

Then I get "TypeError: 'Series' objects are mutable, thus they cannot be hashed"

Another approach I took

df['new_claim'] = df['claim']

for i in range(5):
    old_id = df['old_id'][i]
    new_id = df['external_id'][i]

    df['new_claim'][i] = df['claim'][i].replace(to_replace=old_id,value=new_id)

which givesa TypeError: replace() takes no keyword arguments

1 Answer 1

2

Using just the method pandas.replace():

df.old_id = df.old_id.fillna(0).astype('int')

list_old = list(map(str, df.old_id.tolist()))
list_new = list(map(str, df.external_id.tolist()))

df['new_claim'] = df.claim.replace(to_replace=['Claim ID: ' + e for e in list_old], value=['Claim ID: ' + e for e in list_new], regex=True)
df['new_description'] = df.description.replace(to_replace=['\* ' + e + '\\n' for e in list_old], value=['* ' + e + '\\n' for e in list_new], regex=True)

Produces the following output:

enter image description here

Sign up to request clarification or add additional context in comments.

10 Comments

Thank you, I got an unexpected output when old_id = 2838976 and external_id = 102372949 the new claim becomes Claim ID: 102371023948109, in most cases its working as expected
@MattS, the problem is that a smaller pattern is match in a bigger ID. For instance, a value 11 is found in the 1111 id.
If you could post a real case test, we can try to find another pattern to filter, like using the semicolon as part of the value.
Hi @MattS, I have improved the pattern to match. The problem was the empty old_id. Please accept and upvote my answer if it is everything correct! :)
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.