Why doesn't str.replace replace ALL values in selected pandas dataframe column?

Question

I'm working with a huge file whose name columns contain extraneous characters (like the "|" character) that I want to remove, but for some reason my str.replace call only seems to apply to some rows in the column.

My column in the dataframe summary looks something like this:

Labels
test|test 1
test 2
test 3
test|test 4
test|test 5
test 6

As you can see, some rows are already how I want them to be, containing only the name "test #", but some have "test|" in front, which I want removed.

My code to remove them looks like this:

correction = summary["Labels"].str.replace('test\|', '')

It seems to work for most of the values, but when I check for pipes ("|") in the dataframe (after I merged correction with summary), it still finds 9330 of them:

found = summary[summary['Labels'].str.contains('|',regex=False)]
print(len(found))
print(found['Labels'].value_counts())

Results
9330
test|test-667     59
test|test-765     40
test|test-1810    39
test|test-685     36
test|test-1077    33
                  ..

Does anyone know why this is, and how I can fix it?

Any chance there could be something like testtest||test-667?
– mozway, Commented Jan 12, 2022 at 21:06
In the code you wrote, correction is a Series. But when you are looking for errors, correction is a DataFrame. So you are not actually showing us what you really did...
– Aryerez, Commented Jan 12, 2022 at 21:11
@Aryerez Ah, you're right, sorry. I forgot to add that I put correction into the summary dataframe after removing the unwanted values. I've corrected the code above to reflect that!
– Emily, Commented Jan 12, 2022 at 21:28
@Emily It is possible that your problem comes from combining correction and summary the wrong way, which we can't know since you are not showing us.
– Aryerez, Commented Jan 12, 2022 at 21:33
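
One thing worth ruling out in light of the comments above: str.replace returns a new Series and leaves the original column untouched, so a check against summary['Labels'] only sees the cleaned values once the result has been written back to that column. The sketch below uses a tiny made-up DataFrame (the three labels are illustrative, not the asker's data), and it is only a guess at the cause, since the combining step is not shown in the question.

```python
import pandas as pd

# Hypothetical sample standing in for the real summary dataframe.
summary = pd.DataFrame({"Labels": ["test|test-667", "test 2", "test|test-765"]})

# str.replace returns a new Series; summary["Labels"] is not modified here.
correction = summary["Labels"].str.replace(r"test\|", "", regex=True)

# Checking the original column at this point still finds the pipes.
pipes_before_assign = int(summary["Labels"].str.contains("|", regex=False).sum())

# Writing the result back to the same column is what makes the check pass.
summary["Labels"] = correction
pipes_after_assign = int(summary["Labels"].str.contains("|", regex=False).sum())

print(pipes_before_assign, pipes_after_assign)
```

If the cleaned values ended up under a different column name or in a separate frame, a check against summary['Labels'] would keep reporting the original count.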

wwnde · Accepted Answer · 2022-01-12 22:47:13Z

1

You were on the right track. Use a raw string for the pattern, pass regex=True explicitly, and assign the result back to the column:

summary['Labels'] = summary['Labels'].str.replace(r'test\|', '', regex=True)



Labels
0  test 1
1  test 2
2  test 4
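
To make the difference between the regex and literal forms concrete, here is a minimal sketch on a three-row sample (the data is made up to mirror the output above). With regex=True the pipe has to be escaped, because an unescaped | means alternation; with regex=False the pattern is taken literally and needs no escaping. The default value of regex has changed between pandas versions, so passing it explicitly avoids surprises.

```python
import pandas as pd

# Hypothetical sample mirroring the output shown above.
summary = pd.DataFrame({"Labels": ["test|test 1", "test 2", "test|test 4"]})

# Regex form: the pipe is escaped so it matches a literal "|".
cleaned_regex = summary["Labels"].str.replace(r"test\|", "", regex=True)

# Literal form: no escaping needed because the pattern is not a regex.
cleaned_literal = summary["Labels"].str.replace("test|", "", regex=False)

# Unescaped pipe with regex=True means "test" OR the empty string,
# so the literal "|" character is never matched and stays in the data.
unescaped = summary["Labels"].str.replace("test|", "", regex=True)

print(cleaned_regex.tolist())
print(cleaned_literal.tolist())
print(unescaped.tolist())
```

Both the escaped regex form and the literal form give the same cleaned labels here; the unescaped regex form is the one to avoid.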


Corralien · Accepted Answer · 2022-01-12 21:17:56Z

1

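Try str.extract to capture everything after the pipe, falling back to the original value for rows that have no pipe: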

df['Labels'] = df['Labels'].str.extract(r'\|(.*)', expand=False) \
                           .combine_first(df['Labels'])
print(df)

# Output
   Labels
0  test 1
1  test 2
2  test 3
3  test 4
4  test 5
5  test 6
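
To see why the combine_first step is needed, the sketch below splits the one-liner into its two stages on a small made-up sample. str.extract yields NaN for rows that have no pipe, and combine_first fills those gaps from the original column. Note that this pattern captures everything after the first pipe, so a label with several pipes would keep the later ones.

```python
import pandas as pd

# Hypothetical sample; the real column is much larger.
df = pd.DataFrame({"Labels": ["test|test 1", "test 2", "test|test 4"]})

# Stage 1: capture the text after the first pipe. Rows without a pipe become NaN.
after_pipe = df["Labels"].str.extract(r"\|(.*)", expand=False)

# Stage 2: fill the NaN rows with the original, already-clean labels.
result = after_pipe.combine_first(df["Labels"])

# A label with two pipes keeps everything after the first one.
multi = pd.Series(["a|b|c"]).str.extract(r"\|(.*)", expand=False)

print(after_pipe.isna().tolist())
print(result.tolist())
print(multi.tolist())
```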


1 Comment

Emily Over a year ago

Thanks for your response! I tried this and it still doesn't seem to work. I'm not sure why.
