I'm working on a huge file that has names in columns that contain extraneous values (like the "|" key) that I want to remove, but for some reason my str.replace function only seems to apply to some rows in the column.
My column in the dataframe summary looks something like this:
Labels
test|test 1
test 2
test 3
test|test 4
test|test 5
test 6
As you can see, some columns are already how i want them to be, only containing the name "test #", but some have "test|" in front, which I want removed.
My function to remove them is like this:
correction = summary["Labels"].str.replace('test\|', '')
It seems to work for most of the values, but when I check for pipes ("|") in the dataframe (once i merged correction with summary), it says it finds 9330 of them:
found = summary[summary['Labels'].str.contains('|',regex=False)]
print(len(found))
print(found['Labels'].value_counts())
Results
9330
test|test-667 59
test|test-765 40
test|test-1810 39
test|test-685 36
test|test-1077 33
..
Does anyone know why this is, and how i can fix it?
testtest||test-667?correctionis aseries. But when you are looking for errors,correctionis adataframe. So you are actually not showing us what you really did...correctioninto thesummarydataframe after removing the unwanted values. i've corrected the code above to reflect that!correctionandsummarythe wrong way, which we can't know since you are not showing us.