
I'm working with Python 3.5 in Windows. I have a dataframe where a 'titles' str type column contains titles of headlines, some of which have special characters such as â,,˜.

I am trying to replace these with an empty string '' using pandas' replace. I have tried several variations and none of them work. I can replace regular characters, but these special characters are left untouched.

The code runs without error, but the replacement simply does not occur, and instead the original title is returned. Below is what I have tried already. Any advice would be much appreciated.

df['clean_title'] = df['titles'].replace('€','',regex=True)
df['clean_titles'] = df['titles'].replace('€','')
df['clean_titles'] = df['titles'].str.replace('€','')

import re

def clean_text(row):
    # I tried each of these two lines in turn:
    return re.sub('€','',str(row))
    # return str(row).replace('€','')
df['clean_title'] = df['titles'].apply(clean_text)
  • I can't reproduce; your third example works for me. Can you post a sample of your dataframe? Commented Jun 13, 2018 at 21:46
  • Let's say one title is '‘BetterHash’: Bitcoin Core Dev. Proposes New Protocols to Decentralize Bitcoin Mining' Commented Jun 13, 2018 at 21:54
  • This looks more like an encoding error, so you would be better off solving it at the encoding level. Commented Jun 13, 2018 at 21:57
  • I have tried various encodings such as utf-8 while importing the csv file using read_csv. Nothing works :( Commented Jun 13, 2018 at 22:05
  • Try .replace('\xE2\x82\xAC', '') Commented Jun 14, 2018 at 7:02
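
The encoding-level fix suggested in the comments can be sketched as follows. This is only an illustration under one assumption: that the text was originally UTF-8 and was decoded as Windows-1252 (cp1252) somewhere along the way, which is what sequences like â€˜ usually indicate. The helper name fix_mojibake is hypothetical, not part of pandas.

```python
def fix_mojibake(text):
    """Undo UTF-8 text that was wrongly decoded as cp1252 (assumed cause)."""
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The string does not fit the assumed pattern; leave it unchanged.
        return text

# Reproduce the garbling on a known-good string, then repair it.
garbled = "\u2018BetterHash\u2019".encode("utf-8").decode("cp1252")
repaired = fix_mojibake(garbled)
```

If the assumption holds for your data, something like df['titles'].apply(fix_mojibake) would repair the column instead of deleting characters. Passing the file's actual encoding to read_csv, if it can be determined, avoids the problem at the source.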

2 Answers

4

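I assume that by 'special' characters you mean non-ASCII characters.

To remove all non-ASCII characters from a pandas dataframe column, do the following:

df['clean_titles'] = df['titles'].str.replace(r'[^\x00-\x7f]', '', regex=True)

Pass regex=True explicitly: newer pandas versions treat the pattern as a literal string by default. This approach generalizes because the character class matches any non-ASCII character, so you don't have to list them one by one.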
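
To see what that pattern does outside of pandas, here is a minimal sketch using the standard re module on a garbled title like the one in the question. The helper name strip_non_ascii and the sample string are illustrative only.

```python
import re

# Matches any single character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")

def strip_non_ascii(text):
    """Delete every non-ASCII character from text."""
    return NON_ASCII.sub("", str(text))

title = "â€˜BetterHashâ€™: Bitcoin Core Dev."
cleaned = strip_non_ascii(title)
print(cleaned)
```

Keep in mind that this also deletes legitimate non-ASCII characters such as accented letters, so it fits best when the titles are expected to be plain English.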


0

How to remove an escape-sequence character in a dataframe

Data:

product,rating pest,<br> test mouse,/
mousetest

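Solution in Scala (Spark), using col and regexp_replace from org.apache.spark.sql.functions. Note that show() returns Unit, so call it separately rather than assigning its result:

 val finaldf = df.withColumn("rating", regexp_replace(col("rating"), "\\\\", "/"))
 finaldf.show()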
