def Clean_Data(df):
   df.replace({ r'\A\s+|\s+\Z': '', '\n' : ' ', '\w\s+\w|\w\n\w': '\w\s\w'}, regex=True, inplace=True)
   return df

I would like to clean my dataframe before I work on it. I need to get rid of:

double whitespace

whitespace + linebreak

-> and replace it with a single whitespace.

I also want to check whether there is more than one whitespace character between two words (letters or numbers) and reduce it to a single whitespace.

Lastly, I want to check whether there is whitespace between a word and a punctuation mark (, or .) and replace it with ''.

But I have no experience with regex, and I am already getting an error: bad escape \w

1 Answer

Try this:

   df.replace({' +': ' ', '\n': ' ', '->': ' '}, regex=True, inplace=True)

The first pattern matches a run of one or more spaces and replaces it with a single space.
The second pattern matches a newline and replaces it with a space.
The third pattern matches the literal text -> and replaces it with a space.
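
As for the error in the question: the replacement value '\w\s\w' is used as a replacement template, not as a pattern, and Python's re module rejects \w there, which is where "bad escape \w" comes from. To reuse part of a match, capture it with parentheses and refer to it as \1 in the replacement. Below is a minimal sketch that covers all the points in the question under one assumption: every string cell should be cleaned, and non-string cells should be left alone. The names clean_text and clean_data are only illustrative, and the cleaning is done cell by cell with plain re.sub so that the order of the steps is explicit.

```python
import re

import pandas as pd


def clean_text(value):
    """Normalise whitespace in a single cell; non-strings pass through unchanged."""
    if not isinstance(value, str):
        return value
    # Collapse any run of whitespace (spaces, tabs, line breaks) into one space,
    # then drop whatever is left at the start and end of the string.
    value = re.sub(r"\s+", " ", value).strip()
    # Remove whitespace that sits directly before a comma or a period.
    # The group ([,.]) captures the sign and \1 puts it back.
    return re.sub(r"\s+([,.])", r"\1", value)


def clean_data(df):
    """Return a cleaned copy of df; the original frame is not modified."""
    return df.apply(lambda column: column.map(clean_text))


df = pd.DataFrame({"text": ["  hello   world \n again , ok .  "], "n": [1]})
cleaned = clean_data(df)
print(cleaned["text"].tolist())
```

Returning a copy instead of using inplace=True is a design choice here: it keeps the raw data available for comparison while the patterns are still being tuned.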
