15

In Perl, s/[^\w:]//g removes all non-alphanumeric characters except :

In Python I'm using re.sub(r'\W+', '', mystring), which removes all non-alphanumeric characters except _ (underscore). Is there any way to add exceptions? I want to keep characters such as = and .

Previously I used the opposite approach, i.e. replacing all unwanted characters with re.sub('[!@#\'\"$()]', '', mystring). However, I cannot predict which characters may appear in mystring, so I want to remove all non-alphanumeric characters except a few.

Google didn't provide an appropriate answer. The closest search was "python regex split any \W+ with some exceptions", but this didn't help me either.

3 Answers

16

You can list every character that should not be removed inside a negated character class.

re.sub(r'[^\w'+removelist+']', '',mystring)

Test

>>> import re
>>> removelist = "=."
>>> mystring = "asdf1234=.!@#$"
>>> re.sub(r'[^\w'+removelist+']', '',mystring)
'asdf1234=.'

Here the removelist variable is a string which contains the list of all characters you need to exclude from the removal.
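
One caveat with building the class by string concatenation: characters such as ], ^, - and \ have a special meaning inside [...], so a keep-list containing them can change or break the pattern. A minimal sketch that escapes the list with re.escape first; the names strip_except and keep are hypothetical and not part of the answer above:

```python
import re

def strip_except(text, keep=""):
    """Remove every character that is neither a word character nor in keep.

    strip_except and keep are illustrative names, not from the answer.
    """
    # re.escape neutralises characters that are special inside [...],
    # for example a closing bracket, a caret or a hyphen.
    pattern = r'[^\w' + re.escape(keep) + r']'
    return re.sub(pattern, '', text)

print(strip_except("asdf1234=.!@#$", keep="=."))  # asdf1234=.
print(strip_except("a-b]c^d!e", keep="-]^"))      # a-b]c^de
```

With the unescaped concatenation, a keep-list such as "-]^" would close the class early, which is why the escaping step is worth adding whenever the list is not a fixed literal.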

What does a negated character class mean?

When ^ is the first character inside a character class, it does not act as an anchor; instead, it negates the character class.

That is, in a class such as [^abc], the ^ inverts the set of characters that the class matches.

For example, [abc] matches a, b or c, whereas [^abc] matches any single character other than a, b or c.
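
The two roles of ^ can be compared side by side in a short sketch (the sample strings are made up for illustration). The r prefix makes each pattern a raw string literal, so a backslash such as the one in \w reaches the regex engine unchanged:

```python
import re

# Outside a character class, ^ anchors the match to the start of the string.
print(re.findall(r'^abc', "abcabc"))    # ['abc']

# As the first character inside [...], ^ negates the class.
print(re.findall(r'[abc]', "a1b2c3"))   # ['a', 'b', 'c']
print(re.findall(r'[^abc]', "a1b2c3"))  # ['1', '2', '3']

# Anywhere else inside [...], ^ is just a literal caret.
print(re.findall(r'[a^]', "a^b"))       # ['a', '^']

# A raw string keeps the backslash as written.
print(r'\w' == '\\w')                   # True
```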

4 Comments

Thanks @nu11p01n73R. I was not putting the remove list inside []. I gave something like '^w.=', which of course was not working. Could you please explain the meaning of r and ^? ^ is usually used as "starts with", but here it seems to have a different meaning.
@user1977867 Yep, when ^ is inside a character class, say [^abc], it negates the meaning of the character class. That is, [abc] will match a, b or c, whereas [^abc] will match anything other than a, b or c.
Can I ask why you're calling removelist 'remove'list? It seems to me that it's a list of chars that you'd like to keep. I'm only mentioning it because it had me confused.
@ikku100 Ohh, I have mentioned it in the answer: "Here the removelist variable is a string which contains the list of all characters you need to exclude from the removal."
10
re.sub(r'[^a-zA-Z0-9=]', '',mystring)

You can add any other characters you want to keep, such as _, inside the brackets.
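
Note that [^a-zA-Z0-9=] and [^\w=] are not interchangeable in Python 3: for str patterns, \w also matches the underscore and Unicode letters and digits, unless the re.ASCII flag is given. A small sketch of the difference, using a made-up sample string:

```python
import re

# "\u00ef" is the letter i with a diaeresis, written as an escape for clarity.
sample = "na\u00efve_user=42!"

# Explicit ASCII ranges: drops the accented letter and '_' as well as '!'.
print(re.sub(r'[^a-zA-Z0-9=]', '', sample))

# \w on a str pattern is Unicode-aware and includes '_'; only '!' is dropped.
print(re.sub(r'[^\w=]', '', sample))

# re.ASCII restricts \w to [a-zA-Z0-9_]: the accented letter goes, '_' stays.
print(re.sub(r'[^\w=]', '', sample, flags=re.ASCII))
```

Which variant is appropriate depends on whether accented letters and underscores count as characters to keep in mystring.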

7

I believe the approach you describe in Perl can also be used in Python, e.g.:

re.sub(r'[^\w=]', '',mystring)

would remove everything except word-characters and =
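
For a direct comparison with the Perl substitution in the question, a minimal sketch (the sample string is an assumption for illustration, and the pattern is precompiled with re.compile only to show reuse):

```python
import re

sample = "key:value = 3.14 (approx.)"

# Equivalent of Perl's s/[^\w:]//g: keep word characters and ':'.
print(re.sub(r'[^\w:]', '', sample))   # key:value314approx

# Keep '=' and '.' instead, as asked in the question.
# Inside [...], '.' is a literal dot and needs no escaping.
keep_eq_dot = re.compile(r'[^\w=.]')
print(keep_eq_dot.sub('', sample))     # keyvalue=3.14approx.
```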
