
I have a file that contains special characters, so I read it with Python's file operations:

with open('st.txt', 'r') as f:
    string = f.read()

The sample string is

"Free Quote!\n          \n          Protecting your family is the best investment you\'ll eve=\nr \n" 

Now I want to remove all the special characters and keep only the words, so that my string becomes:

"Free Quote Protecting your family is the best investment you'll ever"
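The `=` at the end of a line in "eve=\nr" looks like a quoted-printable soft line break. If st.txt really is quoted-printable encoded (an assumption; the question does not say so), Python's standard `quopri` module can undo that before any character filtering, so "eve" and "r" are rejoined into "ever". A minimal sketch, where `decode_qp` is a hypothetical helper name and UTF-8 is an assumed text encoding:

```python
import quopri


def decode_qp(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode quoted-printable bytes; a soft line break ('=' + newline) is removed."""
    # Assumption: the file is quoted-printable and the decoded bytes are UTF-8.
    return quopri.decodestring(raw).decode(encoding)


# With the real file, read it in binary mode and decode:
#     with open("st.txt", "rb") as f:
#         text = decode_qp(f.read())
sample = b"Free Quote!\n          \n          Protecting your family is the best investment you'll eve=\nr \n"
text = decode_qp(sample)
print(repr(text))
```

The decoded text still contains the `!` and the newlines, so it would still need one of the filtering approaches from the answers.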



Probably the simplest way to do this is to keep only the characters found in string.ascii_letters plus a small whitelist of extra characters (here the apostrophe, hyphen and space, '\'- '):

>>> import string
>>> text = "Free Quote!\n \n Protecting your family is the best investment you\'ll eve=\nr \n"
>>> ''.join([x for x in text if x in string.ascii_letters + '\'- '])
"Free Quote  Protecting your family is the best investment you'll ever "
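The result above still has a double space after "Quote" and a trailing space, and with the original string (which has runs of spaces around the newlines) there would be even more. A minimal sketch of one way to tidy that up, assuming it is fine to collapse every run of whitespace into a single space: keep the same whitelist, then split and rejoin. `ALLOWED` and `clean` are hypothetical names.

```python
import string

# Same whitelist as above: ASCII letters plus apostrophe, hyphen and space.
# A frozenset avoids rebuilding the allowed string for every character.
ALLOWED = frozenset(string.ascii_letters + "'- ")


def clean(text: str) -> str:
    # Drop every character outside the whitelist ('!', '=', '\n', ...).
    kept = "".join(ch for ch in text if ch in ALLOWED)
    # Collapse the leftover runs of spaces and trim both ends.
    return " ".join(kept.split())


text = "Free Quote!\n          \n          Protecting your family is the best investment you'll eve=\nr \n"
print(clean(text))
# Free Quote Protecting your family is the best investment you'll ever
```

Because `=` and `\n` are both dropped, "eve" and "r" end up adjacent, which gives exactly the output requested in the question.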

As the texts get longer and more complex, whitelisting specific punctuation marks becomes harder to sustain, and you'd need more sophisticated regexes (for example, when is a ' an apostrophe and when is it a quote?), but for the scope of the problem above this should suffice.
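On the apostrophe-versus-quote question, one common heuristic (an assumption, not a complete solution) is to accept a ' only when it sits between letters, as in "you'll" or "don't", so a quote mark at the edge of a word is dropped. A short sketch with `re.findall`; `WORD` and `words` are hypothetical names:

```python
import re

# A word is a run of ASCII letters, optionally followed by groups of
# apostrophe + letters (you'll, don't). A ' at a word edge is not matched.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def words(text: str) -> list[str]:
    return WORD.findall(text)


sample = "'Free Quote!' she said: you'll never regret it."
print(" ".join(words(sample)))
# Free Quote she said you'll never regret it
```

This heuristic still misreads cases such as a possessive plural ("the users' files"), which is the kind of complexity the paragraph above warns about.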



Glad to hear that. If this answer worked for you, please accept it so that people know that the question is closed.

I found three solutions; they are all close, but none is exactly what you want.

import re

in_string = "Free Quote!\n \n Protecting your family is the best investment you\'ll eve=\nr \n"

# Variant 1: split on whitespace, then strip every non-word character from each token
# Free Quote Protecting your family is the best investment youll eve r
out_string = " ".join(re.sub(r"\W", "", word) for word in in_string.split())
print(out_string)

# Variant 2: keep only runs of ASCII letters
# Free Quote Protecting your family is the best investment you ll eve r
print(" ".join(re.findall(r"[a-zA-Z]+", in_string)))

# Variant 3: delete every non-word character, including the spaces
# FreeQuoteProtectingyourfamilyisthebestinvestmentyoullever
print(re.sub(r"\W", "", in_string))


If you look at the output, special characters such as ! and = are not removed, and \n still produces a new line. I want the output to be: "Free Quote Protecting your family is the best investment youll ever"
I updated the question, but unfortunately the solutions are still only close.
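Variants 1 and 2 end with "eve r" because the `=` and the newline between "eve" and "r" act as a separator once the string is split. A minimal sketch of one fix, assuming `=` followed by a newline is always a soft line break that should simply be deleted: remove that pair first, then proceed as in variant 1. This yields "youll ever" (no apostrophe, since `\W` removes it), and note that `\W` keeps digits and underscores.

```python
import re

in_string = "Free Quote!\n \n Protecting your family is the best investment you'll eve=\nr \n"

# Assumption: "=" + newline (optionally "\r\n") is a soft line break,
# so deleting it rejoins "eve" and "r" into "ever".
joined = re.sub(r"=\r?\n", "", in_string)

# Then as in variant 1: split on whitespace and strip non-word characters
# from each token, skipping tokens that become empty (e.g. a lone "--").
tokens = (re.sub(r"\W", "", word) for word in joined.split())
out_string = " ".join(token for token in tokens if token)
print(out_string)
# Free Quote Protecting your family is the best investment youll ever
```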
