
I have a string like this:

string = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'

I want to remove =E2=82=AC and =20.

But when I use

import re
pattern = r'(=\w\w)+'
a = re.split(pattern, string)

it returns

['ArcelorMittal invests ', '=AC', '87m in new process that cuts emissions', '=20', '']

2 Answers


You can use re.findall:

>>> import re
>>> s = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'
>>> re.findall(r'(?:=\w{2})+', s)
['=E2=82=AC', '=20']

Use re.sub if you want to remove those characters instead:

>>> re.sub(r'(?:=\w{2})+', '', s)
'ArcelorMittal invests 87m in new process that cuts emissions'
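To see why the pattern in the question returns '=AC' instead of '=E2=82=AC', here is a short comparison using the question's string (a sketch, not part of the original answer). re.split includes the text of any capturing group in its result, and a repeated capturing group such as (=\w\w)+ only remembers its last repetition; a non-capturing group (?:...) avoids both effects.

```python
import re

s = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'

# Capturing group: re.split also returns the captured text, and a repeated
# group such as (=\w\w)+ only keeps its LAST repetition ('=AC').
with_capture = re.split(r'(=\w\w)+', s)

# Non-capturing group: only the text between the matches is returned.
without_capture = re.split(r'(?:=\w\w)+', s)

print(with_capture)
print(without_capture)
```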

4 Comments

Thanks a lot for the help, but I need to decode these sequences and then put them back, so I want an ordered list that I can join again after decoding.
I think you need to use re.sub.
Thanks a lot, Avinash. I'm extracting this text from Gmail and it has an encoding problem, so I was going to convert =E2=82=AC to \xE2\x82\xAC, decode it as UTF-8, and then put it back into the original string. Is there any way to do that? Thanks a lot.
Just replace = with \x. You can also use re.split with your original pattern, but you have to turn the capturing group into a non-capturing group, pattern = r'(?:=\w\w)+', so that the delimiters are not captured.
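A minimal sketch of the split, decode, and join approach discussed in these comments, assuming every escape run is hex-encoded UTF-8 bytes. The names ESCAPE_RUN and decode_escape_runs are hypothetical, and the pattern uses the hex class [0-9A-Fa-f] instead of \w. It wraps the whole run in a single capturing group so that re.split keeps each complete sequence in order, and rather than textually replacing = with \x, it turns the hex digits into real bytes with bytes.fromhex:

```python
import re

# Hypothetical helper names. One capturing group around the WHOLE run, so
# re.split keeps each complete sequence such as '=E2=82=AC' (not just '=AC').
# Assumption: each escape is '=' followed by exactly two hex digits.
ESCAPE_RUN = re.compile(r'((?:=[0-9A-Fa-f]{2})+)')


def decode_escape_runs(text, encoding='utf-8'):
    """Split on escape runs, decode each run to text, and join in order."""
    parts = ESCAPE_RUN.split(text)
    # With exactly one capturing group, the captured runs sit at the odd
    # indices and the ordinary text between them at the even indices.
    for i in range(1, len(parts), 2):
        hex_digits = parts[i].replace('=', '')  # '=E2=82=AC' -> 'E282AC'
        parts[i] = bytes.fromhex(hex_digits).decode(encoding)
    return ''.join(parts)


s = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'
print(ESCAPE_RUN.split(s))
print(decode_escape_runs(s))
```

The same decoding can be done in one pass by passing a function as the replacement argument of re.sub. For real quoted-printable text, which can also contain soft line breaks (a trailing = before a newline), the quopri module is the more robust choice.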

Based on your comment, I recommend using quopri.decodestring on the original string. The =XX sequences are quoted-printable encoding, so there is no need to extract these characters and decode them separately. The session below is Python 2:

>>> import quopri
>>> s = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'
>>> quopri.decodestring(s)
'ArcelorMittal invests \xe2\x82\xac87m in new process that cuts emissions '
>>> print quopri.decodestring(s)
ArcelorMittal invests €87m in new process that cuts emissions
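On Python 3, quopri.decodestring works on bytes and returns bytes, and print is a function, so the decoded result has to be turned into text explicitly. A minimal sketch, assuming the payload is UTF-8 (which the =E2=82=AC sequence for € suggests):

```python
import quopri

s = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'

# decodestring turns each =XX escape back into the raw byte it stands for;
# the ASCII input is encoded to bytes first and the output is bytes.
raw = quopri.decodestring(s.encode('ascii'))

# The raw bytes are UTF-8 (E2 82 AC is the euro sign), so decode them to text.
text = raw.decode('utf-8')

print(raw)
print(text)
```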
