
I want to decode some hex in Python.

Part of the string is \xcd\xed\xb0\xb2

    text = re.search(r'(\\x\w{2}){4}', rtf)

    unicodeText = text.decode('gb2312')

Error: '_sre.SRE_Match' object has no attribute 'decode'

Hope someone can help. Thanks!

  • Why do you need to use regex? Can't you decode the whole string? Commented Sep 15, 2014 at 13:12

1 Answer


re.search returns a Match object (or None if nothing matches), not the matched string.

Use the group method to get the matched string.

Python 2.x:

    >>> import re
    >>> rtf = r'\xcd\xed\xb0\xb2'
    >>> matched = re.search(r'(\\x\w{2}){4}', rtf)
    >>> text = matched.group()
    >>> text.decode('string-escape').decode('gb2312')
    u'\u665a\u5b89'

    # In Python 3.x
    # >>> text.encode().decode('unicode-escape').encode('latin1').decode('gb2312')
    # '晚安'
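Here is a minimal Python 3 sketch of the same idea, wrapped in a function. It makes three choices that are assumptions of this sketch, not part of the original answer. The pattern accepts only hex digits ([0-9a-fA-F] instead of \w) and runs of any length instead of exactly four escapes. The hex digits are turned into bytes with bytes.fromhex rather than the unicode-escape/latin1 round trip. It also checks for None, because re.search returns None when nothing matches, and calling .group() on that would fail with yet another AttributeError. The name decode_first_run is hypothetical.

```python
import re

# One or more literal "\xHH" escapes in a row. Unlike \w{2}, the character
# class only accepts hex digits, and "+" allows runs of any length.
ESCAPE_RUN = re.compile(r'(?:\\x[0-9a-fA-F]{2})+')


def decode_first_run(rtf, encoding='gb2312'):
    """Decode the first run of \\xHH escapes in rtf, or return None."""
    match = ESCAPE_RUN.search(rtf)
    if match is None:  # re.search returns None when nothing matches
        return None
    text = match.group()  # the matched string, e.g. '\\xcd\\xed\\xb0\\xb2'
    # Drop the "\x" prefixes and let bytes.fromhex turn 'cdedb0b2' into bytes.
    raw = bytes.fromhex(text.replace('\\x', ''))
    return raw.decode(encoding)


print(decode_first_run(r'\xcd\xed\xb0\xb2'))  # 晚安
```

bytes.fromhex only ever sees hex digits here, so it never has to interpret other backslash sequences the way unicode-escape does.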

By the way, you don't need a regular expression. What you want is to convert the \xHH escape sequences into bytes:

Python 2.x:

    >>> rtf = r'\xcd\xed\xb0\xb2'
    >>> rtf.decode('string-escape').decode('gb2312')
    u'\u665a\u5b89'
    >>> print rtf.decode('string-escape').decode('gb2312')
    晚安

Python 3.x:

    >>> rtf = r'\xcd\xed\xb0\xb2'
    >>> rtf.encode().decode('unicode-escape').encode('latin1').decode('gb2312')
    '晚安'
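If the string mixes these escapes with other text (one possible reading of the "multiple languages" concern in the comments), decoding the whole string at once is risky: the final .decode('gb2312') in the round trip above is applied to every byte, so non-ASCII text already present in the string would not survive it. The sketch below instead decodes only the escape runs and leaves everything else untouched, using re.sub with a replacement function. It assumes every run is GB2312; if different runs use different encodings, the callback would need some way to pick the codec per run. decode_escape_runs is a hypothetical name.

```python
import re

# A run of one or more literal "\xHH" escapes (hex digits only).
ESCAPE_RUN = re.compile(r'(?:\\x[0-9a-fA-F]{2})+')


def decode_escape_runs(s, encoding='gb2312'):
    """Replace every run of literal \\xHH escapes in s with its decoded text."""
    def _decode(match):
        # Turn e.g. '\\xcd\\xed' into b'\xcd\xed', then decode with the codec.
        raw = bytes.fromhex(match.group().replace('\\x', ''))
        return raw.decode(encoding)
    # re.sub calls _decode for each run; text outside the runs is kept as-is.
    return ESCAPE_RUN.sub(_decode, s)


mixed = r'Привет \xcd\xed\xb0\xb2 hello'
print(decode_escape_runs(mixed))  # Привет 晚安 hello
```

As written, a run whose bytes are not valid GB2312 raises UnicodeDecodeError; passing errors='replace' to decode is one way to soften that.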

8 Comments

That returns "'str' object has no attribute 'decode'"
@JamesGarner, I just updated the answer. If you use Python 3.x, try the code in the comment.
Perfect, thanks! I will mark correct as soon as I can
@JamesGarner, I updated the answer again. In short, you don't need to use regular expression.
The reason I use the regex is that I have multiple languages. Does that not matter?
