
Python keeps returning a string with a broken character.

import re

test = re.sub('handle(.*?)', '<verse osisID="lol">\1</verse>', 'handle a bunch of random text here.')
print(test)

What I want:

<verse osisID="lol">a bunch of random text here.</verse>

What I am getting:

<verse osisID="lol">*broken character*</verse> a bunch of random text here.

2 Answers

You should either escape the \ character or use an r'' raw string:

>>> re.sub('handle(.*?)', r'<verse osisID="lol">\1</verse>', 'handle a bunch of random text here.')
'<verse osisID="lol"></verse> a bunch of random text here.'

Without the r'' prefix, Python interprets backslashes in the string literal as escape sequences, so '\1' becomes the control character \x01 before re.sub ever sees it. Doubling the backslash works as well:

>>> '\1'
'\x01'
>>> '\\1'
'\\1'
>>> r'\1'
'\\1'
>>> print(r'\1')
\1
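
The same diagnosis can be checked end to end by running all three variants through re.sub with the question's own pattern. A small sketch (the variable names are only for illustration); only the non-raw, single-backslash version inserts the control character instead of a backreference:

```python
import re

text = 'handle a bunch of random text here.'

# '\1' is an octal escape in a normal literal: the string holds chr(1),
# so re.sub receives no backreference at all
broken = re.sub('handle(.*?)', '<verse osisID="lol">\1</verse>', text)

# Both forms below pass a literal backslash followed by 1 to re.sub
doubled = re.sub('handle(.*?)', '<verse osisID="lol">\\1</verse>', text)
raw = re.sub('handle(.*?)', r'<verse osisID="lol">\1</verse>', text)

print(repr(broken))    # '<verse osisID="lol">\x01</verse> a bunch of random text here.'
print(doubled == raw)  # True
```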

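Note that this replaces just the word handle: the non-greedy .*? pattern matches as few characters as possible, and zero characters are enough to satisfy it. Remove the question mark and the greedy .* captures the rest of the string, which gives your expected output apart from the leading space: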

>>> re.sub('handle(.*)', r'<verse osisID="lol">\1</verse>', 'handle a bunch of random text here.')
'<verse osisID="lol"> a bunch of random text here.</verse>'
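
The difference between the two quantifiers is easy to inspect with re.search, which shows what group 1 captures. The third pattern adds a $ anchor; that is not part of this answer, just one possible way to keep the non-greedy form, since the anchor forces .*? to expand to the end of the string. A sketch:

```python
import re

text = 'handle a bunch of random text here.'

lazy = re.search(r'handle(.*?)', text)       # nothing forces .*? to grow
greedy = re.search(r'handle(.*)', text)      # .* takes as much as it can
anchored = re.search(r'handle(.*?)$', text)  # $ makes .*? reach the end

print(repr(lazy.group(1)))      # ''
print(repr(greedy.group(1)))    # ' a bunch of random text here.'
print(repr(anchored.group(1)))  # ' a bunch of random text here.'
```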

3 Comments

you are a beautiful person :)
You might want to match the space after handle but before the next word as well, as this will prevent the leading space in ...> a br... You could do this with handle *(.*), presuming you only have spaces (not other whitespace).
@AndrewCox: I'd use \s* to match whitespace there instead; why limit only to spaces?
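
Following the \s* suggestion in the comments, a minimal sketch that drops the whitespace after handle and so produces exactly the output the question asks for. The tab-separated input is a made-up example, only there to show where \s* differs from a plain space:

```python
import re

repl = r'<verse osisID="lol">\1</verse>'

# \s* swallows any run of whitespace after "handle" before group 1 starts
spaced = re.sub(r'handle\s*(.*)', repl, 'handle a bunch of random text here.')
tabbed = re.sub(r'handle\s*(.*)', repl, 'handle\ta bunch of random text here.')

# With " *" only spaces are consumed, so the tab ends up inside the element
spaces_only = re.sub(r'handle *(.*)', repl, 'handle\ta bunch of random text here.')

print(spaced)            # <verse osisID="lol">a bunch of random text here.</verse>
print(spaced == tabbed)  # True
print(repr(spaces_only))
```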

The code below was tested under Python 3.6:

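import re

test = 'a bunch of random text here.'
# (.+) rather than (.*): from Python 3.7 on, re.sub also replaces the empty
# match at the end of the string, which would append an empty <verse> element
resp = re.sub(r'(.+)', r'<verse osisID="lol">\1</verse>', test)
print(resp)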

<verse osisID="lol">a bunch of random text here.</verse>
