I want to replace a pattern in strings with the following structure:

cat&345;
bat &#hut;

I want to replace the part that starts at & and ends just before the ; (the ; itself is not replaced). What is the best way to do this?

  • Are both of those one string? Commented Jun 28, 2013 at 2:07
  • No, two separate strings. Commented Jun 28, 2013 at 2:08
  • Good question, but please also show what you have tried :) Commented Jun 28, 2013 at 2:09

4 Answers

Including or not including the & in the replacement?

>>> import re
>>> re.sub(r'&.*?(?=;)','REPL','cat&345;')           # including
'catREPL;'
>>> re.sub(r'(?<=&).*?(?=;)','REPL','bat &#hut;')    # not including
'bat &REPL;'

Explanation:

  • Although not required here, use a raw string (r'...') so that you don't have to escape the backslashes that often occur in regular expressions.
  • .*? is a "non-greedy" match of any character except a newline, so the match stops at the first semicolon rather than the last one.
  • (?=;) is a lookahead: the match must be followed by a semicolon, but the semicolon is not part of the match.
  • (?<=&) is a lookbehind: the match must be preceded by an ampersand, but the ampersand is not part of the match.
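A minimal sketch that wraps both patterns from this answer in one helper, so the caller chooses whether the & is replaced too. The function name replace_entity and its keep_ampersand parameter are hypothetical, chosen only for illustration:

```python
import re

# Hypothetical helper: replace the text from & up to (not including) the next ;.
# The ; is never replaced because it only appears inside the lookahead (?=;).
# With keep_ampersand=True, the lookbehind (?<=&) also leaves the & in place.
def replace_entity(text, repl, keep_ampersand=False):
    if keep_ampersand:
        pattern = r'(?<=&).*?(?=;)'
    else:
        pattern = r'&.*?(?=;)'
    return re.sub(pattern, repl, text)

for s in ['cat&345;', 'bat &#hut;']:
    print(replace_entity(s, 'REPL'), replace_entity(s, 'REPL', keep_ampersand=True))
```

Run on the two strings from the question, this prints catREPL; cat&REPL; and then bat REPL; bat &REPL;.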

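Here is a regex that puts the replacement between the & and the ;, keeping both:
import re
searchText = 'bat &#hut;'    # example input
replacementstr = 'REPL'      # example replacement
result = re.sub(r"(?<=&).*?(?=;)", replacementstr, searchText)   # 'bat &REPL;'

The lookbehind (?<=&) and the lookahead (?=;) keep the & and the ; out of the match, and the non-greedy .*? stops at the first ; so that several &...; sequences on one line are replaced separately.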
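Whether the quantifier is greedy (.*) or non-greedy (.*?) matters once a single string contains more than one &...; sequence. The question's own strings each contain only one, so the two-sequence input below is a made-up assumption used to show the difference:

```python
import re

# Made-up input with two &...; sequences on one line.
text = 'cat&345; bat &#hut;'

# Greedy: .* runs to the end of the string and backtracks only as far as
# the LAST ';', so both sequences (and the text between them) become one match.
greedy = re.sub(r'(?<=&).*(?=;)', 'REPL', text)

# Non-greedy: .*? stops at the FIRST ';' after each '&', so each sequence
# is replaced on its own.
lazy = re.sub(r'(?<=&).*?(?=;)', 'REPL', text)

print(greedy)  # cat&REPL;
print(lazy)    # cat&REPL; bat &REPL;
```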


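Maybe go in a different direction altogether and use HTMLParser.unescape(). The unescape() method is undocumented, but it doesn't appear to be "internal" because it doesn't have a leading underscore. (In Python 3.4 and later, use the documented html.unescape() instead; HTMLParser.unescape() was deprecated in 3.4 and removed in 3.9.)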
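If the &...; sequences really are HTML character references, a minimal sketch with Python 3's html.unescape() looks like this. Note that the question's own strings (&345; and &#hut;) are not valid references, so unescape() returns them unchanged; this route only helps if the real data is HTML-escaped text.

```python
import html

# Named and numeric character references are decoded.
print(html.unescape('cat &amp; dog'))   # cat & dog
print(html.unescape('&#65;&lt;b&gt;'))  # A<b>

# Sequences that are not valid references are left as they are.
print(html.unescape('cat&345;'))        # cat&345;
print(html.unescape('bat &#hut;'))      # bat &#hut;
```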


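You can use negated character classes to do this. The version below deletes everything from the & up to the ;, keeping the ; and the text before the &: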

import re

st='''\
cat&345;
bat &#hut;'''

for line in st.splitlines():
    print(line)
    # ([^&]*) captures the text before &; [^;]* consumes everything up to the ;
    print(re.sub(r'([^&]*)&[^;]*;', r'\1;', line))
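The capture group is not strictly needed, because re.sub() leaves any text outside the match untouched. A sketch of an equivalent pattern, assuming the goal is the same as in this answer (drop the & and everything up to the ;, keep the ;); the lookahead (?=;) keeps the requirement that a ; actually follows:

```python
import re

lines = ['cat&345;', 'bat &#hut;']

for line in lines:
    # [^;]* cannot run past a ';', so no non-greedy quantifier is needed.
    with_group = re.sub(r'([^&]*)&[^;]*;', r'\1;', line)
    without_group = re.sub(r'&[^;]*(?=;)', '', line)
    assert with_group == without_group  # both give 'cat;' and 'bat ;'
    print(without_group)
```

Passing a replacement such as 'REPL' instead of '' gives the behaviour the question asks for, e.g. catREPL; for the first string.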
