
I would like to search a multi-line text and check whether each line contains sentence=" followed by some text and ending with " />. If it does, and the text between sentence=" and " /> contains any ", each of those should be replaced with '. For example, one such line is:

<number="4" word="start" sentence="I said, "start!"" />

I would like to change it to be

<number="4" word="start" sentence="I said, 'start!'" />

Note that such cases can occur more than once within a single line of the text.

How can I do this with a regex in Python? Thanks!

2 Comments
  • I assume your example's sentence attribute is supposed to be instance? (or vice versa) Commented Mar 18, 2014 at 0:58
  • @roippi: oops. "instance" should be "sentence" Commented Mar 18, 2014 at 0:59

1 Answer


You can pass a callable to re.sub as the replacement. It receives each match object and returns the string to substitute for that match:

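import re

s = """<number="4" word="start" sentence="I said, "start!"" />"""

# Lazy .*? keeps each match inside one sentence="..." /> attribute,
# which matters when a line contains more than one of them.
re.sub(r'(?<=sentence=")(.*?)(?=" />)', lambda m: m.group().replace('"', "'"), s)
Out[179]: '<number="4" word="start" sentence="I said, \'start!\'" />'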
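Since the question mentions that several such attributes can appear on one line, here is a minimal sketch of that case. It assumes every attribute is closed by exactly " /> and uses a lazy .*? so that each sentence attribute is matched on its own. The helper name fix_quotes and the sample text are made up for illustration.

```python
import re

# Lazy .*? stops at the first '" />' after each 'sentence="', so every
# attribute on a line is matched separately. '.' does not match newlines,
# so a match never crosses a line boundary.
SENTENCE_RE = re.compile(r'(?<=sentence=")(.*?)(?=" />)')

def fix_quotes(text):
    """Replace inner double quotes with single quotes in each sentence attribute."""
    return SENTENCE_RE.sub(lambda m: m.group().replace('"', "'"), text)

# Hypothetical sample: two attributes on the first line, one on the second.
text = (
    '<number="4" word="start" sentence="I said, "start!"" />'
    '<number="5" word="stop" sentence="He said, "stop!"" />\n'
    '<number="6" word="go" sentence="no inner quotes" />'
)
fixed = fix_quotes(text)
print(fixed)
```

With a greedy .* the first match would run from the first sentence=" to the last " /> on the line, so the quotes that delimit the attributes in between would be rewritten as well.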

1 Comment

+1 I couldn't get the last " right with re.sub(r'sentence="(.*)"\s*/>', lambda x: x.group(0).replace('"', "'"), data) :(
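A likely reason the attempt in this comment misbehaves: x.group(0) is the whole match, including the sentence=" prefix and the closing quote before />, so those delimiter quotes get replaced too. A sketch of one possible fix, assuming the same input shape, is to capture the delimiters in their own groups and only rewrite the middle one. The names pattern and repl are illustrative.

```python
import re

data = '<number="4" word="start" sentence="I said, "start!"" />'

# Group 1: opening delimiter, group 2: attribute text, group 3: closing delimiter.
pattern = re.compile(r'(sentence=")(.*?)("\s*/>)')

def repl(m):
    # Only the attribute text (group 2) has its quotes replaced;
    # the delimiters are put back unchanged.
    return m.group(1) + m.group(2).replace('"', "'") + m.group(3)

result = pattern.sub(repl, data)
print(result)
```

This variant avoids lookarounds, and the \s* tolerates a varying amount of whitespace before />.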
