Fixing open bracket problem in HTML string using Python

Question

I have the following text:

string = "<i>R</i> subspace  <i>{V.</i> generated by <i>{v<sub>1</sub>,...,v<sub>i</sub></i>, "

A careful reader might notice that two closing braces are missing. How could this be fixed using Python?

The expected output would be:

<i>R</i> subspace  <i>{V.}</i> generated by <i>{v<sub>1</sub>,...,v<sub>i</sub>}</i>, 

One could:

Check: Is there a brace right after an opening tag?
If yes -> Is there a brace right before the matching closing tag?

How can I code this?
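The two checks above can be sketched without a regex using the standard library's html.parser, which also copes with tags nested inside the braces. This is a minimal sketch, not the accepted answer's method; the names BraceCloser and close_braces are hypothetical, and it assumes simple markup where every start tag has an end tag or is self-closing (comments and doctypes are not re-emitted).

```python
from html.parser import HTMLParser


class BraceCloser(HTMLParser):
    """Hypothetical helper: re-emits the HTML, adding a missing '}' before
    the end tag of any element whose text content starts with '{'."""

    def __init__(self):
        # Keep entity references raw so they can be re-emitted verbatim
        super().__init__(convert_charrefs=False)
        self.out = []
        # One [tag, text_seen_so_far] entry per currently open element
        self.stack = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        self.stack.append([tag, ""])

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> have no content to check
        self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        self.out.append(data)
        # Text counts toward every open element, including outer ones
        for entry in self.stack:
            entry[1] += data

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1][0] == tag:
            _, text = self.stack.pop()
            # Brace right after the opening tag, but none before the closing tag?
            if text.startswith("{") and not text.endswith("}"):
                self.out.append("}")
        self.out.append(f"</{tag}>")


def close_braces(html):
    parser = BraceCloser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


string = "<i>R</i> subspace  <i>{V.</i> generated by <i>{v<sub>1</sub>,...,v<sub>i</sub></i>, "
expected = "<i>R</i> subspace  <i>{V.}</i> generated by <i>{v<sub>1</sub>,...,v<sub>i</sub>}</i>, "
print(close_braces(string) == expected)  # True
```

Because the check looks at an element's whole text content, an element that is already closed correctly is left unchanged.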

Edit

I have found this code, which can tell whether the brackets match.
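The linked code is not reproduced here. As a hedged illustration of what such a check usually looks like, here is a minimal stack-based sketch (the name brackets_match is hypothetical). It deliberately ignores angle brackets, since those belong to the HTML tags.

```python
# Map each closing bracket to the opening bracket it must match
PAIRS = {")": "(", "]": "[", "}": "{"}


def brackets_match(text):
    """Return True if every (, [ and { in text is closed in the right order."""
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)  # remember the bracket that still needs closing
        elif ch in PAIRS:
            # A closer with nothing open, or closing the wrong kind, fails
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack  # anything left on the stack was never closed


string = "<i>R</i> subspace  <i>{V.</i> generated by <i>{v<sub>1</sub>,...,v<sub>i</sub></i>, "
print(brackets_match(string))  # False: two '{' are never closed
```

This only reports whether the text is balanced; it does not say where the missing brace should go.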

What goes inside the braces? Can tags appear inside them (i.e. does "{v" become "{v}" right away, before the <sub> tag, or does the "}" go after the last </sub>)? – manveti, Apr 11, 2019 at 21:16
@CraigMeier Good point. The second case would be the correct one. – henry, Apr 11, 2019 at 21:19
@CraigMeier Would you already know how to code the simpler (first) case? – henry, Apr 11, 2019 at 21:27
The first case can be handled by a regex (e.g. re.sub with a callable repl param that ensures there's a final '}' for any chunk with an initial '{'). But the second case falls under stackoverflow.com/a/1732454 (i.e. it needs real HTML parsing rather than a regex). – manveti, Apr 11, 2019 at 21:34
@CraigMeier Thanks for your input, but I don't know how to code it with repl. Sorry to bother you... I see that I can do re.sub(r'...', SOMETHING, string). – henry, Apr 11, 2019 at 21:45
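Following manveti's suggestion, the first (simpler) case can be sketched with re.sub and a callable repl. This assumes a "chunk" means a run of text that starts at '{' and ends just before the next '<'; the name add_closing_brace is hypothetical. Note that it closes "{v" before the <sub> tag, which is the first case rather than the output henry wants; the accepted answer handles the second case.

```python
import re

string = "<i>R</i> subspace  <i>{V.</i> generated by <i>{v<sub>1</sub>,...,v<sub>i</sub></i>, "


def add_closing_brace(match):
    """repl callable: ensure a chunk that starts with '{' ends with '}'."""
    chunk = match.group(0)
    return chunk if chunk.endswith("}") else chunk + "}"


# A chunk is '{' plus everything up to (not including) the next '<'
first_case = re.sub(r"\{[^<]*", add_closing_brace, string)
print(first_case)
# <i>R</i> subspace  <i>{V.}</i> generated by <i>{v}<sub>1</sub>,...,v<sub>i</sub></i>, 
```

When repl is a function, re.sub calls it once per match with the match object and uses its return value as the replacement, which is what lets the brace be added conditionally.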

ggorlen · Accepted Answer · Score 2 · 2019-04-12 01:41:38Z

How about the following regex solution:

import re

string = "<i>R</i> subspace  <i>{V.</i> generated by <i>{v<sub>1</sub>,...,v<sub>i</sub></i>, "
expected = "<i>R</i> subspace  <i>{V.}</i> generated by <i>{v<sub>1</sub>,...,v<sub>i</sub>}</i>, "

fixed = re.sub(r"<(?P<tag>.*?)>({.*?)</(?P=tag)>", r"<\1>\2}</\1>", string)

print(fixed == expected) # True

The idea is to match a tag followed by a brace, find its closing tag, and plop the companion brace before the closing tag using the capture groups as <\1>\2}</\1>. Breakdown:

< # literal opening bracket
 (?P<tag> # open a named capture group
         .*? # lazily match any characters
            ) # end named capture group
             > # literal closing bracket
              ( # open capture group 2
               { # literal opening brace
                .*? # lazily match any characters
                   ) # end capture group 2
                    < # literal opening bracket
                     / # literal slash
                      (?P=tag) # backreference to the named group
                              > # literal closing bracket
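The breakdown above maps directly onto Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments, so the annotation can live in the code itself. This is a sketch of the same pattern (with the brace escaped as \{ for readability), not a change in behavior:

```python
import re

TAG_BRACE = re.compile(
    r"""
    <               # literal opening bracket
    (?P<tag>.*?)    # named group: lazily match the tag name
    >               # literal closing bracket
    (\{.*?)         # group 2: literal opening brace, then lazily any characters
    </              # literal opening bracket and slash
    (?P=tag)        # backreference to the named group
    >               # literal closing bracket
    """,
    re.VERBOSE,
)

string = "<i>R</i> subspace  <i>{V.</i> generated by <i>{v<sub>1</sub>,...,v<sub>i</sub></i>, "
fixed = TAG_BRACE.sub(r"<\1>\2}</\1>", string)
print(fixed)
```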

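If you just want the first case from the comments (closing the brace before the next tag, so "{v" becomes "{v}"), you can use re.sub(r"(\{[^<]*)", r"\1}", string). A lazy ({.*?) would not work here: with nothing after it in the pattern, .*? matches the empty string and produces "{}".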
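One caveat about the main substitution: the template <\1>\2}</\1> always appends '}', so running it on text that is already correct would produce '}}'. A minimal variation using a callable replacement instead of the template (the name close_tag_brace is hypothetical) only adds the brace when it is missing:

```python
import re

PATTERN = re.compile(r"<(?P<tag>.*?)>(\{.*?)</(?P=tag)>")


def close_tag_brace(match):
    tag, body = match.group("tag"), match.group(2)
    if not body.endswith("}"):
        body += "}"  # add the companion brace only if it is missing
    return f"<{tag}>{body}</{tag}>"


string = "<i>R</i> subspace  <i>{V.</i> generated by <i>{v<sub>1</sub>,...,v<sub>i</sub></i>, "
expected = "<i>R</i> subspace  <i>{V.}</i> generated by <i>{v<sub>1</sub>,...,v<sub>i</sub>}</i>, "

fixed = PATTERN.sub(close_tag_brace, string)
print(fixed == expected)  # True
# Running it again leaves the text unchanged instead of adding "}}"
print(PATTERN.sub(close_tag_brace, fixed) == fixed)  # True
```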



Thank you so much for this perfect answer!! – henry
