I have a string in Python. I used escape() to get rid of the newlines, and now my string looks like this:

<p>Wie hoch ist der Anteil «oraler MS-Medikamente»
bei Neuverschreibungen in Ihrer Sprechstunde? </p>

But it's supposed to look like this:

Wie hoch ist der Anteil oraler MS-Medikamente bei Neuverschreibungen in Ihrer Sprechstunde?

What can I do?

3 Answers

  1. Try decoding the string (reversing the escape), for example with an online HTML entity encoder/decoder:
    http://www.web2generators.com/html/entities

  2. You could also use BeautifulSoup to extract the text:

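from bs4 import BeautifulSoup  # pip install beautifulsoup4

# raw_html is the unescaped HTML string
soup = BeautifulSoup(raw_html, "html.parser")
cleantext = soup.get_text()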

You can unescape the string in order to get HTML tags back:

import html
# Python 3.4+; on Python 2 use HTMLParser.HTMLParser().unescape(text)
text = html.unescape(text)

and then use a regex to remove the HTML tags:

import re
tag_pattern = re.compile(r'<.*?>')
text = tag_pattern.sub('', text)

I don't really recommend using regexes to parse HTML, though; you can use BeautifulSoup instead.
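
For readers who want to avoid both the regex and a third-party dependency, here is a minimal sketch of the same unescape-then-strip idea using only the Python 3 standard library. The names `TextExtractor` and `html_to_text` are made up for this illustration, and the sample input assumes the string was escaped once. Note that the guillemets are kept; they would have to be removed separately if they are not wanted.

```python
import html
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are ignored.
        self.parts.append(data)


def html_to_text(escaped):
    # Undo the escaping first, so "&lt;p&gt;" becomes "<p>" again.
    markup = html.unescape(escaped)
    extractor = TextExtractor()
    extractor.feed(markup)
    extractor.close()
    # split() + join collapses newlines, non-breaking spaces and repeated blanks.
    return " ".join("".join(extractor.parts).split())


sample = ("&lt;p&gt;Wie hoch ist der Anteil &laquo;oraler MS-Medikamente&raquo;\n"
          "bei Neuverschreibungen in Ihrer Sprechstunde?&nbsp;&lt;/p&gt;")
print(html_to_text(sample))
```

The whitespace normalisation at the end is a design choice for this sketch: it turns the embedded newline into a single space and drops the trailing non-breaking space.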


List all the unwanted substrings in the characters list, then replace each of them with an empty string:

string = '&lt;p&gt;Wie hoch ist der Anteil &amp;laquo;oraler MS-Medikamente&amp;raquo;bei Neuverschreibungen in Ihrer Sprechstunde?&amp;nbsp;&lt;/p&gt;'

def unescape(s):
    characters = ["&lt;p&gt;", "&lt;", "&gt;", "&amp;", "laquo;", "raquo;", "nbsp;", "/p"]
    for character in characters:
        s = s.replace(character, "")
    return s

print(unescape(string))

Here is the result:

Wie hoch ist der Anteil oraler MS-Medikamentebei Neuverschreibungen in Ihrer Sprechstunde?
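
The sample string above is escaped twice (`&amp;laquo;` rather than `&laquo;`), which is why a single decoding pass would not be enough. As an alternative to listing substrings by hand, the following sketch applies `html.unescape` repeatedly until the string stops changing. The function name `unescape_fully` and the `max_rounds` limit are assumptions made for this example; repeated decoding can also over-decode text that legitimately contains entity-like sequences, so treat it as a sketch rather than a general solution.

```python
import html


def unescape_fully(s, max_rounds=5):
    """Apply html.unescape until the string stops changing (hypothetical helper)."""
    for _ in range(max_rounds):
        decoded = html.unescape(s)
        if decoded == s:
            break  # nothing left to decode
        s = decoded
    return s


print(unescape_fully("&amp;laquo;oraler MS-Medikamente&amp;raquo;"))
```

The result still contains any HTML tags and the guillemets; only the entity escaping is undone.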
