I want to build a Python list from a large chunk of HTML by splitting it on the HTML tags. I'm not well versed in regular expressions, so I don't know how to go about this. For instance, say I had this piece of HTML:

<option value="674"> Example text here </option><option value="673"> Example text here</option><option value="672"> Example text here </option>

I would like to save this code (albeit a much bigger version of it) in a string, and then use a function to return a list like this:

result = ["Example text here", "Example text here", "Example text here"]

Is there any way I can do this?

  • Stop now and use an HTML parser. Please. Commented May 2, 2014 at 2:23
  • Thank you, I was unaware of this library. Commented May 2, 2014 at 2:29

2 Answers

You could simply use BeautifulSoup for this purpose.

import bs4

html = '''
<option value="674"> Example text here </option>
<option value="673"> Example text here</option>
<option value="672"> Example text here </option>
'''

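# Name the parser explicitly; omitting it makes bs4 guess and emit a warning.
soup  = bs4.BeautifulSoup(html, 'html.parser')
# get_text(strip=True) returns the element's text with surrounding whitespace removed.
mylst = [x.get_text(strip=True) for x in soup.find_all('option')]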

Output

['Example text here', 'Example text here', 'Example text here']
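
If installing a third-party package is not an option, the same approach can be sketched with the standard library's html.parser module. This is only a minimal sketch, not part of the original answer: the names OptionTextParser and option_texts are made up for illustration, and it assumes the <option> elements are not nested.

```python
from html.parser import HTMLParser


class OptionTextParser(HTMLParser):
    """Collect the stripped text of every <option> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.texts = []       # finished option texts, in document order
        self._buffer = None   # None while the parser is outside an <option>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "option":
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears inside an <option> element.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._buffer is not None:
            self.texts.append("".join(self._buffer).strip())
            self._buffer = None


def option_texts(html):
    """Return a list with the text of each <option> in the given HTML string."""
    parser = OptionTextParser()
    parser.feed(html)
    parser.close()
    return parser.texts


html = '<option value="674"> Example text here </option><option value="673"> Example text here</option>'
print(option_texts(html))
```

Wrapping the parser in a function matches what the question asks for: pass in the HTML string, get back a list.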

I agree with @roippi's comment: please use an HTML parser. However, if you really want to use a regex, the following does what you want:

import re

s = '<option value="674"> Example text here </option><option value="673"> Example text here</option><option value="672"> Example text here </option>'

# Capture the non-tag text between a '>' and the next '<', trimming whitespace.
print(re.findall(r'>\s*([^<]+?)\s*<', s))
# ['Example text here', 'Example text here', 'Example text here']
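
The pattern above captures text between any pair of tags, so on a larger page it would also pick up text from elements other than <option>. A slightly narrower variant, sketched below, anchors the match to the option tags themselves. The names OPTION_TEXT and option_texts_regex are made up for illustration, and the pattern is still fragile: it assumes no nested tags inside an option and no '>' inside attribute values, which is why a parser remains the safer choice.

```python
import re

# Match an opening <option ...> tag, capture its text lazily, then the closing tag.
# IGNORECASE accepts <OPTION>; DOTALL lets the text span several lines.
OPTION_TEXT = re.compile(r"<option\b[^>]*>\s*(.*?)\s*</option>", re.IGNORECASE | re.DOTALL)


def option_texts_regex(html):
    """Return the whitespace-trimmed text of each <option> element in html."""
    return OPTION_TEXT.findall(html)


s = '<option value="674"> Example text here </option><option value="673"> Example text here</option>'
print(option_texts_regex(s))
```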
