
I have only a little experience with regular expressions, and now I have a small problem.

I need to retrieve the strings between the <a> and </a> tags.

Here is a sample:

Categories: <a href="/car/2/page1.html">2</a>, <a href="/car/nissan/">nissan</a>,<a href="/car/all/page1.html">all</a>

And this is my little regex:

re.findall("""<a href=".*">.*</a>""",string)

It works, but I only want the strings between the <a> and </a> tags, not the href. How can I do this?

Thanks.

2 Answers


Use parentheses to form a capturing group:

'<a href=".*">(.*)</a>'
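A minimal sketch of what the group changes, using one link from the question's sample: with no group, re.findall returns each whole match; with exactly one group, it returns only the text that group captured.

```python
import re

# A single link taken from the question's sample
link = '<a href="/car/nissan/">nissan</a>'

# No capturing group: findall returns the whole matched text
whole = re.findall(r'<a href=".*">.*</a>', link)

# One capturing group: findall returns only what the group captured
text_only = re.findall(r'<a href=".*">(.*)</a>', link)

print(whole)      # ['<a href="/car/nissan/">nissan</a>']
print(text_only)  # ['nissan']
```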

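You also probably want to use a non-greedy quantifier (.*? instead of .*). A greedy .* matches as much text as it can, so when several links sit on one line, as in your sample, it runs from the first <a href=" to the last </a> and you get a single match instead of one per link.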

'<a href=".*?">(.*?)</a>'

Result:

['2', 'nissan', 'all']
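A quick check on the question's own sample line shows the difference between the two patterns (the variable names greedy and lazy are only for illustration):

```python
import re

sample = ('Categories: <a href="/car/2/page1.html">2</a>, '
          '<a href="/car/nissan/">nissan</a>,'
          '<a href="/car/all/page1.html">all</a>')

# Greedy: the first .* runs to the last '">' on the line, so the three
# links collapse into a single match and only 'all' is captured
greedy = re.findall(r'<a href=".*">(.*)</a>', sample)

# Non-greedy: each .*? stops at the first '">' or '</a>' it reaches,
# so every link is matched separately
lazy = re.findall(r'<a href=".*?">(.*?)</a>', sample)

print(greedy)  # ['all']
print(lazy)    # ['2', 'nissan', 'all']
```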

Or even better, consider using an HTML parser, such as BeautifulSoup.
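As a sketch of the parser approach, the example below swaps in Python's standard-library html.parser.HTMLParser instead of BeautifulSoup so that it runs without installing anything. The LinkTextParser class is a hypothetical helper, not part of any library; it records the text inside each <a> that has an href attribute.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect the text inside every <a> tag that has an href attribute."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default
        self.texts = []
        self._in_link = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self._in_link = True
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears while we are inside a link
        if self._in_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.texts.append("".join(self._buffer))
            self._in_link = False


sample = ('Categories: <a href="/car/2/page1.html">2</a>, '
          '<a href="/car/nissan/">nissan</a>,'
          '<a href="/car/all/page1.html">all</a>')

parser = LinkTextParser()
parser.feed(sample)
parser.close()
print(parser.texts)  # ['2', 'nissan', 'all']
```

With BeautifulSoup installed, the equivalent is roughly [a.get_text() for a in BeautifulSoup(html, "html.parser").find_all("a", href=True)].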


1 Comment

+1 for BeautifulSoup: you won't have to deal with UTF-8 decoding and HTML entities yourself.
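To illustrate the entity point with a made-up link text (Ben &amp; Jerry is hypothetical): the regex hands back the raw source, so you have to decode entities yourself, for example with the standard library's html.unescape, whereas an HTML parser does that step for you.

```python
import html
import re

# Hypothetical link whose text contains an HTML entity
link = '<a href="/car/ben-jerry/">Ben &amp; Jerry</a>'

# The regex returns the raw source text, entity included
raw = re.findall(r'<a href=".*?">(.*?)</a>', link)

# With the regex approach, decoding is a separate step you must remember
decoded = [html.unescape(text) for text in raw]

print(raw)      # ['Ben &amp; Jerry']
print(decoded)  # ['Ben & Jerry']
```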

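Regex is never a good idea for parsing HTML. There are too many edge cases that make crafting a robust regular expression difficult. Consider the following variations on the same link. Apart from the first, which browsers show as plain text because of the space after <, all of them are valid, browser-renderable links: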

< a href="/car/all/page1.html">all</a>
<a  href="/car/all/page1.html">all</a>
<a href= "/car/all/page1.html">all</a>
<a id="foo" href="/car/all/page1.html">all</a>
<a
 href="/car/all/page1.html">all</a>

None of these will be matched by the given regular expression. I highly recommend an HTML parser, such as Beautiful Soup or lxml. Here's an lxml example:

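from lxml import etree

html = """
Categories: <a href="/car/2/page1.html">2</a>, <a href="/car/nissan/">nissan</a>,<a href="/car/all/page1.html">all</a>
"""
# etree.HTML() uses lxml's lenient HTML parser, so odd markup is tolerated
doc = etree.HTML(html)
# XPath: the text of every <a> element that has an href attribute
result = doc.xpath('//a[@href]/text()')
print(result)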

Result:

['2', 'nissan', 'all']

regardless of how the tags are formatted, and even if the HTML is somewhat malformed.
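To check this claim without installing lxml, the sketch below runs the non-greedy regex from the other answer and a small parser built on Python's standard-library html.parser over the variant links listed above. LinkTextParser and parse_links are hypothetical helpers. The "< a" variant is left out because it is not a tag at all.

```python
import re
from html.parser import HTMLParser

# Variants from the answer, minus the "< a" one (plain text, not a tag)
variants = [
    '<a  href="/car/all/page1.html">all</a>',
    '<a href= "/car/all/page1.html">all</a>',
    '<a id="foo" href="/car/all/page1.html">all</a>',
    '<a\n href="/car/all/page1.html">all</a>',
]

pattern = re.compile(r'<a href=".*?">(.*?)</a>')


class LinkTextParser(HTMLParser):
    """Collect the text of each <a> that carries an href attribute."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._buffer = None  # None means "not inside a link"

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.texts.append("".join(self._buffer))
            self._buffer = None


def parse_links(markup):
    parser = LinkTextParser()
    parser.feed(markup)
    parser.close()
    return parser.texts


regex_results = [pattern.findall(v) for v in variants]
parser_results = [parse_links(v) for v in variants]

print(regex_results)   # [[], [], [], []]
print(parser_results)  # [['all'], ['all'], ['all'], ['all']]
```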

1 Comment

I've also seen <a> tags in the wild with only single quotes or even no quotes around the href value.
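A sketch of how a parser copes with that, again using Python's standard-library html.parser rather than lxml (HrefCollector is a hypothetical name): the quoting style is normalized away before your code ever sees the attribute value.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Record the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs holds (name, value) pairs with surrounding quotes
            # already removed; a bare attribute like <a href> yields None
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


# Double-quoted, single-quoted and unquoted href values
markup = ('<a href="/car/2/page1.html">2</a>'
          "<a href='/car/nissan/'>nissan</a>"
          '<a href=/car/all/page1.html>all</a>')

collector = HrefCollector()
collector.feed(markup)
collector.close()
print(collector.hrefs)
# ['/car/2/page1.html', '/car/nissan/', '/car/all/page1.html']
```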
