If I have a piece of text, e.g.

title="gun control" href="/EBchecked/topic/683775/gun-control"

and want to create a regular expression that matches the following (the part inside <> is a description, not literal text):

title="<1 word or many words separated by a space>" href="/EBchecked/topic/\w*/\S*"

How do I write the part inside the <>?

  • It looks like you are trying to parse HTML; why not use a proper HTML parser instead? BeautifulSoup makes HTML handling a breeze: for link in soup.find_all('a', href=True): print link.attrs.get('title', 'No title set'), link['href']. Commented May 9, 2013 at 18:14
  • That is a way better idea than me trying to write bad regexes... good suggestion - I will look into it. Thanks! Commented May 9, 2013 at 19:00
  • Martijn - could you help me get started using BeautifulSoup? If I wanted to get all the hyperlinks from <div class="md-content-wrapper resizable-content topic-content "> of the URL britannica.com/EBchecked/topic/596738/Tipperary, how would I do that exactly? Thanks again for your previous input, and thanks in advance if you get around to answering this question! Commented May 13, 2013 at 21:53
  • You could make that a proper question; I'd use content = soup.find('div', 'topic-content'), then for link in content.find_all('a', href=True): print link.attrs.get('title', 'No title set'), link['href']. Commented May 13, 2013 at 21:55
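
The parser-based approach suggested in the comments can be sketched without third-party packages. This is not the BeautifulSoup code from the comments: it swaps in Python's standard-library html.parser module, and the LinkCollector class name and the sample <a> tag are assumptions made up for illustration (the question only shows the two attributes, not the surrounding tag).

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (title, href) pairs from every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        if "href" in attr_map:
            # Same fallback text as in the comment above.
            title = attr_map.get("title", "No title set")
            self.links.append((title, attr_map["href"]))


# Hypothetical sample markup built around the attributes from the question.
sample = (
    '<a title="gun control" '
    'href="/EBchecked/topic/683775/gun-control">gun control</a>'
)

collector = LinkCollector()
collector.feed(sample)
for title, href in collector.links:
    print(title, href)
```

With a parser the title can contain any characters, so no pattern for "one or many words" is needed at all.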

1 Answer

The following regex will match 1 word or many words separated by a space:

\w+( \w+)*

Here a "word" is considered to consist of letters, digits, and underscores. If you only want to allow letters you could use [a-zA-Z]+( [a-zA-Z]+)*.
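
As a rough sketch of how this fits into the full pattern from the question, the snippet below drops \w+( \w+)* into the title attribute using Python's re module. The names TITLE and LINK_RE are made up for illustration, the group is written as non-capturing (?: ...) so that only the title and the href are captured, and the attribute order and spacing are assumed to be exactly as in the question's sample.

```python
import re

# One or more words separated by single spaces (the pattern from this answer,
# with a non-capturing group).
TITLE = r"\w+(?: \w+)*"

# Full pattern, assuming title comes first and is followed by exactly one
# space and then href, as in the question's sample text.
LINK_RE = re.compile(
    r'title="(' + TITLE + r')" href="(/EBchecked/topic/\w*/\S*)"'
)

text = 'title="gun control" href="/EBchecked/topic/683775/gun-control"'

match = LINK_RE.search(text)
if match:
    title, href = match.groups()
    print(title)
    print(href)
```

Note that \w does not match hyphens or apostrophes, so a title such as "anti-gun laws" would not match; under this approach the character class would have to be widened for such titles.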

2 Comments

Thanks! Works great. Appreciate the help.
@brett No problem, if my answer works you can accept it by clicking the outline of the check mark next to the answer. This lets others on the site know that you found a solution.
