I have a number of links that look like:

<a href="http://url.com/?foo=bar&p=20" title="foo">Foo</a>
<a href="http://url2.com/?foo=bar&p=30" title="foo">Foo</a>

I'm trying to extract the parameter p from the href of each link found. In this case the end result should be array(20, 30).

What would be a good regex for this? Thanks.

  • Like that? So invalid HTML? There isn't a good regex for it, use an HTML parser (one with good error recovery), extract the values of the href attributes, then run them through a URL parser. Commented Sep 29, 2010 at 17:30
  • Does href="[^ "]+\&p=([^"]+)" work? Commented Sep 29, 2010 at 17:37
  • You can't parse HTML with regular expressions reliably. See below. Commented Sep 29, 2010 at 17:43
  • @FrustratedWithFormsDesigner: Yeah that's actually pretty close! Thanks Commented Sep 29, 2010 at 17:44
  • Suggested third-party alternatives to SimpleHtmlDom that actually use DOM instead of string parsing: phpQuery, Zend_Dom, QueryPath and FluentDom. Commented Sep 29, 2010 at 17:48
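
For reference, the pattern suggested in the comments can be tried against the two sample links roughly as follows. This is only a sketch: the `$html` string and the `~` delimiters are assumptions for illustration, and the pattern is as fragile as the other comments warn. It captures everything up to the closing quote (so a parameter following p would end up in the match), it requires p to be preceded by `&`, and it does not match links where the ampersand is written as `&amp;`.

```php
<?php
// Sample markup copied from the question (assumed to be available as one string).
$html = '<a href="http://url.com/?foo=bar&p=20" title="foo">Foo</a>'
      . '<a href="http://url2.com/?foo=bar&p=30" title="foo">Foo</a>';

// Pattern from the comment above; "~" is used as the delimiter so "/" needs no escaping.
preg_match_all('~href="[^ "]+&p=([^"]+)"~', $html, $matches);

// $matches[1] holds the first capture group of every match, as strings.
$p = $matches[1];
print_r($p);
```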

1 Answer

Don’t try to parse HTML with regular expressions; use an HTML parser like PHP’s DOM library or the PHP Simple HTML DOM Parser instead. Then parse the URL with parse_url and the query string with parse_str.

Here’s an example:

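$html = str_get_html('…');
$p = array();
foreach ($html->find('a[href]') as $a) {
    // parse_url() returns null when the href has no query string.
    $query = parse_url($a->getAttribute('href'), PHP_URL_QUERY);
    if (!is_string($query)) continue;
    parse_str($query, $args);
    if (isset($args['p'])) $p[] = $args['p'];
}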
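
The same approach should also work with PHP's built-in DOM library mentioned above, without a third-party parser. A minimal sketch, assuming the DOM extension is enabled and the markup is available as a string (the `$html` value here is just the two links from the question):

```php
<?php
// Sample markup copied from the question.
$html = '<a href="http://url.com/?foo=bar&p=20" title="foo">Foo</a>'
      . '<a href="http://url2.com/?foo=bar&p=30" title="foo">Foo</a>';

// Collect libxml's complaints about sloppy markup (such as the bare "&")
// internally instead of emitting them as PHP warnings.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$p = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    if (!$a->hasAttribute('href')) continue;
    // Extract only the query string, then split it into key/value pairs.
    $query = parse_url($a->getAttribute('href'), PHP_URL_QUERY);
    if (!is_string($query)) continue;
    parse_str($query, $args);
    if (isset($args['p'])) $p[] = $args['p'];
}
print_r($p);
```

Note that parse_str yields strings, so cast with (int) if you need integers.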