0

I want to parse content from

<td>content</td>
and
<td *?*>content</td>
and 
<td *specific td class*>content</td>

How can I do this in PHP with a regular expression and preg_match?

2
  • I think maybe we've reached the stage where language-specific sites for SO are necessary... Commented Jan 4, 2010 at 18:15
  • Duplicate. stackoverflow.com/questions/1732348/… Commented Jan 4, 2010 at 18:38

4 Answers

4

I think this sums it up pretty well.

In short, don't use regular expressions to parse HTML. Instead, look at the DOM classes, especially DOMDocument::loadHTML.
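As a rough illustration of the DOM approach, here is a minimal sketch that collects the text of every td cell. The sample markup and its attribute values are made up for the example, not taken from the question.

```php
<?php
// Hypothetical sample input mirroring the three cell shapes from the question.
$html = '<table><tr>'
      . '<td>plain</td>'
      . '<td align="left">with attribute</td>'
      . '<td class="specific">with class</td>'
      . '</tr></table>';

$doc = new DOMDocument();
// Collect libxml warnings internally; imperfect real-world HTML tends to trigger them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Gather the trimmed text of every <td>, whatever attributes it carries.
$cells = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    $cells[] = trim($td->textContent);
}

print_r($cells);
```

Because the parser handles attributes, quoting and whitespace itself, the same loop covers all three cases from the question without a pattern per case.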

3

If you have an HTML document, you really shouldn't use regular expressions to parse it: HTML is just not "regular" enough for that.

A far better solution is to load your HTML document with a DOM parser. For instance, DOMDocument::loadHTML combined with XPath queries often does a really good job.
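To make the XPath suggestion concrete, here is a small sketch that selects every cell and then only the cells with a given class. The class name "price" and the sample markup are assumptions chosen for illustration.

```php
<?php
// Hypothetical sample input; "price" stands in for the specific class you care about.
$html = '<table><tr>'
      . '<td>first</td>'
      . '<td class="price">42</td>'
      . '<td class="note price">43</td>'
      . '</tr></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Every <td>, whatever its attributes.
$all = [];
foreach ($xpath->query('//td') as $td) {
    $all[] = trim($td->textContent);
}

// Only <td> elements whose class list contains the token "price".
$query = '//td[contains(concat(" ", normalize-space(@class), " "), " price ")]';
$priced = [];
foreach ($xpath->query($query) as $td) {
    $priced[] = trim($td->textContent);
}

print_r($all);
print_r($priced);
```

The concat/normalize-space idiom matches "price" as a whole class token, so a cell with class="note price" is included while a class such as "prices" would not be.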

1 Comment

Seconded... regex is the hard way.

0

<td>content</td>: <td>([^<]*)</td>

<td *specific td class*>content</td>: <td[^>]*class=\"specific_class\"[^>]*>([^<]*)</td>
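If you do want to try the regex route, patterns like these could be used with preg_match_all roughly as follows. This is only a sketch: the sample string is invented, and it assumes the cell content contains no nested tags and that the class attribute is written exactly as class="specific_class" in double quotes.

```php
<?php
// Hypothetical input: a plain cell, a cell with some attribute, and a cell with the class.
$html = '<td>a</td><td width="10">b</td><td class="specific_class">c</td>';

// Any <td>, with or without attributes; [^<]* stops at the first "<" in the content.
preg_match_all('~<td[^>]*>([^<]*)</td>~i', $html, $any);

// Only cells whose opening tag carries class="specific_class".
preg_match_all('~<td[^>]*class="specific_class"[^>]*>([^<]*)</td>~i', $html, $specific);

// Index 1 of each result holds the captured cell contents.
print_r($any[1]);
print_r($specific[1]);
```

The ~ delimiter avoids having to escape the slash in the closing tag. Cells containing markup, single-quoted attributes or multiple class names will slip past these patterns.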

0

@OP, here's one way

$str = <<<A
<td>content</td>
<td *?*>content</td>
<td *specific td class*>content</td>
<td *?*> multiline
content </td>
A;

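// Split on the closing tag so each chunk ends with one cell's content.
$s = explode("</td>", $str);
foreach ($s as $b) {
    // Strip everything up to and including the opening <td ...> tag.
    $b = preg_replace("/.*<td[^>]*>/", "", $b);
    print $b . "\n";
}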

output

$ php test.php
content

content

content

 multiline
content
