Parsing content in html tags using regex

Question

I want to parse content from

<td>content</td>
and
<td *?*>content</td>
and 
<td *specific td class*>content</td>

How can I do this with a regex in PHP, using preg_match?

I think maybe we've reached the stage where language-specific sites for SO are necessary...
– ennuikiller, commented Jan 4, 2010 at 18:15

Community · Accepted Answer · 2023-11-17 20:16:56Z

4

I think this sums it up pretty well.

In short, don't use regular expressions to parse HTML. Instead, look at the DOM classes, especially DOMDocument::loadHTML.
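As a rough illustration of the DOM approach, here is a minimal sketch that collects the text of every td element regardless of its attributes. The sample markup and the variable names are made up for the example, not taken from the question.

```php
<?php
// Hypothetical sample markup; the attributes are placeholders for illustration.
$html = '<table><tr><td>first</td><td class="price">second</td><td id="x">third</td></tr></table>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of emitting them for imperfect markup.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Gather the text content of every <td>, whatever attributes it carries.
$cells = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    $cells[] = trim($td->textContent);
}

print_r($cells);
```

Because the parser handles attributes and whitespace itself, the same loop covers both the plain and the attributed td cases from the question.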

edited Nov 17, 2023 at 20:16

CommunityBot

answered Jan 4, 2010 at 18:21

Emil Vikström

Pascal MARTIN · Accepted Answer · 2010-01-04 18:16:03Z

3

If you have an HTML document, you really shouldn't use regular expressions to parse it: HTML is just not "regular" enough for that.

A far better solution would be to load your HTML document using a DOM parser -- for instance, DOMDocument::loadHTML and XPath queries often do a really great job!
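For the "specific td class" case, a sketch along these lines might work. The markup and the class name specific_class are placeholders; the XPath expression is one common way to test for a single class in a space-separated class list.

```php
<?php
// Hypothetical markup; "specific_class" is a placeholder for the real class name.
$html = '<table><tr><td>plain</td><td class="specific_class other">wanted</td>'
      . '<td class="unrelated">skipped</td></tr></table>';

$doc = new DOMDocument();
// Keep libxml quiet about imperfect or fragmentary markup.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select <td> elements whose class list contains the token "specific_class".
$query = '//td[contains(concat(" ", normalize-space(@class), " "), " specific_class ")]';

$matches = [];
foreach ($xpath->query($query) as $td) {
    $matches[] = trim($td->textContent);
}

print_r($matches);
```

Padding the class attribute with spaces before calling contains() avoids matching class names that merely include specific_class as a substring.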

answered Jan 4, 2010 at 18:16

Pascal MARTIN

1 Comment

prodigitalson Over a year ago

seconded... regex is the hard way.

yu_sha · Accepted Answer · 2010-01-04 18:21:15Z

0

<td>content</td>: <td>([^<]*)</td>

<td *specific td class*>content</td>: <td[^>]*class=\"specific_class\"[^>]*>([^<]*)<
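A minimal sketch of how these two patterns could be applied with preg_match_all. The input string is invented for the example, and the patterns only capture cells whose content contains no nested tags, since [^<]* stops at the first "<".

```php
<?php
// Hypothetical input; "specific_class" stands in for the real class name.
$html = '<td>one</td><td class="specific_class">two</td><td class="other">three</td>';

// Cells with a bare <td> opening tag (no attributes).
preg_match_all('~<td>([^<]*)</td>~', $html, $plain);

// Cells whose opening tag contains class="specific_class".
preg_match_all('~<td[^>]*class="specific_class"[^>]*>([^<]*)<~', $html, $byClass);

// Index 1 of each result holds the captured cell contents.
print_r($plain[1]);
print_r($byClass[1]);
```

Using ~ as the delimiter and single quotes avoids having to escape the slash in </td> or the double quotes around the class value.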

answered Jan 4, 2010 at 18:21

yu_sha

ghostdog74 · Accepted Answer · 2010-01-05 00:06:36Z

0

@OP, here's one way

$str = <<<A
<td>content</td>
<td *?*>content</td>
<td *specific td class*>content</td>
<td *?*> multiline
content </td>
A;

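// Split on the closing tag, then strip everything up to and including the opening <td ...> tag.
$s = explode("</td>", $str);
foreach ($s as $b) {
    $b = preg_replace("/.*<td[^>]*>/", "", $b);
    print $b . "\n";
}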

output

$ php test.php
content

content

content

 multiline
content

answered Jan 5, 2010 at 0:06

ghostdog74
