REGEX for HTML in php

Question

I have an HTML file with the following format:

<body>
<p class="Title-P">Compiler</p>
<p class="Heading1-P">kdnkls:</p>
<p class="Normal-P">dsf</p>
<p class="ListParagraph-P">kjsksf</p>
<p class="ListParagraph-P">dsfsf</p>
<p class="ListParagraph-P">sfsfsf</p>
<p class="Heading2-P">fsfs:</p>
</body>

What is a suitable regex to replace these tags:

<p class="Title-P">foo</p> with <h1>foo</h1>
<p class="Heading1-P">kdnkls:</p> with <h2>kdnkls:</h2>
<p class="Normal-P">foo</p> with <p>foo</p>
etc...

I'm using PHP's preg_replace function, which takes a pattern and a replacement as arguments...

Welcome to Stack Overflow! Please refrain from parsing HTML with RegEx as it will drive you į̷̷͚̤̤̖̱̦͍͗̒̈̅̄̎n̨͖͓̹͍͎͔͈̝̲͐ͪ͛̃̄͛ṣ̷̵̞̦ͤ̅̉̋ͪ͑͛ͥ͜a̷̘͖̮͔͎͛̇̏̒͆̆͘n͇͔̤̼͙̩͖̭ͤ͋̉͌͟eͥ͒͆ͧͨ̽͞҉̹͍̳̻͢. Use an HTML parser instead. — Madara's Ghost
– Madara's Ghost, Commented Aug 11, 2012 at 0:47
lol @Truth !!! i'm sure you just copied-pasted the comment...
– CdB, Commented Aug 11, 2012 at 0:48

drew010 · Accepted Answer · 2012-08-11 00:50:17Z

Try:

$html = preg_replace('/<p class="Title-P">(.*?)<\/p>/i', "<h1>$1</h1>", $html);
$html = preg_replace('/<p class="Normal-P">(.*?)<\/p>/i', "<p>$1</p>", $html);

That should work, but a better bet is to parse the document using DOM, make your changes, and then save the document back out.
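As a rough illustration of the DOM route, here is a minimal sketch using PHP's bundled DOMDocument class. The function name `retagParagraphs` and the `$map` array are hypothetical names chosen for this example; the mapping only covers the three replacements spelled out in the question. It assumes the input is ASCII or otherwise safe for `loadHTML`'s default encoding handling, and note that `saveHTML()` returns a full document (doctype and `<html>` wrapper included).

```php
<?php
// Hypothetical mapping: class name from the question => replacement tag.
$map = [
    'Title-P'    => 'h1',
    'Heading1-P' => 'h2',
    'Normal-P'   => 'p',
];

function retagParagraphs(string $html, array $map): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list, so take a snapshot
    // before replacing nodes; otherwise elements can be skipped.
    $paragraphs = iterator_to_array($doc->getElementsByTagName('p'));

    foreach ($paragraphs as $p) {
        $class = $p->getAttribute('class');
        if (!isset($map[$class])) {
            continue; // Classes without a mapping are left untouched.
        }

        // Create the new element and move the children across.
        $new = $doc->createElement($map[$class]);
        while ($p->firstChild) {
            $new->appendChild($p->firstChild);
        }
        $p->parentNode->replaceChild($new, $p);
    }

    return $doc->saveHTML();
}
```

Calling `retagParagraphs($html, $map)` on the question's markup turns the `Title-P` paragraph into an `<h1>` and leaves the `ListParagraph-P` paragraphs as they are, since no replacement was given for them.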


6 Comments

Madara's Ghost Over a year ago

You are aware I can break that regex in approx 0.8 seconds, right?

drew010 Over a year ago

I'm aware of many things. His HTML file has a specific format which that matches...
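To make this exchange concrete, here is a small sketch of which inputs the answer's first pattern does and does not match. The variant inputs are made up for illustration; only the first one comes from the question's format.

```php
<?php
// The pattern from the answer above.
$pattern = '/<p class="Title-P">(.*?)<\/p>/i';

// Hypothetical variations on the question's markup.
$inputs = [
    'exact'         => '<p class="Title-P">Compiler</p>',
    'single quotes' => "<p class='Title-P'>Compiler</p>",
    'extra attr'    => '<p id="t" class="Title-P">Compiler</p>',
    'multi-line'    => "<p class=\"Title-P\">Compiler\nDesign</p>",
];

$results = [];
foreach ($inputs as $label => $input) {
    // preg_replace() returns the subject unchanged when nothing matches.
    $results[$label] = preg_replace($pattern, '<h1>$1</h1>', $input);
}

// Only 'exact' is rewritten to <h1>Compiler</h1>; the other three come
// back unchanged. The multi-line case fails because "." does not match
// a newline unless the "s" modifier is added.
```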

drew010 Over a year ago

@user1576848 DOM is a true HTML/XHTML parser and can parse the whole HTML document into an object and you can easily access certain tags and search nodes within the document. While regex can be used to match certain patterns that may be HTML, it isn't well suited for more advanced parsing of HTML because matching the correct closing tags can be difficult or overly complicated. For any serious manipulation or access to nodes within an (X)HTML document, DOM is the way to go in PHP.

drew010 Over a year ago

@user1576848 My view differs from a lot of SO; as you can see from the comments, people go nuts over seeing regex used for anything involving HTML. I think regex is fine for certain HTML matching or replacing IF YOU UNDERSTAND that any minor change to the HTML format can leave your regex matching nothing. Creating overly complex regexps for HTML is bad practice: when people (even you) look at one later, they won't easily understand what it does and will spend a lot of time examining it (especially when it breaks). THAT SAID: regex can be faster (when written efficiently) than using DOM

drew010 Over a year ago

...since DOM has to parse the entire document structure into memory. With very large documents, DOM can use too much memory, and a well written regex can consume much less memory to parse certain content. So if you have a VERY SPECIFIC HTML format that can be easily matched with a simple to understand regex, I say fine go for it, it was made to match patterns. If you want to do something like "find all <a> tags that have an onclick attribute", or "parse all <b> tags within an <h1> tag" then I say go with the DOM for those types of cases. That's my 2 cents :)
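The two DOM-style queries mentioned in this comment can be sketched with DOMXPath, which ships with PHP's DOM extension. The helper name `queryHtml` and the sample markup are hypothetical, made up for this illustration.

```php
<?php
// Hypothetical helper: run an XPath expression against an HTML string
// and return the text content of every matching node.
function queryHtml(string $html, string $expression): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $results = [];
    foreach ($xpath->query($expression) as $node) {
        $results[] = $node->textContent;
    }
    return $results;
}

// Made-up sample markup for the two queries.
$sample = '<body><h1>A <b>bold</b> title</h1>'
        . '<a href="#" onclick="go()">first</a><a href="#">second</a>'
        . '<p><b>not in a heading</b></p></body>';

// All <a> tags that have an onclick attribute.
$withOnclick = queryHtml($sample, '//a[@onclick]'); // ['first']

// All <b> tags nested anywhere inside an <h1>.
$boldInH1 = queryHtml($sample, '//h1//b');          // ['bold']
```

Structural conditions like "has this attribute" or "is nested inside that element" are written directly in the query, which is awkward to express reliably in a regex.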
