
I have an HTML file with the following format:

<body>
<p class="Title-P">Compiler</p>
<p class="Heading1-P">kdnkls:</p>
<p class="Normal-P">dsf</p>
<p class="ListParagraph-P">kjsksf</p>
<p class="ListParagraph-P">dsfsf</p>
<p class="ListParagraph-P">sfsfsf</p>
<p class="Heading2-P">fsfs:</p>
</body>

What is a suitable regex to replace these tags:

  • <p class="Title-P">foo</p> with <h1>foo</h1>
  • <p class="Heading1-P">kdnkls:</p> with <h2>kdnkls:</h2>
  • <p class="Normal-P">foo</p> with <p>foo</p>
  • etc.

I'm using PHP's preg_replace function, which takes a pattern and a replacement as arguments.


1 Answer


Try:

$html = preg_replace('/<p class="Title-P">(.*?)<\/p>/i', '<h1>$1</h1>', $html);
$html = preg_replace('/<p class="Normal-P">(.*?)<\/p>/i', '<p>$1</p>', $html);

That should work, but the better bet is to parse the document with DOM, make your changes, and then save the document back out.
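For comparison, here is a minimal sketch of that DOM approach, assuming the class-to-tag mapping from the question and that each class attribute holds exactly one class name. The helper name convertParagraphs() is hypothetical, and mapping Heading2-P to h3 is an assumption (the question only says "etc."). It finds each <p> by its class with DOMXPath, creates the replacement element, moves the children across, and swaps the nodes.

```php
<?php
// Minimal sketch: replace <p> elements according to their class attribute
// using DOMDocument and DOMXPath. convertParagraphs() is a hypothetical name.
function convertParagraphs(string $html, array $map): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    foreach ($map as $class => $tag) {
        // Exact match on the whole attribute value, e.g. class="Title-P".
        // Snapshot the matches into an array before modifying the tree.
        $matches = iterator_to_array($xpath->query("//p[@class='$class']"));
        foreach ($matches as $old) {
            $new = $doc->createElement($tag);
            // Move every child node (text and inline markup) into the new element.
            while ($old->firstChild !== null) {
                $new->appendChild($old->firstChild);
            }
            $old->parentNode->replaceChild($new, $old);
        }
    }
    return $doc->saveHTML();
}

// Class-to-tag map from the question; 'Heading2-P' => 'h3' is an assumption.
$map = [
    'Title-P'    => 'h1',
    'Heading1-P' => 'h2',
    'Heading2-P' => 'h3',
    'Normal-P'   => 'p',
];
```

Classes not in the map (ListParagraph-P, for example) are left untouched. Keep in mind that loadHTML() treats input as ISO-8859-1 unless the document declares its charset, and saveHTML() adds a doctype and an <html> wrapper when the input lacks them.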


6 Comments

You are aware I can break that regex in approx 0.8 seconds, right?
I'm aware of many things. His HTML file has a specific format, which that regex matches...
@user1576848 DOM is a true HTML/XHTML parser: it parses the whole HTML document into an object tree, so you can easily access specific tags and search for nodes within the document. While regex can match certain patterns that happen to be HTML, it isn't well suited to more advanced HTML parsing, because matching the correct closing tags can be difficult or overly complicated. For any serious manipulation of, or access to, nodes in an (X)HTML document, DOM is the way to go in PHP.
@user1576848 My view differs from a lot of SO; as you can see from the comments, people go nuts over seeing regex used for anything HTML-related. I think regex is fine for certain HTML matching or replacing IF YOU UNDERSTAND that any minor change to the HTML format can leave your regex matching nothing. Creating overly complex regexps for HTML is bad practice: people (even you) will look at it later, won't easily understand what it does, and will spend a lot of time examining it (especially when it breaks). THAT SAID: regex can be faster (when written efficiently) than using DOM
...since DOM has to parse the entire document structure into memory. With very large documents, DOM can use too much memory, while a well-written regex can consume much less memory to extract specific content. So if you have a VERY SPECIFIC HTML format that can be matched with a simple, easy-to-understand regex, I say go for it; matching patterns is what regex was made for. If you want to do something like "find all <a> tags that have an onclick attribute" or "parse all <b> tags within an <h1> tag", then go with DOM for those cases. That's my 2 cents :)
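The two DOM tasks named in the last comment map directly onto XPath expressions. A minimal sketch with PHP's DOMDocument and DOMXPath follows; the sample markup is made up for illustration.

```php
<?php
// Hypothetical sample markup, used only to demonstrate the two queries.
$html = '<html><body>'
      . '<h1>Title <b>bold</b></h1>'
      . '<p><b>not in a heading</b></p>'
      . '<a href="#" onclick="go()">with onclick</a>'
      . '<a href="/plain">without onclick</a>'
      . '</body></html>';

$doc = new DOMDocument();
// Collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// "find all <a> tags that have an onclick attribute"
$withOnclick = $xpath->query('//a[@onclick]');

// "all <b> tags within an <h1> tag" (at any depth inside the h1)
$boldInH1 = $xpath->query('//h1//b');

foreach ($withOnclick as $a) {
    echo 'onclick link: ', $a->textContent, PHP_EOL;
}
foreach ($boldInH1 as $b) {
    echo 'bold in h1: ', $b->textContent, PHP_EOL;
}
```

Doing the same with regex would mean handling attribute order, quoting, and nesting by hand, which is the kind of case the comment suggests leaving to DOM.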