I'm very new to writing PHP and regular expressions. I need to write a regex pattern that will let me "grab" the headlines from the following HTML:

<title>My news</title>
<h1>News</h1>

<h2 class="yiv1801001177first">This is my first headline</h2>
<p>This is a summary of a fascinating article.</p>

<h2>This is another headline</h2>
<p>This is a summary of a fascinating article.</p>

<h2>This is the third headline</h2>
<p>This is a summary of a fascinating article.</p>

<h2>This is the last headline</h2>
<p>This is a summary of a fascinating article.</p>

So I need a pattern to match all the <h2> tags. This is my first attempt at writing a pattern, and I'm seriously struggling...
/(<h+[2])>(.*?)\<\/h2>/ is what I've attempted. Help is much appreciated!

2
  • Maybe you can check out the following similar question: stackoverflow.com/questions/1732348/… which discusses using regular expressions to parse HTML. Commented Jun 11, 2011 at 8:32
  • Welcome to StackOverflow. A little hint: If you want to post code and/or HTML, paste it as-is into the edit box, then highlight it and press Ctrl-K. That way you don't have to mess around with HTML entities and escapes yourself, and it makes the code easier to read for us. Commented Jun 11, 2011 at 8:54

3 Answers

1

I'm not too familiar with PHP, but in cases like this it's usually easier to use an XML parser (which will automatically detect <h2> as well as <h2 class="whatever">) rather than a regex, to which you'll have to add a bunch of special cases. JavaScript, for example, has the XML DOM for exactly this purpose; I'd be surprised if PHP didn't have something similar.
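
For what it's worth, PHP does ship with a DOM extension. The following is only a minimal sketch of the parser approach described above, assuming the DOM extension is enabled; the variable names ($html, $doc, $titles) and the shortened sample markup are made up for illustration.

```php
<?php
// Shortened sample markup based on the question; $html is a name chosen for this sketch.
$html = '<h1>News</h1>'
      . '<h2 class="yiv1801001177first">This is my first headline</h2>'
      . '<p>This is a summary of a fascinating article.</p>'
      . '<h2>This is another headline</h2>';

// Collect libxml warnings internally instead of printing them for imperfect markup.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// getElementsByTagName finds every <h2>, with or without attributes.
$titles = [];
foreach ($doc->getElementsByTagName('h2') as $node) {
    $titles[] = trim($node->textContent);
}

print_r($titles);
```

The parser copes with attributes, letter case, and line breaks inside the tag without any special cases in your own code.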

2 Comments

Yes, I realize there are probably better ways to go about doing this, but this is for an assignment, and it specifically asked me to write a PHP script that uses regex and then outputs the result as an unordered list...
@Jo W, gently suggest to the person who imposed that requirement on the assignment that they consult a doctor. Then use an HTML parser to solve the assignment. If you want to learn regex, applying it to HTML is about the worst way to do so.
1

The easiest way to do it via regex is

#<h2\b[^>]*>(.*?)</h2>#is

This will match any h2 tag and capture its contents in capture group 1 (backreference $1). I've used # as the regex delimiter to avoid having to escape the / later on in the regex, and the i and s modifiers to make the regex case-insensitive and to let the dot match newlines within the tag's contents.

There are circumstances where this regex will fail, though, as pointed out correctly by others in this thread.
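
As a rough sketch of how this pattern could be used for the assignment, the snippet below runs it through preg_match_all and prints the captured headlines as an unordered list. The sample markup is the one from the question; the variable names ($html, $matches, $headlines, $list) are chosen purely for illustration.

```php
<?php
// Sample markup copied from the question; $html is a name chosen for this sketch.
$html = <<<'HTML'
<title>My news</title>
<h1>News</h1>

<h2 class="yiv1801001177first">This is my first headline</h2>
<p>This is a summary of a fascinating article.</p>

<h2>This is another headline</h2>
<p>This is a summary of a fascinating article.</p>

<h2>This is the third headline</h2>
<p>This is a summary of a fascinating article.</p>

<h2>This is the last headline</h2>
<p>This is a summary of a fascinating article.</p>
HTML;

// $matches[0] holds the full matches, $matches[1] the contents of capture group 1.
preg_match_all('#<h2\b[^>]*>(.*?)</h2>#is', $html, $matches);
$headlines = $matches[1];

// Build the unordered list, stripping nested tags and escaping each headline.
$list = "<ul>\n";
foreach ($headlines as $headline) {
    $list .= '  <li>' . htmlspecialchars(trim(strip_tags($headline))) . "</li>\n";
}
$list .= "</ul>\n";

echo $list;
```

The \b after h2 keeps the pattern from matching other tag names that merely start with "h2", and the lazy .*? stops at the first closing </h2> instead of running on to the last one.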

0

I have only checked this in RegexBuddy, but the following regex works there:

<h2.*</h2>
