Given the following HTML content (limited to the absolute minimum I require):

[image: HTML snippet]

How would I be able to extract Page Title using Regex?

  • Are you only grabbing titles or are you going to be parsing out more from the document? If so, use an HTML parser. Commented Sep 10, 2012 at 16:00
  • You may look at this answer Commented Sep 10, 2012 at 16:02
  • Wow :O Happened to have missed that. So should I use an HTML parser, and if so, which one? Commented Sep 10, 2012 at 16:07
  • It depends on what language you want to use. The main reason for an HTML parser is the malformed nature of HTML/XML. Commented Sep 10, 2012 at 16:09
  • The language is C# (if that's what you mean). I still feel that an HTML parser is overkill in my situation. If we assume the pattern is always exactly like this, wouldn't a regex be the better choice? Commented Sep 10, 2012 at 16:15

1 Answer

As others have commented, regular expressions may not be suitable for a bullet-proof method. For example, a regex cannot easily tell whether a <title> tag appears inside a quoted string within the HTML. That's a recurring response on Stack Overflow for questions like this. Personally, though, I think you have a point that a parser would be overkill for such a simple extraction. If you're looking for a method that works most of the time, one of the following should suffice.

Option 1: Lookbehind / lookahead

(?<=<title[\s\n]*>[\s\n]*)((?![\s\n]*</title[\s\n]*>).)*

This uses a lookbehind and a lookahead for the tags, so the whole match is the title text itself. .NET has a sophisticated regex engine that allows unbounded repetition inside a lookbehind, so you can even allow for whitespace/newline characters between the tag name and the closing angle bracket (see this answer).
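A minimal C# sketch of Option 1 might look like the following. The class name, the `ExtractTitle` helper, and the sample HTML are all hypothetical (the HTML in the question was posted as an image). Note that the sketch places the negative lookahead before the dot, `((?!…).)*`, so that the last character of the title is kept.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaroundTitleDemo
{
    // The lookbehind anchors the match just after the opening <title> tag.
    // The group ((?!...).)* then consumes one character at a time for as
    // long as the closing tag (with optional whitespace) is not next.
    const string Pattern =
        @"(?<=<title[\s\n]*>[\s\n]*)((?![\s\n]*</title[\s\n]*>).)*";

    // Hypothetical helper: returns the title text, or null if no <title> tag is found.
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(html, Pattern);
        return m.Success ? m.Value : null; // the whole match is the title
    }

    static void Main()
    {
        // Assumed sample input, not the asker's actual HTML.
        string html = "<head><title>Page Title</title></head>";
        Console.WriteLine(ExtractTitle(html)); // prints: Page Title
    }
}
```

Because `.` does not match a newline by default, this only works as written when the title text sits on a single line directly after the opening tag.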

Option 2: Capturing group

<title[\s\n]*>[\s\n]*(.*)[\s\n]*</title[\s\n]*>

Similar but slightly simpler - the whole regex match includes the start and end tags. The first (and only) capturing group (.*) captures the bit that is of interest in between.
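In C#, reading that capturing group could be sketched roughly as below. The pattern is Option 2 copied as-is; the class name, the `ExtractTitle` helper, and the sample HTML are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class CapturingGroupTitleDemo
{
    // Option 2 pattern. The verbatim string (@"...") avoids double-escaping backslashes.
    const string Pattern = @"<title[\s\n]*>[\s\n]*(.*)[\s\n]*</title[\s\n]*>";

    // Hypothetical helper: returns the captured title, or null if nothing matched.
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(html, Pattern);
        // Groups[0] is the whole match including the tags; Groups[1] is (.*).
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        // Assumed sample input, not the asker's actual HTML.
        string html = "<html><head><title>Page Title</title></head><body></body></html>";
        Console.WriteLine(ExtractTitle(html)); // prints: Page Title
    }
}
```

Since `(.*)` is greedy, it runs to the last `</title>` on the same line; if more than one could appear there, a lazy `(.*?)` is the usual alternative.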
