
I'm trying to match the <html> tag with optional attributes and to extract those attributes. I want to match any of the following variations of the <html> tag. The tag is either the first content of the HTML document or is preceded by a DOCTYPE declaration.

<html>
<html lang="en">
<html class="my-class">
<html class="my-class" lang="en">

The regular expression I'm trying is below. For the fourth case it matches the tag but captures only the last attribute, lang="en".

/<html(\s+([a-z\-]+)=('|")([^"'>]*)('|"))*>/i
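The behaviour described above can be reproduced with a short sketch. It uses Python's re module, which is an assumption (the question does not name a host language), but Python handles a repeated capture group the same way here: the group is overwritten on every repetition, so only the last attribute survives.

```python
import re

# The pattern from the question, unchanged apart from Python string quoting.
pattern = re.compile(r"""<html(\s+([a-z\-]+)=('|")([^"'>]*)('|"))*>""", re.I)

m = pattern.match('<html class="my-class" lang="en">')

# The whole tag is matched, including both attributes...
print(m.group(0))

# ...but groups 2 and 4 hold only the final repetition of the outer group.
print(m.group(2), m.group(4))
```

So the pattern is not failing to match; a single match simply cannot return every repetition of a quantified group.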


I know some people suggest using a DOM parser instead of a regular expression, but I think a regular expression is enough for my case because I only want to match the <html> tag.

  • Does <html always appear at the very start? Commented Jan 17, 2015 at 4:56
  • @AvinashRaj That's not a problem. We can add ^ at the start of the pattern. Commented Jan 17, 2015 at 4:58
  • I mean, are there any spaces before <html? Commented Jan 17, 2015 at 4:59

1 Answer


Use the regex below, then read each attribute name from group 1 and its value from group 3 (group 2 holds the quote character).

(?:<html|(?<!^)\G)\h*(?:([^=\n\h]+)=(['"])((?:\\\2|(?!\2).)*)\2)?

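\G asserts the position where the previous match ended, so each successive match picks up the next attribute. The (?<!^) in front of it stops \G from matching at the very start of the string, and \h matches horizontal whitespace. Both \G and \h are supported by PCRE.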
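Python's built-in re module supports neither \G nor \h, so the pattern above cannot be used there as written. A rough equivalent, sketched below, swaps in a different technique: one pattern isolates the <html> tag and a second one walks its attributes. The names TAG, ATTR and html_attributes are made up for this illustration, and the sketch assumes no attribute value contains a > character.

```python
import re

# Hypothetical helper patterns; they are not the PCRE pattern from the answer.
TAG = re.compile(r"<html\b([^>]*)>", re.I)
ATTR = re.compile(r"""([^\s=]+)\s*=\s*(['"])(.*?)\2""", re.S)


def html_attributes(document):
    """Return (name, value) pairs from the first <html> tag, or None if absent."""
    tag = TAG.search(document)
    if tag is None:
        return None
    # Group 1 is the attribute name, group 3 the value (group 2 is the quote).
    return [(m.group(1), m.group(3)) for m in ATTR.finditer(tag.group(1))]


print(html_attributes('<html class="my-class" lang="en">'))
print(html_attributes("<!DOCTYPE html>\n<html>"))
```

A bare <html> yields an empty list rather than None, which covers the first case from the question.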



11 Comments

Thanks. But I also want to match <html> without attributes (the first case). I especially want to manipulate the values of the class and lang attributes and leave the other attributes as they are. If there is a class attribute, I want to append to its value; if there is none, I want to add one.
Sir, how can I become a regex master like you? Please give some advice. @AvinashRaj
@AvinashRaj Awesome! Upvoted. It is better to extract the attribute name and value separately so that further processing of the result is easy and needs no string manipulation.
@BeingHuman It's simple. I started to learn regex (80%) a few months ago (approx. 10 months). This site taught me regex. People here are really awesome. I learned a few things from the regex gurus on SO, and some people cleared up my doubts. I want to become one of those people. Yes, you can ask me any regex question by commenting below my posts.
@Sithu Get the attribute name from group 1 and the value from group 3: regex101.com/r/kG5vF1/4
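The requirement in the first comment (append to class if it exists, otherwise add it) could be handled along these lines. This is only a sketch in Python, not something taken from the answer: add_html_class, TAG and CLASS are hypothetical names, and it assumes that no attribute value contains > and that the text class= does not appear inside another attribute's value.

```python
import re

# Hypothetical patterns for this sketch.
TAG = re.compile(r"(<html\b)([^>]*)>", re.I)
CLASS = re.compile(r"""(?<![\w-])(class\s*=\s*)(['"])(.*?)\2""", re.I | re.S)


def add_html_class(document, extra):
    """Append `extra` to the <html> class attribute, creating it if missing."""

    def append(m):
        # Keep the original quote style and existing classes.
        value = (m.group(3) + " " + extra).strip()
        return m.group(1) + m.group(2) + value + m.group(2)

    def fix_tag(tag):
        attrs = tag.group(2)
        if CLASS.search(attrs):
            attrs = CLASS.sub(append, attrs, count=1)
        else:
            attrs = ' class="' + extra + '"' + attrs
        # All other attributes are passed through untouched.
        return tag.group(1) + attrs + ">"

    return TAG.sub(fix_tag, document, count=1)


print(add_html_class('<html class="my-class" lang="en">', "js"))
print(add_html_class('<html lang="en">', "js"))
```

The same callback approach would extend to lang, for example by replacing the value instead of appending to it.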