1

I need a quick way to put a set of HTML attributes into a Dictionary, like so:

<body topmargin=10 leftmargin=0 class="something"> should amount to

attr["topmargin"]="10"
attr["leftmargin"]="0"
attr["class"]="something"

This is to be done server-side, and the tag contents are already available. I just need to weed out the attributes with no value and account for different quotation marks, or the lack of them.

I'm guessing regex should be employed. Found some similar questions, but none that really match my need.

Thanks

edit: clarifying server-side

2 Answers

3

What about HtmlAgilityPack?
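
To illustrate what the parser route looks like, here is a minimal sketch. It is written in Python with the standard library's `html.parser` rather than HtmlAgilityPack, purely to show the idea; the class and function names (`AttributeCollector`, `parse_attributes_with_parser`) are made up for this example. It assumes the input is a single tag, as in the question.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collects the attributes of the first start tag it sees."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            # attrs is a list of (name, value) pairs. The value is None
            # for attributes written without a value, so those are dropped.
            self.attrs = {name: value for name, value in attrs if value is not None}


def parse_attributes_with_parser(tag_text):
    """Return a dict of attribute name -> value for a single tag."""
    collector = AttributeCollector()
    collector.feed(tag_text)
    collector.close()
    return collector.attrs or {}


attr = parse_attributes_with_parser('<body topmargin=10 leftmargin=0 class="something">')
print(attr)  # {'topmargin': '10', 'leftmargin': '0', 'class': 'something'}
```

The parser takes care of double quotes, single quotes, and unquoted values, so the only logic left is the filter for valueless attributes.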


7 Comments

What about it? I don't want a new framework or html parser for this one task that I know a nice regex can solve. Only that I still suck in regex after all these years.
@danijels - It is notoriously difficult to use a regular expression to parse HTML. I would strongly suggest that you consider this answer. (+1 by the way)
You're going to spend a lot of time trying to get regex working, but a library like this is probably the best route. Especially considering how malformed most HTML sources can get.
Regexps are not great for parsing XML-like stuff. The attributes can be in arbitrary order, and are optional. The formats don't have to be on one line. Sometimes it's better to use a parser that really understands what it's reading.
+1 I'm curious about the aggregated rep value that was generated in this one year of SO by "Use an actual parser" answers to "Which regex to parse HTML?" questions.
0

I also think that using specialized parsers will be better, but if you want to use regex, try something like:

\<(?<tag>[a-zA-Z]+)( (?<name>\w+)="?(?<value>\w+)"?)*\>

I just tested it, works pretty well
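
Note that the pattern above only accepts `\w+` values and double quotes. A slightly more tolerant variant might match each attribute on its own instead of the whole tag. The sketch below is in Python for illustration (Python spells named groups `(?P<name>...)`, where .NET uses `(?<name>...)`); the names `ATTR_RE` and `parse_attributes_with_regex` are invented for this example, and it assumes the input is a single tag with reasonably sane markup.

```python
import re

# One attribute: a name, an equals sign, then a double-quoted,
# single-quoted, or unquoted value.
ATTR_RE = re.compile(
    r"""
    (?P<name>[^\s"'<>/=]+)      # attribute name
    \s*=\s*                     # "=" is required, so valueless attributes never match
    (?:
        "(?P<dq>[^"]*)"         # double-quoted value
      | '(?P<sq>[^']*)'         # single-quoted value
      | (?P<bare>[^\s"'<>]+)    # unquoted value
    )
    """,
    re.VERBOSE,
)


def parse_attributes_with_regex(tag_text):
    """Return a dict of attribute name -> value for a single tag."""
    attrs = {}
    for m in ATTR_RE.finditer(tag_text):
        # Exactly one of the three value groups takes part in each match.
        value = next(g for g in m.group("dq", "sq", "bare") if g is not None)
        attrs[m.group("name").lower()] = value
    return attrs


attr = parse_attributes_with_regex('<body topmargin=10 leftmargin=0 class="something">')
print(attr)  # {'topmargin': '10', 'leftmargin': '0', 'class': 'something'}
```

Because the equals sign is mandatory, the tag name and attributes such as `disabled` are skipped without extra filtering. This still does not cope with arbitrary malformed HTML, which is where a real parser earns its keep.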
