2

I need to get a string from a comment in an HTML file. I tried to do it with the DOM, but I couldn't find a good solution that way.

So I want to try it with regular expressions, but I can't find a satisfactory solution. Can you help me, please?

This is what I need:

<!--adress-"String here I need to get"-->

Thanks in advance for any answers.

4
  • HTML isn't a regular language. It cannot be correctly parsed with regular expressions. Commented Oct 2, 2011 at 18:38
  • Take a look here :) stackoverflow.com/questions/1732348/… Commented Oct 2, 2011 at 18:39
  • This may help simplehtmldom.sourceforge.net/manual.htm Commented Oct 2, 2011 at 18:39
  • @Mark: you shouldn't parse HTML with a regex. However, the argument that HTML is not a regular language is usually bogus (since regular expressions are rarely regular in any existing implementation). Commented Oct 2, 2011 at 18:40

3 Answers

4

Inspect $matches after running this code; the captured string is in $matches[1]:

preg_match('~<!--adress-"(.*?)"-->~msi', $string, $matches);
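For illustration, here is a minimal sketch of how the result of that call might be checked. The sample input in `$string` and the `$address` variable are assumptions made for the example; only the comment format comes from the question.

```php
<?php
// Hypothetical sample input; only the comment format comes from the question.
$string = '<p>Intro</p><!--adress-"String here I need to get"--><p>More</p>';

// preg_match() returns 1 on a match, 0 on no match, and false on error,
// so compare against 1 before reading the capture group.
if (preg_match('~<!--adress-"(.*?)"-->~msi', $string, $matches) === 1) {
    $address = $matches[1]; // text captured between the quotes
} else {
    $address = null;        // the comment was not found
}

var_dump($address);
```

Checking the return value avoids reading `$matches[1]` when the comment is missing from the input.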

7 Comments

Does anybody care to explain his/her downvote so I can improve my answer?
I don't know who downvoted you, but it's probably a knee-jerk reaction to your writing a regular expression to "parse HTML". I think that's just wrong of him or her, though, because I don't see why a regular expression cannot be used to extract HTML comments.
One thing: Not all HTML comments are matched by your regexp. Comments are delimited by -- within the markup declaration open delimiter (<!) and the markup declaration close delimiter (>).
@DanielTrebbien: I just followed OP's needs
@DanielTrebbien according to the specs, <!-- starts a comment while --> ends it, so I fail to see what's wrong with that answer.
1

HTML comments are regular; you can just match <!--adress-"([^">]+)"--> and get the first group.

This assumes that the comments are always well-formed and always contain a non-empty quoted string with no double quotes or > characters inside it.
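As a rough sketch of that pattern in PHP: `extractAddress()` is a hypothetical helper name and the sample inputs are invented for the example. The second call shows the assumption failing when the quoted string itself contains a quote.

```php
<?php
// extractAddress() is a hypothetical helper, not part of any library.
function extractAddress(string $html): ?string
{
    // [^">]+ matches one or more characters that are neither '"' nor '>'.
    if (preg_match('~<!--adress-"([^">]+)"-->~', $html, $m) === 1) {
        return $m[1]; // first capture group
    }
    return null; // no well-formed comment found
}

// Well-formed comment: the quoted string is returned.
$ok = extractAddress('<p>x</p><!--adress-"Main Street 5"-->');

// Embedded quotes violate the assumption, so nothing matches.
$bad = extractAddress('<!--adress-"He said "hi""-->');

var_dump($ok, $bad);
```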


1

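This will be more accurate:

// PCRE patterns in PHP need delimiters; ~ is used here so that < and > stay part of the pattern.
$regex = '~<!--(.+?)-"{0,1}(.+?)"{0,1}-->~';
preg_match_all($regex, $html, $matches_array);

Then run var_dump($matches_array) to see the results.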
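A short sketch of how the matches might be collected into label/value pairs, assuming the pattern is wrapped in `~` delimiters. The sample `$html`, the `phone` comment, and the `$pairs` array are assumptions for illustration, not part of the question.

```php
<?php
// Hypothetical sample input with two comments in the question's format.
$html = '<!--adress-"First street"--><p>x</p><!--phone-"12345"-->';

// The answer's pattern wrapped in ~ delimiters:
// group 1 is the label before the dash, group 2 is the (optionally quoted) value.
$regex = '~<!--(.+?)-"{0,1}(.+?)"{0,1}-->~';

// PREG_SET_ORDER returns one array per match: [full match, label, value].
preg_match_all($regex, $html, $matches_array, PREG_SET_ORDER);

$pairs = [];
foreach ($matches_array as $match) {
    $pairs[$match[1]] = $match[2];
}

var_dump($pairs);
```

`PREG_SET_ORDER` is used here only because it makes the per-comment loop simpler; the default ordering groups results by capture group instead.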

