
I'm trying to write a (I think) pretty simple RegEx in PHP, but it's not working. Basically, I have a block defined like this:

%%%%blockname%%%%
stuff goes here
%%%%/blockname%%%%

I'm not any good at RegEx, but this is what I tried:

preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/(.*?)%%%%$/i',$input,$matches);

It finds nothing: $matches ends up as an array of 4 empty arrays.

Apart from actually working, I guess it also needs some way to refer back to the first match, since the third match should be equal to the first one?

Please enlighten me :)

  • If you don't have nested blocks, you don't need to worry about the third match matching the first. On the other hand, if you do have nested blocks, regular expressions may not be the way to go. Commented Jun 10, 2011 at 7:40
  • I don't have nested blocks right now, but might in the future. I've also thought about using an HTML parser instead and defining the blocks by giving attributes to my HTML code. Commented Jun 10, 2011 at 7:41

2 Answers


You need to allow the dot to match newlines, and to allow ^ and $ to match at the start and end of lines (not just the entire string):

preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/(.*?)%%%%$/sm',$input,$matches);

The s (single-line) option makes the dot match any character including newlines.

The m (multi-line) option allows ^ and $ to match at the start and end of lines.

The i option is unnecessary in your regex since there are no case-sensitive characters in it.
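A quick way to see the difference is to run both patterns on the sample block from the question. This is a minimal sketch. The $input string is an assumption reconstructed from the question's example (no indentation, "\n" line endings). The variable names $before, $after and $old are illustrative only.

```php
<?php
// Assumed sample input, reconstructed from the block in the question.
$input = "%%%%blockname%%%%\nstuff goes here\n%%%%/blockname%%%%";

// Original pattern: without /s the dot cannot cross the newlines, and
// without /m ^ and $ only anchor at the start/end of the whole string.
$before = preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/(.*?)%%%%$/i', $input, $old);

// Same pattern with the s and m modifiers instead of i.
$after = preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/(.*?)%%%%$/sm', $input, $matches);

printf("before: %d match(es), after: %d match(es)\n", $before, $after);
printf("name: %s\n", $matches[1][0]);
// json_encode makes the newlines captured in the body visible.
printf("body: %s\n", json_encode($matches[2][0]));
```

$before is 0 and $old holds four empty arrays, which is what the question describes. $after is 1. The body in group 2 keeps the newline after the opening marker and the one before the closing marker, so wrap it in trim() if you don't want them.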

Then, to answer the second part of your question: If blockname is the same in both cases, then you can make that explicit by using a backreference to the first capturing group:

preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/\1%%%%$/sm',$input,$matches);
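Here is a small check of the backreference version. It uses two made-up blocks. The first, header, is closed correctly. The second, footer, is wrongly closed as sidebar. Only the correctly closed block should match. Because the third capturing group is gone, $matches now has only three entries: the whole match, the name and the body.

```php
<?php
// Made-up input: "footer" is closed with a different name, so it must not match.
$input = "%%%%header%%%%\nTitle\n%%%%/header%%%%\n"
       . "%%%%footer%%%%\nBye\n%%%%/sidebar%%%%";

// \1 has to repeat exactly what capturing group 1 (the block name) matched.
$count = preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/\1%%%%$/sm', $input, $matches);

// Groups left: 0 (whole match), 1 (name), 2 (body).
foreach ($matches[1] as $i => $name) {
    printf("%s => %s\n", $name, trim($matches[2][$i]));
}
```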

6 Comments

Good point, though this isn't really an answer to Kokos' question.
I suppose \1 then refers to the first match. Learn something every day :)
\n refers to the contents of the nth capturing group (set of parentheses) in a regex. In another comment you mentioned that you might have nested blocks in the future. This is where it gets complicated. It can be done, but it's hairy to say the least.
I've found the problem with my HTML input: the %%%%blockname%%%% marker was indented, so I guess ^ didn't let it match because it wasn't the first thing on the line.
In that case, just add \s* after the ^ and/or before the $.
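Below is a minimal sketch of that suggestion, applied to the backreference version of the pattern. The indented $input is a made-up example of block markers placed inside HTML.

```php
<?php
// Made-up HTML input in which the block markers are indented.
$input = "<div>\n    %%%%blockname%%%%\n    stuff goes here\n    %%%%/blockname%%%%\n</div>";

// Without \s*, ^ must be followed immediately by %%%%, so nothing matches.
$without = preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/\1%%%%$/sm', $input, $unused);

// \s* after ^ and before $ tolerates whitespace around the marker lines.
$with = preg_match_all('/^\s*%%%%(.*?)%%%%(.*?)%%%%\/\1%%%%\s*$/sm', $input, $matches);

printf("without: %d, with: %d\n", $without, $with);
printf("%s: %s\n", $matches[1][0], trim($matches[2][0]));
```

Because \s also matches newlines, ^\s* can start a match on a blank line above the marker. If that matters, [ \t]* is a stricter alternative.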

I'm pretty sure you can't, since these operations would need to save a variable, and you can't do that in regex. You should try to do this using PHP's built-in token parser: http://php.net/manual/en/function.token-get-all.php

4 Comments

What do you mean you can't save a variable in regex? I don't think I'm missing something when I say $matches will contain what is matched.
$matches is PHP. But if you want a regex to match opening and closing tags, it would have to save the first tag and search only for the matching closing tag (instead of just any closing tag).
I'm not sure if I'm misunderstanding you, but the answer Tim Pietzcker gave does allow me to match opening and closing tags within a single RegEx (and I can't see why it shouldn't be possible in the first place).
Please re-read the question. He's asking for a regex that can detect a matching (nested) closing tag. (Example: given <a> ... </a> and <a> ... <b> ... </b> ... </a>, a naive match would pair a ... b instead of pairing b ... b and a ... a.)
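For the nested case discussed in these comments, a single backreference cannot pair markers correctly. One alternative is swapped in here and is not what either answer proposes. A regex finds the individual markers, and a stack in PHP tracks the nesting. This is a sketch under stated assumptions. parseBlocks is a hypothetical helper name. It assumes block names contain no "%" or "/" and that body text never contains something that looks like a marker.

```php
<?php
/**
 * Hypothetical helper: returns every block as ['name', 'body', 'depth'],
 * in the order the blocks close (innermost first).
 * Assumes block names contain no '%' or '/'.
 */
function parseBlocks(string $input): array
{
    // One regex per marker: group 1 is "/" for a closing marker, group 2 the name.
    preg_match_all('/%%%%(\/?)([^%\/]+)%%%%/', $input, $markers,
        PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $stack = [];   // open blocks: [name, offset where the body starts]
    $blocks = [];
    foreach ($markers as $m) {
        [$marker, $offset] = $m[0];
        $isClosing = $m[1][0] === '/';
        $name = $m[2][0];

        if (!$isClosing) {
            $stack[] = [$name, $offset + strlen($marker)];
            continue;
        }
        // A closing marker must match the most recently opened block.
        $open = array_pop($stack);
        if ($open === null || $open[0] !== $name) {
            throw new RuntimeException("Unexpected closing marker for '$name' at offset $offset");
        }
        $blocks[] = [
            'name'  => $name,
            'body'  => substr($input, $open[1], $offset - $open[1]),
            'depth' => count($stack),
        ];
    }
    if ($stack !== []) {
        throw new RuntimeException('Unclosed block: ' . $stack[count($stack) - 1][0]);
    }
    return $blocks;
}

// Made-up nested input.
$input = "%%%%outer%%%%\nA\n%%%%inner%%%%\nB\n%%%%/inner%%%%\nC\n%%%%/outer%%%%";
foreach (parseBlocks($input) as $block) {
    printf("%s (depth %d): %s\n", $block['name'], $block['depth'], json_encode($block['body']));
}
```

The regex stays simple because it only ever sees one marker at a time. The pairing logic that a single regex struggles with lives in ordinary PHP.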
