1

I need help with a regex. I have a long text with a lot of whitespace and many newlines, and I need to find and select everything between two strings. Example:

iojge test rgej <foo>
ferfe 098n34hjlrej
fefe <end

I want to find everything between test and end:

 rgej <foo>
ferfe 098n34hjlrej
fefe <

How can I do this?

1
  • So, test and end can never be a part of the text you're trying to match? What if the string looks like: "test testing ending end"? Commented Sep 15, 2010 at 18:18

4 Answers

4

You can try

preg_match("/test(.*?)end/s", $yourString, $matches);
print_r($matches);
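
To make the role of the capture group concrete, here is a small sketch that runs the same pattern against the sample text from the question. The variable names `$whole` and `$between` are only illustrative. `$matches[0]` holds the whole match including the two delimiters, while `$matches[1]` holds just the text between them.

```php
<?php
// Sample text from the question.
$yourString = "iojge test rgej <foo>\nferfe 098n34hjlrej\nfefe <end";

// The s flag lets . match newlines; .*? stops at the first "end".
if (preg_match('/test(.*?)end/s', $yourString, $matches) === 1) {
    $whole   = $matches[0]; // includes "test" and "end"
    $between = $matches[1]; // only the text between the delimiters
} else {
    $whole   = null; // no match (or a regex error)
    $between = null;
}

var_dump($between);
```

Checking the return value of preg_match before reading `$matches[1]` avoids an undefined-index notice when either word is missing.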

5 Comments

The m flag will cause $ to match the end of the line and ^ match the start of a line: it will not let the DOT meta character match line breaks. This is done with the s flag.
This will capture test and end, which doesn't comply with the OP's sample.
@Daniel Vandersluis, Check $matches[1]
@Colin yeah I know that but the OP might not ;)
Yeah, and it is pretty clear that the desired match resides at index 1 after looking at the output print_r($matches); produces. I like this one better than the look-around suggestion. The readability of this answer is much better.
2

You can use two lookarounds and the /s (single line) modifier, which makes the dot match newlines, to look for everything between your two words:

/(?<=test).*(?=end)/s

To explain:

(?<=    # open a positive lookbehind
  test  # match 'test'
)       # close the lookbehind
.*      # match as many characters as possible (including newlines because of the /s modifier)
(?=     # open a positive lookahead
 end    # match 'end'
)       # close the lookahead

The lookarounds will let you assert that the pattern must be anchored by your two words, but since lookarounds are not capturing, only everything between the words will be returned by preg_match. A lookbehind looks behind the current position to see if the assertion passes; a lookahead looks after the current position.

Since quantifiers in regular expressions are greedy by default, the .* will match as much as it can (so if the ending word appears multiple times, it will match until the last one). If you want to match only until the first time it encounters end, you can make the .* lazy (in other words, it'll match as little as possible that still satisfies the pattern) by changing it to .*? (i.e. /(?<=test).*?(?=end)/s).
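
The difference between the greedy and lazy forms is easiest to see on an input where the closing word occurs twice. The input string below is made up for illustration and is not from the question.

```php
<?php
// Hypothetical input in which the closing word "end" appears twice.
$text = "a test one end two end b";

// Greedy: .* runs to the last "end" that still lets the lookahead pass.
preg_match('/(?<=test).*(?=end)/s', $text, $greedy);

// Lazy: .*? stops at the first "end".
preg_match('/(?<=test).*?(?=end)/s', $text, $lazy);

var_dump($greedy[0]); // the value is " one end two "
var_dump($lazy[0]);   // the value is " one "
```

With lookarounds the match itself sits at index 0, because the delimiters are asserted rather than consumed.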

2 Comments

To be on the safe side, I'd make it a reluctant DOT-STAR.
@Bart it depends on what the OP wants to capture. I've updated my answer to discuss that though.
1

Alternatively you can also do:

$arr1 = explode("test",$input);
$arr2 = explode("end",$arr1[1]);
$result = $arr2[0];
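
The three lines above assume both words are present. A guarded variant might look like the sketch below; `between_explode` is a hypothetical helper name, the nullable return type assumes PHP 7.1 or later, and both delimiters are assumed to be non-empty strings.

```php
<?php
// Hypothetical helper: returns the text between the first $start and the
// first $end that follows it, or null when either delimiter is missing.
function between_explode(string $input, string $start, string $end): ?string
{
    // A limit of 2 splits only at the first occurrence of $start.
    $afterStart = explode($start, $input, 2);
    if (count($afterStart) < 2) {
        return null; // $start not found
    }

    $beforeEnd = explode($end, $afterStart[1], 2);
    if (count($beforeEnd) < 2) {
        return null; // $end not found after $start
    }

    return $beforeEnd[0];
}

var_dump(between_explode("iojge test rgej <foo>\nfefe <end", "test", "end"));
```

Returning null rather than an empty string lets the caller tell "delimiter missing" apart from "nothing between the delimiters".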

2 Comments

What if there is no test in $input?
@Gumbo: In that case $result will be an empty string, but I think there will be warnings about an undefined index. So you are right, there needs to be some error checking.
0

If you have fixed delimiters, you don’t need regular expressions:

$str = 'iojge test rgej <foo>
ferfe 098n34hjlrej
fefe <end';
$start = 'test';
$end = 'end';
if (($startPos = strpos($str, $start)) !== false && ($endPos = strpos($str, $end, $startPos+=strlen($start))) !== false) {
    // match found
    $match = substr($str, $startPos, $endPos-$startPos);
}
