I have a (strange) string like:

EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR

The pattern I need to look for can only be defined by keywords: EREF+, MREF+, CRED+ and others. I know there are 19 keywords, but any given string may contain a different subset of them. I don't know whether the order stays the same. From what I can tell, EREF+ will most likely be the first keyword, but the order may well differ. I also don't know which of the 19 keywords will be the last one in the string, as that may change from case to case.

My first approach was to just use explode() twice, with keyword 1 and keyword 2 – but if the keywords change order (and I cannot guarantee they don't) I would have to go through all possible combinations.

Anyway, here's the first (working) code I used:

<?php 

$string = "EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR";

function getBetween($content,$start,$end){
    $r = explode($start, $content);
    if (isset($r[1])){
        $r = explode($end, $r[1]);
        return $start.$r[0];
    }
    return '';
}

$start = "EREF+";
$end = "MREF+";
$output = getBetween($string,$start,$end);
echo $output;

?>

So now I am looking into regex to come up with a solution that extracts a substring between two keywords, where any of the keywords can be the start delimiter while any other keyword may be the end delimiter.

Since there are literally thousands of regex questions around, I took some time and tried to adapt other solutions, but without success so far. I must confess regex is voodoo to me and I cannot seem to remember the patterns for more than a minute. I found this thread, which is pretty close to what I am trying to achieve, and tried a few tweaks, but I cannot get it to work properly.

Here's my code so far:

<?php 

$string = "EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR";

$matches = array();
$keywords = ['EREF+', 'MREF+', 'CRED+', 'SVWZ+', 'ABWA+'];
$pattern = sprintf('/(?:%s):(.*?)/', join('|', array_map(function($keyword) {
    return preg_quote($keyword, '/');
}, $keywords)));

preg_match_all($pattern, $string, $matches);

print_r($matches);

?>

... whereas the constructed pattern looks like this:

/(?:EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+):(.*?)/

Can anyone advise please? Any help appreciated!

Thanks

  • Do you need to know which keyword caused the split? Maybe preg_split? eval.in/656629 Commented Oct 6, 2016 at 20:52
  • You are right, I actually do need to know which keyword caused the split. Didn't think about that yet. Commented Oct 6, 2016 at 21:08
  • +1 for the preg_split approach. With the help of this comment here and that comment there I've managed to fork your code to include the keywords that caused the split as keys in an associative array: eval.in/656679 Commented Oct 6, 2016 at 21:46
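The preg_split idea from these comments can be sketched roughly as follows. This is a minimal sketch, not necessarily what the linked eval.in snippets contained: it assumes only the five keywords shown in the question, assumes each keyword occurs at most once, and uses a hypothetical $fields array for the result. Any text before the first keyword is silently dropped.

```php
<?php

$string = "EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR";

// Only the 5 keywords visible in the question; the full list has 19.
$keywords = ['EREF+', 'MREF+', 'CRED+', 'SVWZ+', 'ABWA+'];

// Escape each keyword and join them into a single capturing alternation.
$pattern = sprintf('/(%s)/', implode('|', array_map(function ($keyword) {
    return preg_quote($keyword, '/');
}, $keywords)));

// PREG_SPLIT_DELIM_CAPTURE keeps the keywords in the result list.
$parts = preg_split($pattern, $string, -1, PREG_SPLIT_DELIM_CAPTURE);

// Walk the parts: a keyword opens a new entry, anything else is its value.
$fields = [];      // hypothetical result: keyword => text that follows it
$current = null;
foreach ($parts as $part) {
    if (in_array($part, $keywords, true)) {
        $current = $part;
        $fields[$current] = '';   // a repeated keyword would overwrite here
    } elseif ($current !== null) {
        $fields[$current] .= $part;
    }
}

print_r($fields);
```

Because the loop checks each part against the keyword list instead of assuming a strict keyword/value alternation, an empty value between two adjacent keywords should not shift the pairs.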

1 Answer

You can use this regex:

/(?<=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+)(.+?)(?=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+|$)/

It will match the strings between defined keywords.

(?<=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+) # Look behind for a keyword
(.+?) # Match one or more characters, non-greedy
(?=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+|$) # Look ahead for a keyword or the end of the string
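For reference, here is one way the lookaround pattern above might be plugged into the code from the question. It is only a sketch: it builds the alternation from the five keywords shown in the question (the full list has 19), and the variable name $alternation is made up for this example.

```php
<?php

$string = "EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR";

// Only the 5 keywords visible in the question; extend as needed.
$keywords = ['EREF+', 'MREF+', 'CRED+', 'SVWZ+', 'ABWA+'];

// Escape each keyword and join them with "|" (hypothetical helper variable).
$alternation = implode('|', array_map(function ($keyword) {
    return preg_quote($keyword, '/');
}, $keywords));

// Lookbehind for a keyword, lazy match, lookahead for the next keyword or end of string.
// %1$s reuses the same alternation in both lookarounds.
$pattern = sprintf('/(?<=%1$s)(.+?)(?=%1$s|$)/', $alternation);

preg_match_all($pattern, $string, $matches);

// $matches[1] holds the text segments in the order they appear.
print_r($matches[1]);
```

Note that (.+?) needs at least one character, so this sketch assumes no keyword is immediately followed by another keyword.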

Edit: If you want to know which keyword caused the split, you can use this regex:

/((?:EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+))(.+?)(?=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+|$)/

It will capture the first keyword and the text between keywords.
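In PHP, the two capture groups could be turned into an associative array along these lines. Again a sketch under assumptions: only the five keywords from the question are listed, each keyword is assumed to appear at most once, and $alternation and $fields are names invented for this example.

```php
<?php

$string = "EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR";

// Only the 5 keywords visible in the question; extend as needed.
$keywords = ['EREF+', 'MREF+', 'CRED+', 'SVWZ+', 'ABWA+'];

$alternation = implode('|', array_map(function ($keyword) {
    return preg_quote($keyword, '/');
}, $keywords));

// Group 1: the keyword. Group 2: the text up to the next keyword or end of string.
$pattern = sprintf('/(%1$s)(.+?)(?=%1$s|$)/', $alternation);

// PREG_SET_ORDER yields one array per match: [full match, keyword, text].
preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

$fields = [];
foreach ($matches as $match) {
    // A repeated keyword would overwrite the earlier value here.
    $fields[$match[1]] = $match[2];
}

print_r($fields);
```

The keys of $fields keep the order in which the keywords occur in the string, so the result does not depend on any fixed keyword order.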

4 Comments

Thanks for the quick response! Unfortunately, that will not get the text after the last keyword, in my example ABWA+. Any idea how to deal with that?
The last one isn't between keywords, but you can add $ as an alternative in the lookahead. I'll update the answer.
Wow, works like a charm! Thanks! @chris85 brought up a point I had not considered yet: how do I know which of the keywords actually caused the string to split? As far as I can see, this is not possible with the regex, right?
I included the keyword that caused the split in the answer.
