Hi, I'm trying to extract a substring with PHP's preg_match().

Example inputs:

25k8cp1gl6-Mein Herze  im Blut, BWV 199: Recitative: Ich  Wunden_SVD1329578_14691639_unified :CPN_trans:

Here I want to extract Mein Herze im Blut, BWV 199: Recitative: Ich Wunden

25k8cp1gl6-La Puesta Del Sol_SVD1133599_12537702_unified :CPN_trans:

Here I want to extract La Puesta Del Sol

La Puesta Del Sol_SVD1133599_12537702_unified :CPN_trans:

Here I want to extract La Puesta Del Sol

25k8cp1gl6-La Puesta Del Sol_MNA1133599_12537702_unified :CPN_trans:

Here I want to extract La Puesta Del Sol

25k8cp1gl6-La Puesta Del Sol_IMC1133599_12537702_unified :CPN_trans:

Here I want to extract La Puesta Del Sol

So basically, I want to extract the text before _SVD, _MNA, or _IMC, excluding the leading 25k8cp1gl6- prefix (which may be absent, as in the third example).

Thanks in advance.

1 Answer

Here is an expression for ya:

(?<=25k8cp1gl6-).*?(?=_(?:SVD|MNA|IMC))

Explanation:

(?<=...) is lookbehind syntax: the match must be immediately preceded by "25k8cp1gl6-", which is not itself included in the match. Then .*? lazily matches as few characters as possible. Finally, (?=...) is lookahead syntax: the match must be followed by "_" and then "SVD", "MNA", or "IMC" (the alternatives are separated with | inside the non-capturing group (?:...)), again without including them in the match.

PHP:

$strings = array(
    '25k8cp1gl6-Mein Herze  im Blut, BWV 199: Recitative: Ich  Wunden_SVD1329578_14691639_unified :CPN_trans:',
    '25k8cp1gl6-La Puesta Del Sol_SVD1133599_12537702_unified :CPN_trans:',
    '25k8cp1gl6-La Puesta Del Sol_MNA1133599_12537702_unified :CPN_trans:',
    '25k8cp1gl6-La Puesta Del Sol_IMC1133599_12537702_unified :CPN_trans:',
);

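foreach ($strings as $string) {
    // preg_match() returns 1 on a match; $matches[0] holds the full match,
    // which here is only the title because the lookarounds consume nothing.
    if (preg_match('/(?<=25k8cp1gl6-).*?(?=_(?:SVD|MNA|IMC))/', $string, $matches)) {
        $substring = $matches[0];
        var_dump($substring);
    }
}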

Another option uses preg_replace() with this expression:

^\w+-(.*?)_(?:SVD|MNA|IMC).*

Explanation:

This one matches the entire string but captures the part we want to keep, so that we can reference it in the replacement as $1. Also note that I began with ^\w+- instead of 25k8cp1gl6-. This looks for one or more "word characters" ([A-Za-z0-9_]) followed by a hyphen at the beginning of the string. If it needs to be exactly "25k8cp1gl6-", you can put that back in; I just wanted to show another option.

PHP:

$strings = array(
    '25k8cp1gl6-Mein Herze  im Blut, BWV 199: Recitative: Ich  Wunden_SVD1329578_14691639_unified :CPN_trans:',
    '25k8cp1gl6-La Puesta Del Sol_SVD1133599_12537702_unified :CPN_trans:',
    '25k8cp1gl6-La Puesta Del Sol_MNA1133599_12537702_unified :CPN_trans:',
    '25k8cp1gl6-La Puesta Del Sol_IMC1133599_12537702_unified :CPN_trans:',
);

foreach($strings as $string) {
    $substring = preg_replace('/^\w+-(.*?)_(?:SVD|MNA|IMC).*/', '$1', $string);
    var_dump($substring);
}
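
One caveat with the preg_replace() approach: when the pattern does not match, preg_replace() returns the input unchanged, so a failed extraction is hard to tell apart from a successful one. A minimal sketch of a variation that uses preg_match() with the same capture group and returns null on failure follows. The helper name extractTitle is hypothetical (not part of the original answer), and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper, for illustration only: returns the captured title,
// or null when the input does not contain "<prefix>-<title>_<SVD|MNA|IMC>".
function extractTitle(string $input): ?string
{
    // Same pattern as the preg_replace() version, minus the trailing .*
    // (it is not needed when we only read capture group 1).
    if (preg_match('/^\w+-(.*?)_(?:SVD|MNA|IMC)/', $input, $matches)) {
        return $matches[1];
    }
    return null;
}

var_dump(extractTitle('25k8cp1gl6-La Puesta Del Sol_SVD1133599_12537702_unified :CPN_trans:'));
var_dump(extractTitle('no marker here'));
```

Returning null lets the caller decide how to handle unparseable input instead of silently passing the whole string through.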

7 Comments

First part contains some Alphanumeric characters, so I changed it to (?<=[a-zA-Z0-9_]-)(.*?)(?=_(?:SVD|MNA|IMC)) It's working, Thanks a lot Sam!
No problem, I added a second alternative. Also note that [a-zA-Z0-9_] is equivalent to \w.
It gives the required sub string correctly if the first part "25k8cp1gl6-" is present, but for my case first part is optional.
I completely missed that. Check out (?:\w+-)?\K.*?(?=_(?:SVD|MNA|IMC)) and let me know if you'd like me to update the answer or explain it.
This works great. How about extracting the SVD1133599_12537702, IMC1133599_12537702, or MNA1133599_12537702 part, or just SVD1133599, IMC1133599, or MNA1133599? Please note that "_12537702" is optional.
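
The two points raised in the comments can be sketched as follows. The first function uses the optional-prefix pattern exactly as suggested above: (?:\w+-)? makes the prefix optional, and \K discards everything matched so far, so only the title ends up in $matches[0]. The second function is an assumption about what the last comment asks for: it captures SVD, MNA, or IMC followed by digits, plus an optional "_digits" part. Both function names are hypothetical, and the code assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: title extraction where the "25k8cp1gl6-" style prefix is optional.
function titleWithOptionalPrefix(string $input): ?string
{
    // (?:\w+-)? optionally consumes the prefix; \K resets the start of the match.
    if (preg_match('/(?:\w+-)?\K.*?(?=_(?:SVD|MNA|IMC))/', $input, $matches)) {
        return $matches[0];
    }
    return null;
}

// Hypothetical helper: extracts e.g. "SVD1133599_12537702", or "SVD1133599"
// when the second numeric part is absent.
function catalogCode(string $input): ?string
{
    // Assumption: the code is the marker, some digits, then optionally "_" and more digits.
    if (preg_match('/_((?:SVD|MNA|IMC)\d+(?:_\d+)?)/', $input, $matches)) {
        return $matches[1];
    }
    return null;
}

var_dump(titleWithOptionalPrefix('La Puesta Del Sol_SVD1133599_12537702_unified :CPN_trans:'));
var_dump(catalogCode('25k8cp1gl6-La Puesta Del Sol_IMC1133599_12537702_unified :CPN_trans:'));
```

Note that a title which itself begins with word characters followed by a hyphen would be mistaken for a prefix when no real prefix is present; if that can occur in the data, matching the exact prefix format would be safer.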