I am working on a script to match IT equipment models from different suppliers. The idea is to remove the -XXX numbers at the end of the name, plus either a trailing P or a P- in the middle of the name. Example models (supplier name on the left, cleaned name on the right):

DH-HAC-HDBW3802EP-Z     HAC-HDBW3802E-Z     
DH-HAC-HDBW3802EP-ZH    HAC-HDBW3802E-ZH        
DH-HAC-HDW1000MP-028    HAC-HDW1000M        
DH-HAC-HDW1000RP-028    HAC-HDW1000R        
DH-HAC-HDW1100EMP-02    HAC-HDW1100EM       
DH-HAC-HDW1100EMP-03    HAC-HDW1100EM       
DH-HAC-HDW1100MP        HAC-HDW1100M        
DH-HAC-HDW1100MP-036    HAC-HDW1100M        
DH-HAC-HDW1100RP-028    HAC-HDW1100R        
DH-HAC-HDW1100RP-VF     HAC-HDW1100R-VF

For now I am using rather complicated code that, I must admit, does work, but I have a deep urge to regex it a little (I know: if it works, don't mess with it). The function that cleans the endings of the names looks like this:

function beautifyDahua($text)
{
    $text = str_replace('DHI-', '', $text);
    $text = str_replace('DH-', '', $text);

    if (empty($text)) {
        return 'n-a';
    }

// If the name begins with IPC- or HAC-, clean further.

 elseif (substr($text, 0, 4) === "IPC-" || substr($text, 0, 4) === "HAC-") {

    $text = str_replace('AP-028', 'A', $text);
    $text = str_replace('AP-036', 'A', $text);
    $text = str_replace('AP', 'A', $text);
    $text = str_replace('BP-028', 'B', $text);
    $text = str_replace('BP-036', 'B', $text);
    $text = str_replace('BP', 'B', $text);
    $text = str_replace('CP-', 'C-', $text);
    $text = str_replace('DP-036', 'D', $text);
    $text = str_replace('DP-', 'D-', $text);
    $text = str_replace('EMP-03', 'EM', $text);
    $text = str_replace('EMP-02', 'EM', $text);
    $text = str_replace('EMP-', 'EM-', $text);
    $text = str_replace('EP-036', 'E', $text);
    $text = str_replace('EP-028', 'E', $text);
    $text = str_replace('EP-03', 'E', $text);
    $text = str_replace('EP-02', 'E', $text);
    $text = str_replace('EP-', 'E-', $text);
    $text = str_replace('EP', 'E', $text);
    $text = str_replace('FP-03', 'F', $text);
    $text = str_replace('FP-02', 'F', $text);
    $text = str_replace('FP-', 'F-', $text);
    $text = str_replace('FP', 'F', $text);
    $text = str_replace('RMP-03', 'RM', $text);
    $text = str_replace('RMP-02', 'RM', $text);
    $text = str_replace('RMP-', 'RM', $text);
    $text = str_replace('RMP', 'RM', $text);
    $text = str_replace('RP-028', 'R', $text);
    $text = str_replace('RP-036', 'R', $text);
    $text = str_replace('RP-', 'R-', $text);
    $text = str_replace('RP', 'R', $text);
    $text = str_replace('SP-036', 'S', $text);
    $text = str_replace('SP-028', 'S', $text);
    $text = str_replace('SP-', 'S-', $text);
    $text = str_replace('SP', 'S', $text);
    $text = str_replace('SLP-03', 'SL', $text);
    $text = str_replace('TP-', 'T-', $text);
    $text = str_replace('MP-036', 'M', $text);
    $text = str_replace('MP-028', 'M', $text);
    $text = str_replace('MP', 'M', $text);
    return $text;
}
 else {

    return $text;
}
}

For the numbers I have a regex like \b-0(\d|\d\d)\b, but for the P situation I am in over my head.

Any advice on how to tackle this?

    If you can describe the complex rules in a simple sentence the solution should be simple as well. Is this a matter of matching everything after the last number before the last - with something else (i.e. the first character matched?) Commented Mar 11, 2018 at 9:42

4 Answers

1

Your regex \b-0(\d|\d\d)\b for the numbers can be written as -0\d{1,2}. For this match I don't think you need the word boundaries \b.
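As a minimal sketch of that simplification, assuming the numeric suffix only ever appears at the very end of the model name. The `$` anchor and the helper name `stripNumericSuffix` are my additions for illustration, not part of the pattern above:

```php
<?php
// Hypothetical helper: removes a trailing "-0N" or "-0NN" numeric suffix.
function stripNumericSuffix(string $model): string
{
    // -0\d{1,2} : a dash, a zero, then one or two digits.
    // The $ anchor (an assumption) restricts the match to the end of the name.
    return preg_replace('/-0\d{1,2}$/', '', $model);
}

echo stripNumericSuffix('HAC-HDW1000MP-028'), "\n"; // HAC-HDW1000MP
echo stripNumericSuffix('HAC-HDW1100RP-VF'), "\n";  // HAC-HDW1100RP-VF (unchanged)
```

Note that this only handles the numbers; the P is still in place afterwards.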

Try it like this:

(?:DHI?-)?(?:IPC|HAC)-HDB?W\d+[A-Z]+\K(?:P-0\d{1,2}|P)

The regex uses \K to reset the starting point of the reported match and matches what comes after. Then you could replace the selected match with an empty string.

Explanation

  • (?: Non capturing group
    • DHI?- Match DH with an optional capital I, then a dash
  • )? Close non capturing group
  • (?: Non capturing group
    • IPC|HAC Match IPC or HAC
  • ) Close non capturing group
  • -HDB?W Match dash HD, optional B and W
  • \d+ Match one or more digits
  • [A-Z]+ Match one or more uppercase characters
  • \K Reset starting point of the reported match
  • (?: Non capturing group (This will contain your match)
    • P- Match P-
    • 0\d{1,2} Match 0 followed by 1 or 2 digits (or use \d{2,3} to match any 2 or 3 digits)
    • | Or
    • P Match P
  • ) Close non capturing group

Demo php
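A minimal sketch of how the pattern could be applied in PHP. The function name `stripDahuaP` is hypothetical, and the separate prefix-stripping step is my assumption about what the question wants: because of \K, the DH-/DHI- prefix is not part of the reported match, so replacing the match with an empty string leaves the prefix in place.

```php
<?php
// Hypothetical wrapper around the pattern from this answer.
function stripDahuaP(string $model): string
{
    // \K drops everything matched so far from the reported match,
    // so only the trailing "P" or "P-0NN" is replaced with "".
    $pattern = '/(?:DHI?-)?(?:IPC|HAC)-HDB?W\d+[A-Z]+\K(?:P-0\d{1,2}|P)/';
    $model = preg_replace($pattern, '', $model);

    // The prefix survives the first step, so remove it separately (assumption).
    return preg_replace('/^DHI?-/', '', $model);
}

echo stripDahuaP('DH-HAC-HDW1000MP-028'), "\n"; // HAC-HDW1000M
echo stripDahuaP('DH-HAC-HDBW3802EP-ZH'), "\n"; // HAC-HDBW3802E-ZH
echo stripDahuaP('DH-HAC-HDW1100EMP-02'), "\n"; // HAC-HDW1100EM
```

The pattern is deliberately narrow: it only touches names whose body looks like HDW or HDBW followed by digits and letters, so other model families pass through with just the prefix removed.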

0

Here is the regular expression I propose:

Pattern:     (?:DHI?-)?((?:HAC|IPC)-[A-Z0-9]+)(?:P-\d+|P)
Replacement: \1

and its PHP implementation using the preg_replace function:

$text = 'DH-HAC-HDW1000MP-028';            
$result = preg_replace('/(?:DHI?-)?((?:HAC|IPC)-[A-Z0-9]+)(?:P-\d+|P)/', '$1', $text);
echo $result; // HAC-HDW1000M

You can see a working demo by visiting this link.
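One behavior worth checking before relying on this pattern, sketched below: the DH- prefix is only removed as part of a successful match, so a name that contains no P after the HAC-/IPC- part comes back unchanged, prefix included. The wrapper name `cleanWithCapture` and the second input are made up for illustration.

```php
<?php
// Hypothetical wrapper around the pattern from this answer.
function cleanWithCapture(string $text): string
{
    $pattern = '/(?:DHI?-)?((?:HAC|IPC)-[A-Z0-9]+)(?:P-\d+|P)/';
    return preg_replace($pattern, '$1', $text);
}

// A P is present: prefix and P are both removed, the -VF suffix is kept.
echo cleanWithCapture('DH-HAC-HDW1100RP-VF'), "\n"; // HAC-HDW1100R-VF

// No P anywhere after "HAC-": nothing matches, so the prefix stays too.
echo cleanWithCapture('DH-HAC-HDW1100R'), "\n";     // DH-HAC-HDW1100R
```

If such names occur in the data, a separate prefix replacement (as in the original function) would still be needed.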

0

Here's something but not sure if it will work for you:

preg_replace("/\b(DH-)?(HAC-)(\w+\d+)(\w)(\w*)(-?\d+)?/", "$2$3$4", $input_lines);

So basically it matches words with an optional DH- prefix followed by HAC-, then letters, then digits, then one or more letters (only the first of which is kept), optionally followed by -numbers.

Here's the slightly hacky part: the pattern optionally matches a trailing -\d+ but does not use it in the replacement, so a numeric suffix is stripped out. It does not match a trailing -\w, so letter suffixes such as -Z or -VF are kept. However, this will fail if the model name is part of a sentence.

1 Comment

Hello, and thank you for the quick answer. I used your proposal and it still had some minor problems, like replacing EMP with E instead of EM, so I ended up using $text = preg_replace("/\b(DHI-|DH-)?(HAC-|IPC-)(\w+\d+)(\w(M|L)?)(P)(\w*)(-?\d+)?/", "$2$3$4", $text); $text = preg_replace("/\b(DHI-|DH-)?/", "", $text);
0

After messing around with @apokryfos's solution I came to:

$text = preg_replace("/\b(DHI-|DH-)?(HAC-|IPC-)(\w+\d+)(\w(M|L)?)(P)(\w*)(-?\d+)?/", "$2$3$4", $text);
$text = preg_replace("/\b(DHI-|DH-)?/", "", $text);

But I see that Thomassos's solution works out of the box. I will have to check both against the 1200+ examples I have and see which one works best in my case. Anyway, thank you a lot for your support.
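A rough sketch of how such a comparison could be automated, using only the ten sample pairs from the question as test data. The function names (`cleanTwoStep`, `cleanCapture`, `findMismatches`) are hypothetical; the two patterns are copied from this thread unchanged.

```php
<?php
// Supplier name => expected cleaned name, copied from the question's sample table.
$samples = [
    'DH-HAC-HDBW3802EP-Z'  => 'HAC-HDBW3802E-Z',
    'DH-HAC-HDBW3802EP-ZH' => 'HAC-HDBW3802E-ZH',
    'DH-HAC-HDW1000MP-028' => 'HAC-HDW1000M',
    'DH-HAC-HDW1000RP-028' => 'HAC-HDW1000R',
    'DH-HAC-HDW1100EMP-02' => 'HAC-HDW1100EM',
    'DH-HAC-HDW1100EMP-03' => 'HAC-HDW1100EM',
    'DH-HAC-HDW1100MP'     => 'HAC-HDW1100M',
    'DH-HAC-HDW1100MP-036' => 'HAC-HDW1100M',
    'DH-HAC-HDW1100RP-028' => 'HAC-HDW1100R',
    'DH-HAC-HDW1100RP-VF'  => 'HAC-HDW1100R-VF',
];

// The two-step version shown above.
function cleanTwoStep(string $text): string
{
    $text = preg_replace('/\b(DHI-|DH-)?(HAC-|IPC-)(\w+\d+)(\w(M|L)?)(P)(\w*)(-?\d+)?/', '$2$3$4', $text);
    return preg_replace('/\b(DHI-|DH-)?/', '', $text);
}

// The single capture-group pattern from the other answer.
function cleanCapture(string $text): string
{
    return preg_replace('/(?:DHI?-)?((?:HAC|IPC)-[A-Z0-9]+)(?:P-\d+|P)/', '$1', $text);
}

// Returns input => actual output for every sample the cleaner gets wrong.
function findMismatches(callable $clean, array $samples): array
{
    $bad = [];
    foreach ($samples as $input => $expected) {
        $actual = $clean($input);
        if ($actual !== $expected) {
            $bad[$input] = $actual;
        }
    }
    return $bad;
}

foreach (['cleanTwoStep', 'cleanCapture'] as $name) {
    printf("%s: %d mismatch(es)\n", $name, count(findMismatches($name, $samples)));
}
```

Both versions agree on these ten samples, so the differences would only show up on the larger list; swapping `$samples` for the full set of pairs and printing the returned array would show exactly which names each pattern gets wrong.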
