I'm trying to search for a number/character pattern inside of a string.

The string can look like any of these:

"any text CA-2019-6-000000 any text" 
"any text KA 2019-2-929029" // note: no "-" between the "KA and 2019" 
"KA-2019-11-929029" 

What I can definitely say: there is always a year, like 2000/2019/2055. The year is always followed by a minus sign, a one- or two-digit number from 1 to 12 (the month), and another minus sign.

After the "-<num>-" comes a 6-digit number, anywhere from 000000 to 999999.

Before the year there is a string of at most two characters. Between this two-character string and the year there is either a minus sign or a space.

Examples:

"AA 2019"
"ZZ-2018"

I found out that I could get the 6 digits with /[0-9]{6}/.

I can get the year with /[0-9]{4}/. I would like to add that it can only be between 2000 and 2100.

And I can get the number between the two minus signs with: /(?<=\-)(.*?)(?=\-)/ or

/\-(.*?)\-/


For example, I had the idea to look for the number between the two "-" characters and store it in a variable. Then I would look for the number that follows this variable, is 6 digits long, and is between 000000 and 999999.

The same approach would work for the year: I would take the number that comes before the variable with the two "-" characters. It is at most 4 digits long and between 2000 and 2100.

Once the year is stored in a variable, I could theoretically look for the two letters that precede "-$yearvariable" or " $yearvariable" (with a space).

  • Please give a proper example of what you actually want as result here, based on that shown input data. Commented Jul 1, 2019 at 12:07
  • Oh, as a result I want to have those example options at the top of the question, like KA-2019-11-929029 or CA 2019-6-000000. Commented Jul 1, 2019 at 12:09
  • The text that could possibly come before and after this code should be removed. I want the code that starts with the two characters and ends with the 6 digits. Commented Jul 1, 2019 at 12:12

1 Answer

You may use

\b([A-Z]{2})[-\s](20[0-9]{2}|2100)-(0?[1-9]|1[0-2])-(\d{6})(?!\d)

See the regex demo

Details

  • \b - word boundary
  • ([A-Z]{2}) - two uppercase letters
  • [-\s] - a hyphen or whitespace
  • (20[0-9]{2}|2100) - number from 2000 to 2100
  • - - a hyphen
  • (0?[1-9]|1[0-2]) - a month from 1 to 12
  • - - a hyphen
  • (\d{6})(?!\d) - exactly 6 digits; the negative lookahead (?!\d) rejects a 7th digit.
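
The breakdown above can also be written directly into the pattern. The following is a minimal sketch, assuming PHP's PCRE x (extended) modifier, which ignores whitespace in the pattern and treats "#" as the start of a comment. The helper name extractCode and the fourth sample string are hypothetical and only serve as illustration.

```php
<?php
// Same pattern as above, spread over several lines with the x modifier.
$pattern = '~
    \b
    ([A-Z]{2})            # two uppercase letters
    [-\s]                 # a hyphen or whitespace
    (20[0-9]{2}|2100)     # year from 2000 to 2100
    -
    (0?[1-9]|1[0-2])      # month from 1 to 12, optional leading zero
    -
    (\d{6})(?!\d)         # exactly 6 digits, not followed by a digit
~x';

// Hypothetical helper: returns the first code found in $text, or null.
function extractCode(string $text, string $pattern): ?string
{
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$samples = [
    'any text CA-2019-6-000000 any text',
    'any text KA 2019-2-929029',
    'KA-2019-11-929029',
    'KA-2019-13-929029', // month out of range, expected to yield NULL
];

foreach ($samples as $s) {
    var_export(extractCode($s, $pattern));
    echo PHP_EOL;
}
```

The first three strings come from the question; the last one is an added negative case to show that an out-of-range month is rejected.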


PHP demo:

$s = "any text CA-2019-6-000000 any text";
if (preg_match('~\b([A-Z]{2})[-\s](20[0-9]{2}|2100)-(0?[1-9]|1[0-2])-(\d{6})(?!\d)~', $s, $m)) { 
    print_r($m);
}

Output:

Array
(
    [0] => CA-2019-6-000000
    [1] => CA
    [2] => 2019
    [3] => 6
    [4] => 000000
)
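
If one text can contain several codes, the same pattern can be used with preg_match_all. This is only a sketch under that assumption: the helper name extractAllCodes, the array keys, and the sample sentence are hypothetical and not part of the question.

```php
<?php
// Hypothetical helper: collects every code found in $text together with its parts.
function extractAllCodes(string $text): array
{
    $pattern = '~\b([A-Z]{2})[-\s](20[0-9]{2}|2100)-(0?[1-9]|1[0-2])-(\d{6})(?!\d)~';

    // PREG_SET_ORDER yields one array per match: [0] full match, [1]..[4] the groups.
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $codes = [];
    foreach ($matches as $m) {
        $codes[] = [
            'code'   => $m[0],
            'prefix' => $m[1],
            'year'   => (int) $m[2],
            'month'  => (int) $m[3],
            'number' => $m[4], // kept as a string to preserve leading zeros
        ];
    }
    return $codes;
}

print_r(extractAllCodes('first CA-2019-6-000000, then KA 2019-11-929029.'));
```

The 6-digit part stays a string on purpose, since casting 000000 to an integer would lose the leading zeros.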
1 Comment

@blyadjs I thought you also needed all those parts of the regex match, so I used capturing groups ((...)). If you do not need them, turn all the capturing groups into non-capturing by replacing (...) with (?:...). Also, use this generator for numeric ranges, it looks helpful (though sometimes, the patterns are a bit redundant).
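
As a sketch of what the comment describes, here is the pattern with every capturing group turned into a non-capturing one, so only the full match is returned. The helper name extractCodeOnly is hypothetical.

```php
<?php
// Hypothetical helper: same pattern, but with (?:...) instead of (...),
// so $m contains only the full match at index 0.
function extractCodeOnly(string $text): ?string
{
    $pattern = '~\b(?:[A-Z]{2})[-\s](?:20[0-9]{2}|2100)-(?:0?[1-9]|1[0-2])-(?:\d{6})(?!\d)~';

    if (preg_match($pattern, $text, $m) === 1) {
        return $m[0];
    }
    return null;
}

var_export(extractCodeOnly('any text KA 2019-2-929029'));
echo PHP_EOL;
```

The groups around [A-Z]{2} and \d{6} contain no alternation, so they could be dropped entirely; only the year and month groups need (?:...) to keep the | alternatives contained.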
