I have to process text that comes from student essays (the texts can be VERY large).
In PHP, I need a preg_match pattern for dates inside those strings, which may appear in any of these forms:
...blah blah blah (1994) blah blah blah ...
...blah blah blah (nov-1994) blah blah blah ...
...blah blah blah (november-1994) blah blah blah ...
...blah blah blah (1994-nov) blah blah blah ...
...blah blah blah (1994-november) blah blah blah ...
The dates in the strings may be enclosed in '( )' or in '[ ]'.
I have done it this way:
if (preg_match('/\w{0,8}-?(19|20)\d{2}-?\w{0,8}/', $string, $s)) {
# code
}
This works, but it also captures unrelated strings such as:
... blah blah blah (SKU_1956) blah blah blah ...
... blah blah blah [INFERNO2000] blah blah blah ...
... blah blah blah [like-2000-me] blah blah blah ...
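A minimal sketch to reproduce those false positives, assuming the three bracketed samples above as test input (the `$samples` array is only for illustration):

```php
<?php
// The pattern from the question, unchanged.
$pattern = '/\w{0,8}-?(19|20)\d{2}-?\w{0,8}/';

// The three unwanted samples listed above (illustrative input only).
$samples = ['(SKU_1956)', '[INFERNO2000]', '[like-2000-me]'];

foreach ($samples as $sample) {
    // Nothing in the pattern requires the surrounding brackets,
    // and \w also matches digits and underscores, so each sample matches.
    $hit = preg_match($pattern, $sample, $m);
    echo $sample, ' => ', $hit ? $m[0] : 'no match', PHP_EOL;
}
```

The pattern has no anchor on the brackets and no rule that a word must be joined to the year by a hyphen, which seems to be why these slip through.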
I can't get it right on my own, so I need help fine-tuning this regexp so that it only captures when the string:
- starts with either ( or [
- may contain a single word, which, if present, MUST end with -
- MUST contain a year in the range 19xx-20xx
- may contain a single word, which, if present, MUST start with -
- ends with either ) or ]
The word is limited to 8 characters to fit a long month name such as december.
A huge number of unrelated strings are captured, which is why I want to fine-tune it.
[([]((\w{1,8})-((?:20|19)\d{2})|(?3)-(?2)|(?3))[])]
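A minimal sketch of how that pattern could be used from PHP, assuming `~` as the delimiter (the pattern above has none) and a hypothetical `$text` assembled from the samples in the question:

```php
<?php
// The pattern above, wrapped in '~' delimiters so PHP's preg_* functions accept it.
// Group 2 is the word (\w{1,8}) and group 3 is the year;
// (?2) and (?3) are subroutine calls that reuse those sub-patterns.
$pattern = '~[([]((\w{1,8})-((?:20|19)\d{2})|(?3)-(?2)|(?3))[])]~';

// Hypothetical sample text built from the examples in the question.
$text = 'a (1994) b (nov-1994) c [1994-november] '
      . 'd (SKU_1956) e [INFERNO2000] f [like-2000-me]';

preg_match_all($pattern, $text, $matches);

// $matches[1] holds the text between the brackets for each hit:
// 1994, nov-1994 and 1994-november.
print_r($matches[1]);
```

Two things to be aware of: the opening and closing brackets are not tied to each other, so a mixed pair such as `(1994]` is also accepted, and `\w` matches digits and underscores, so `(12-1994)` would match as well. If that matters, the word part could be narrowed to letters only.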