1

I am trying to figure out how to extract a date from a string using a user-defined pattern. The pattern could take many forms, such as: Y-m-d, d/m/Y, (m/d/Y), [d/m/Y], etc.

The string containing the date is a regular block of text, and the date inside it is formatted according to the user-defined pattern (like those above). For example, depending on the pattern, the dates inside the string would look like: 2014-04-25, 04/25/2014, 25/04/2014, (04/25/2014), [25/04/2014], etc.

Is there a way to use the user-defined pattern to extract the actual date from a string? I was hoping some sort of regex could do the job, but so far I am stuck on this issue.

  • You need a table that translates each special character in the pattern to a regular expression, e.g. Y is \d{4}. Commented Apr 25, 2014 at 20:16
  • Ty Barmar, that makes sense, similar to answer below as well. I'll test this and report back. Commented Apr 25, 2014 at 20:20
  • @Slickrick12 Can't you control and limit the user input? I don't think you'll find a one-size-fits-all regex in this case, due to the confusion between month and day, e.g. 04/04/2014 and 04/04/2014: which one is the day and which the month? Commented Apr 25, 2014 at 20:49
  • @Tuga: for 04/04 it is not very important to know that! :) Commented Apr 25, 2014 at 21:06
  • @CasimiretHippolyte Why isn't it important to differentiate between month and day? Commented Apr 25, 2014 at 21:09
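
On the day/month question raised in the comments: because the user supplies the pattern, the position of each number already says which one is the day and which the month. The sketch below is only an illustration of that idea. The helper name `extractDateParts` and the token set (Y = 4-digit year, m = 2-digit month, d = 2-digit day) are assumptions, and it expects each token to appear exactly once in the pattern.

```php
<?php
// Hypothetical helper: turns a pattern such as "d/m/Y" into a regex with
// named groups, so each number's position decides whether it is day or month.
function extractDateParts($text, $pattern)
{
    // Assumed token set: Y = 4-digit year, m = 2-digit month, d = 2-digit day.
    $tokens = [
        'Y' => '(?<year>\d{4})',
        'm' => '(?<month>\d{2})',
        'd' => '(?<day>\d{2})',
    ];

    // Escape literal characters first, then substitute the tokens.
    // strtr() does not re-scan the text it has already replaced.
    $regex = '~' . strtr(preg_quote($pattern, '~'), $tokens) . '~';

    if (!preg_match($regex, $text, $m)) {
        return false;
    }

    $parts = [
        'year'  => (int) $m['year'],
        'month' => (int) $m['month'],
        'day'   => (int) $m['day'],
    ];

    // checkdate() rejects impossible dates such as month 25.
    return checkdate($parts['month'], $parts['day'], $parts['year']) ? $parts : false;
}

print_r(extractDateParts('foo 04/05/2014 bar', 'd/m/Y')); // day 4, month 5
print_r(extractDateParts('foo 04/05/2014 bar', 'm/d/Y')); // month 4, day 5
```

The same text yields different day and month values depending only on the pattern, so the regex itself does not have to guess.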

3 Answers

1

You could enforce a fixed pattern vocabulary, e.g. the year is always four digits (YYYY-MM-DD), or use str_replace() to turn the user's pattern into a regex. The pattern letters need to be uppercase; otherwise the d in an inserted \d would itself be replaced when str_replace() searches for d. So force the pattern to uppercase:

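$string  = 'foo 2014-04-25 bar';
$pattern = 'Y-M-D';
// Quote the pattern before inserting the regex tokens; quoting afterwards
// would escape the \d{4} tokens themselves and nothing would match.
$pattern = preg_quote(strtoupper($pattern), '#');
$pattern = str_replace(array('Y','M','D'), array('\d{4}','\d{1,2}','\d{1,2}'), $pattern);
preg_match("#$pattern#", $string, $match);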

print_r($match);
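
A digit-only regex will also accept strings that are not real dates (for example 2014-02-30). If the user's pattern is written with PHP's own date letters, as in the question (Y, m, d), one possible follow-up check is to parse the match with DateTime::createFromFormat(). This is a sketch under that assumption: `isRealDate` is a hypothetical helper, and the strict comparison expects zero-padded day and month values.

```php
<?php
// Hypothetical helper: accepts an extracted candidate only if it is a real
// calendar date under the given PHP date format (assumed tokens: Y, m, d).
function isRealDate($candidate, $format)
{
    $date = DateTime::createFromFormat($format, $candidate);

    // createFromFormat() rolls over out-of-range values (e.g. 30 February),
    // so compare the re-formatted date with the original string.
    return $date !== false && $date->format($format) === $candidate;
}

var_dump(isRealDate('2014-04-25', 'Y-m-d')); // bool(true)
var_dump(isRealDate('2014-02-30', 'Y-m-d')); // bool(false)
```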

3 Comments

The pattern is already in place, so creating a regex out of the special characters as both you and Barmar pointed out makes sense. I'll test this and report back.
I didn't downvote it; not sure who did. I'll use strtoupper() on the pattern to make it uppercase, as you suggest.
Added preg_quote() to be safe.
0

Create a mapping that converts the date pattern into a regular expression and then use preg_match_all() to extract all the matching dates from the given string:

function extractDates($text, $pattern) 
{
    $mapping = [
        'y' => '\d{4}',
        'm' => '\d{2}',
        'd' => '\d{2}',
    ];

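    // Escape regex metacharacters in the pattern (e.g. [ ] ( )) before
    // substituting the date tokens, so they are matched literally.
    $regex = strtr(preg_quote(strtolower($pattern), '~'), $mapping);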

    if (preg_match_all("~$regex~", $text, $matches)) {
        return $matches[0];
    }

    return false;
}

Test cases:

$testcases = [
    'foo 2014-04-25 bar'   => 'y-m-d',
    'foo 25/04/2014 bar'   => 'd/m/y',
    'foo 04/25/2014 bar'   => 'm/d/y',
    'foo [2014-04-25] bar' => 'y-m-d',
    'foo (25/04/2014) bar' => 'd/m/y',
    'foo [04/25/2014] bar' => 'm/d/y',
];

foreach ($testcases as $text => $pattern) {
    echo extractDates($text, $pattern)[0], PHP_EOL;
}

Output:

2014-04-25
25/04/2014
04/25/2014
2014-04-25
25/04/2014
04/25/2014


1 Comment

Thanks, this worked best for what I needed. I ended up using preg_match() instead of preg_match_all(), but aside from that it was fine.
0

It can be done using predefined symbols that the user can use to define a format of their choice. You can then use the symbols to safely create a regular expression:

function findDates($haystack, $format) {

    // Symbol to regex table
    // Change this to suit how you want the symbols
    // to be matched
    static $table = [
        'D' => '(?<!\d)(?:0[1-9]|[12]\d|3[01])(?!\d)',
        'd' => '(?<!\d)(?:[1-9]|[12]\d|3[01])(?!\d)',
        'M' => '(?<!\d)(?:0[1-9]|1[012])(?!\d)',
        'm' => '(?<!\d)(?:[1-9]|1[012])(?!\d)',
        'Y' => '(?<!\d)(?:\d{4})(?!\d)',
        'y' => '(?<!\d)(?:\d{2})(?!\d)',
    ];
    // Escape any special characters in the format, so
    // that it can be used for the regular expression
    $format = preg_quote($format, '/');

    // Create the regex by replacing symbols with their
    // corresponding regex. strtr() is used because str_replace()
    // would re-scan earlier replacements and corrupt the \d they contain
    $regex = strtr($format, $table);

    // Attempt to find dates
    preg_match_all("/{$regex}/", $haystack, $matches);

    // Return matches; if there were no matches,
    // return false instead
    return $matches[0] ?: false;
}
$text = 'It happens on either 18/9 2015 or 8/10 2015.';
$findFormat = 'd/m Y';
var_dump(findDates($text, $findFormat));
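
One caveat when a table holds both uppercase and lowercase symbols: str_replace() with arrays processes the keys one after another, so a later key such as d can match inside the \d of an earlier replacement, whereas strtr() makes a single pass. The snippet below is a minimal demonstration with a reduced, hypothetical two-entry table, not the full table from above.

```php
<?php
// Reduced, hypothetical two-entry table; a real one would hold full regexes.
$table = ['D' => '\d{2}', 'd' => '\d{1,2}'];

// str_replace() handles the keys one after another, so the "d" key also
// hits the "d" inside the "\d" that was just inserted for "D".
$sequential = str_replace(array_keys($table), array_values($table), 'D');

// strtr() makes a single pass and never re-scans replaced text.
$singlePass = strtr('D', $table);

var_dump($sequential); // string(11) "\\d{1,2}{2}"
var_dump($singlePass); // string(5) "\d{2}"
```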
