
I'm trying to select all the rows in a database table whose content has the following structure:

<tr>
<td>
<p><strong>Completion Date:</strong></p>
</td>
<td>
<p>April, 2012</p>
</td>
</tr>

But the month and year could be different.

Here is my current query statement:

SELECT * FROM `posts` WHERE `content` REGEXP "<tr>\r\n<td>\r\n<p><strong>Completion Date:</strong></p>\r\n</td>\r\n<td>\r\n<p>April, 2012</p>\r\n</td>\r\n</tr>"

Currently this pulls only the rows that have April, 2012, which is what I expect. I tried replacing the month with ^[A-Za-z]$, but neither that nor any other combination I tried worked.

Could someone help with the correct regular expression?

Thanks,

-M

  • Use [A-Za-z]* without the caret and dollar sign to match it. Commented Jul 25, 2013 at 18:44
  • I know this isn't an answer, but ewww! That's a horrible way to store data in a database. Commented Jul 25, 2013 at 18:48

3 Answers


This should give the results you are looking for. Note the star, which means zero or more of the preceding character class: [a-zA-Z]* matches zero or more letters, and [0-9]* matches zero or more digits.

SELECT * FROM `posts` WHERE `content` REGEXP "<tr>\r\n<td>\r\n<p><strong>Completion Date:</strong></p>\r\n</td>\r\n<td>\r\n<p>[a-zA-Z]*, [0-9]*</p>\r\n</td>\r\n</tr>"

The caret ^ and dollar sign $ match the beginning and the end of the string. Since the date is not at the beginning, these would not match for you.
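To see the effect of the anchors, here is a minimal sketch in Python. Python's re module is only a stand-in for MySQL's REGEXP engine (an assumption made so the example can run without a database), and the variable names are hypothetical. The anchoring behaviour shown is the same idea: ^ and $ tie the pattern to the ends of the whole string.

```python
import re

# The stored HTML fragment, with the \r\n line endings used in the question.
content = (
    "<tr>\r\n<td>\r\n<p><strong>Completion Date:</strong></p>\r\n</td>\r\n"
    "<td>\r\n<p>April, 2012</p>\r\n</td>\r\n</tr>"
)

# Anchored: can only match if the entire string is one single letter.
anchored = re.search(r"^[A-Za-z]$", content)

# Unanchored: finds "<p>Month, Year</p>" anywhere inside the string.
unanchored = re.search(r"<p>[a-zA-Z]*, [0-9]*</p>", content)

print(anchored)             # no match object, because of the anchors
print(unanchored.group(0))  # the matched date paragraph
```

The anchored pattern finds nothing because the content starts with "<" and is far longer than one character, while the unanchored pattern is free to match in the middle of the string.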

Good luck.


3 Comments

Oh I see interesting! Thanks!
Yes, the question marks don't belong there; I've edited the answer to remove them. (In PCRE-style regexps, *? means the same thing as * except that it tries to match as few characters as possible, but as far as I can tell, MySQL's regexp engine doesn't support such ungreedy matching. In any case, even if it worked, it would make no difference here.)
Ah I've been in the habit of using non-greedy * matches in perl style. Thanks for the info.

^[A-Za-z]$ matches only when the entire string is a single alphabetic character (^ anchors the beginning of the string, $ the end).

You might have more luck with something like: [A-Z][a-z]*,\s*[0-9]{4}. To explain:

  • [A-Z] - 1 capital letter
  • [a-z]* - any number of lowercase letters (including 0 of them)
  • , - a comma
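  • \s* - any number of whitespace characters (including 0 of them). As far as I know, MySQL versions before 8.0.4 don't support \s, so use [[:space:]]* there; in later versions the backslash has to be doubled inside a string literal (\\s*)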
  • [0-9]{4} - exactly 4 digits
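As a rough check of the pieces listed above, here is a small Python sketch. Python's re module is used as a stand-in for MySQL's regex engine, which is an assumption: escape support differs between engines, so treat this as an illustration of the pattern's logic rather than of MySQL itself. The sample strings are made up for the example.

```python
import re

# Capital letter, lowercase letters, comma, optional whitespace, four digits.
pattern = re.compile(r"[A-Z][a-z]*,\s*[0-9]{4}")

# Hypothetical sample values, not taken from the original table.
samples = ["April, 2012", "May,2013", "april, 2012", "April, 12"]

# Map each sample to whether the pattern is found somewhere in it.
results = {s: bool(pattern.search(s)) for s in samples}

for sample, matched in results.items():
    print(f"{sample!r}: {matched}")
```

The first two samples match; "april, 2012" fails because there is no capital letter to satisfy [A-Z], and "April, 12" fails because [0-9]{4} needs exactly four digits in a row.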


If it's just month and year you can do:

[a-zA-Z]+, \d{4}

Granted, this will match ANY word and ANY 4-digit year. (If your MySQL version doesn't understand \d, write [0-9] instead.) If you wanted to add more checks, it could be:

(January|February|March|April|May|June|July|August|September|October|November|December), (19|20)\d{2}

This new regex matches only valid month names and also checks that the year is 19xx or 20xx.
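A quick way to compare the loose and strict patterns is the Python sketch below. Python's re module again stands in for MySQL's engine (an assumption), [0-9] is written in place of \d, and the names loose and strict are hypothetical.

```python
import re

months = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Loose: any word, a comma and space, then any four digits.
loose = re.compile(r"[a-zA-Z]+, [0-9]{4}")

# Strict: a real month name, then a year starting with 19 or 20.
strict = re.compile("(" + months + "), (19|20)[0-9]{2}")

for text in ["April, 2012", "Foo, 2012", "April, 1812"]:
    print(text, bool(loose.search(text)), bool(strict.search(text)))
```

Both patterns accept "April, 2012", but only the loose one accepts "Foo, 2012" or "April, 1812", which is the extra checking the stricter pattern buys.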

