3

I'm trying to ensure that a string in PHP has only letters, hyphens or apostrophes. To accomplish this I wanted to make a range of valid characters using [ ]. So my idea was to do this:

[[A-Za-z]-'] // Weird syntax highlighting here

Will this work? Is it possible to nest brackets like that? This is meant to match a single character that is either a letter, a hyphen, or an apostrophe. I may be approaching the problem naively and that's OK; I just wanted to know if putting brackets within brackets like this is legal in PHP. Thanks!

2
  • I think you can also benefit from this nice overview of available regex design tools: stackoverflow.com/questions/89718/… Commented Apr 30, 2011 at 22:35
  • Thanks a lot! I had no idea tools like this existed. Commented Apr 30, 2011 at 22:39

4 Answers

3

I'm assuming you're using this in one of the regular expression matching functions (like preg_match("/[[A-Za-z]-']*/", ...)), and in that case it's a question not of PHP syntax but of regular expression syntax. And the answer is no, you can't nest brackets like that: PCRE reads [[A-Za-z] as a single character class that also contains a literal [, followed by the literal characters -']. If you want a regular expression that matches only a letter, hyphen, or apostrophe, use [A-Za-z'-]. (The hyphen goes last so that the regex engine knows that it's not representing a range of characters like A-Z. Alternatively, you can escape the hyphen with a backslash; then you can put it anywhere: [A-Za-z\-'].)
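As a quick check of both points, here is a minimal sketch. The helper names is_name_char and matches_nested are hypothetical, not from the question. It shows the flat class accepting a single letter, hyphen, or apostrophe, while the nested form, as PCRE parses it, only matches a four-character sequence such as a-'].

```php
<?php
// Hypothetical helper: true if $ch is exactly one letter, hyphen, or apostrophe.
function is_name_char(string $ch): bool
{
    // The hyphen is last in the class, so it is a literal, not a range operator.
    return preg_match("/^[A-Za-z'-]\\z/", $ch) === 1;
}

// The "nested" attempt: PCRE reads [[A-Za-z] as one class (including a
// literal "["), then -'] as three literal characters.
function matches_nested(string $s): bool
{
    return preg_match("/^[[A-Za-z]-']\\z/", $s) === 1;
}

var_dump(is_name_char("a"));        // bool(true)
var_dump(is_name_char("-"));        // bool(true)
var_dump(is_name_char("'"));        // bool(true)
var_dump(is_name_char("7"));        // bool(false)
var_dump(matches_nested("a"));      // bool(false)
var_dump(matches_nested("a-']"));   // bool(true)
```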

6 Comments

Ah, perfect. I was making it unnecessarily complicated! Thanks.
Just escape the hyphen. Always escape the hyphens, so you never forget to. Saves you an hour in a lifetime =)
@Rudie: good advice, I suppose. I've just gotten used to putting the hyphen at the end whenever I need it in a character class.
That works just as well. I was taught to always escape it, just to be safe. In the end it doesn't matter, I guess, as long as it works the way you want.
That’s not a letter. It’s A-Z.
1

I don't understand.

What's wrong with

[A-Za-z'-]

?

0
[\pL\p{Pd}'ʹ’]

That matches:

  • any Letter character
  • any Dash Punctuation character
  • U+0027 APOSTROPHE (which is not the preferred form)
  • U+02B9 MODIFIER LETTER PRIME
  • U+2019 RIGHT SINGLE QUOTATION MARK
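
In PHP, a class like this generally needs the /u modifier so that the pattern and the subject are treated as UTF-8. A minimal sketch, assuming UTF-8 input: the function name is_unicode_name is hypothetical, the + quantifier and anchors are added to test a whole string rather than one character, and the two non-ASCII characters are written as \x{...} escapes to avoid source-encoding problems.

```php
<?php
// Hypothetical helper: true if $text is non-empty and consists only of
// letters, dash punctuation, or one of the three apostrophe-like characters.
function is_unicode_name(string $text): bool
{
    // /u: treat pattern and subject as UTF-8.
    // \x{02B9} is MODIFIER LETTER PRIME, \x{2019} is RIGHT SINGLE QUOTATION MARK.
    return preg_match('/^[\pL\p{Pd}\'\x{02B9}\x{2019}]+\z/u', $text) === 1;
}

var_dump(is_unicode_name("Zo\u{00EB}"));       // bool(true)
var_dump(is_unicode_name("D\u{2019}Arcy"));    // bool(true)
var_dump(is_unicode_name("Jean\u{2013}Luc"));  // bool(true), U+2013 EN DASH is dash punctuation
var_dump(is_unicode_name("R2-D2"));            // bool(false), digits are not letters
```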

0

To ensure that a string contains only the desired characters, you can check it in one of two ways:

  • You know it's good if all chars in the string are valid.
  • You know it's bad if any one char in the string is invalid.

Here is a PHP snippet that demonstrates both methods:

// Method 1: Good if all chars in the string are valid.
// \z anchors at the very end; a bare $ would also match just before a trailing newline.
$re_all_valid = '/^[A-Za-z\-\']*\z/';
if (preg_match($re_all_valid, $text)) {
    echo("GOOD: String contains all valid characters.\n");
} else {
    echo("BAD: String does NOT contain all valid characters.\n");
}

// Method 2: Bad if any one char in the string is invalid.
$re_one_invalid = '/[^A-Za-z\-\']/';
if (preg_match($re_one_invalid, $text)) {
    echo("BAD: String contains at least one invalid character.\n");
} else {
    echo("GOOD: String contains no invalid characters.\n");
}

Notes: Method 1 requires anchors at both ends of the string and a quantifier applied to the positive character class. Method 2 uses a negated character class and only needs to match one character in the string. Method 2 is likely more efficient.
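
To compare the two methods side by side, here is a small sketch that wraps each in a function and checks that they agree on a few sample inputs. The function names and sample strings are hypothetical. Method 1 is anchored with \z here, since a bare $ in PCRE also matches just before a final newline.

```php
<?php
// Method 1 as a function: every character must be in the class.
function all_valid(string $text): bool
{
    return preg_match('/^[A-Za-z\-\']*\z/', $text) === 1;
}

// Method 2 as a function: valid only if no character outside the class is found.
function none_invalid(string $text): bool
{
    return preg_match('/[^A-Za-z\-\']/', $text) === 0;
}

foreach (["O'Neil-Smith", "R2-D2", "", "abc\n"] as $sample) {
    // Both methods give the same verdict for each sample.
    var_dump(all_valid($sample) === none_invalid($sample)); // bool(true)
}
```

Note that both versions accept the empty string. If that is not wanted, changing * to + in Method 1, or adding an explicit empty-string check to Method 2, would reject it.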
