Are there any real differences between Ruby regex and Python regex?
I've been unable to find any differences in the two, but may have missed something.
The last time I checked, they differed substantially in their Unicode support. Ruby in 1.9 at least has some very limited Unicode support. I believe one or two Unicode properties might be supported by now. Probably the general categories and maybe the scripts were the two I'm thinking of.
Python has less and more Unicode support at the same time. Python does seem to make it possible to meet the requirements of RL1.2a "Compatibility Properties" from UTS#18 on Unicode Regular Expressions.
That said, there is a really rather nice Python library out there by Matthew Barnett (mrab) that finally adds a couple of Unicode properties to Python regexes. He supports the two most important ones: the general categories, and the script properties. It has some other intriguing features as well. It deserves some good publicity.
I don't think either Ruby or Python supports Unicode all that terribly well, although more and more gets done every day. In particular, however, neither meets even the barebones Level 1 requirement for Unicode Regular Expressions cited above. For example, RL1.2 requires that at least 11 properties be supported: General_Category, Script, Alphabetic, Uppercase, Lowercase, White_Space, Noncharacter_Code_Point, Default_Ignorable_Code_Point, ANY, ASCII, and ASSIGNED.
I think Python only lets you get to some of those, and only in a roundabout way. Of course, there are many, many other properties beyond these 11.
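As a rough illustration of that roundabout route, here is a minimal Python sketch that reaches General_Category through the standard `unicodedata` module instead of through the regex engine. The helper name `chars_in_category` is hypothetical and not part of any library.

```python
import unicodedata

def chars_in_category(text, prefix):
    """Return the characters of text whose General_Category starts with prefix."""
    return [ch for ch in text if unicodedata.category(ch).startswith(prefix)]

# "a", "B", "1", space, e-acute (U+00E9), combining acute accent (U+0301)
sample = "aB1 \u00e9\u0301"

lowercase_letters = chars_in_category(sample, "Ll")  # General_Category=Lowercase_Letter
combining_marks = chars_in_category(sample, "M")     # any Mark category (Mn, Mc, Me)
```

This filters characters one at a time, so it is a workaround rather than a substitute for a real \p{...} property inside a pattern.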
When you’re looking for Unicode support, there's more than just UTS#18 on Regular Expressions of course, although that is the one that matters most to this question, and neither Ruby nor Python is Level 1 compliant. Other very important aspects of Unicode include UAX#15, UAX#14, UTS#10, UAX#11, UAX#29, and of course the crucial UAX#44. Python has libraries for at least a couple of those, I know. I don't know that they're standard.
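For what it's worth, a couple of these can be reached from Python's standard `unicodedata` module. The sketch below, which makes no claim to cover either annex completely, shows normalization (UAX#15) and the East_Asian_Width property (UAX#11); the variable names are only illustrative.

```python
import unicodedata

composed = "\u00e2"  # LATIN SMALL LETTER A WITH CIRCUMFLEX

# UAX#15: canonical decomposition splits the letter from its accent.
decomposed = unicodedata.normalize("NFD", composed)
code_point_names = [unicodedata.name(ch) for ch in decomposed]

# Recomposing with NFC gives back the single precomposed code point.
recomposed = unicodedata.normalize("NFC", decomposed)

# UAX#11: East_Asian_Width of a CJK ideograph (U+4E2D).
width = unicodedata.east_asian_width("\u4e2d")
```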
But when it comes to regular expression support, um, there are richer alternatives than just those two, you know. :)
\p{lower} are in radical conflict with the Unicode Standard, which says it must be all lowercase, not just letters. Beyond that, only two properties are supported: General_Category and Script properties. There’s no support for grapheme clusters via \X or equiv. There’s no \N{NAME} support. It’s missing the rest of the stuff for Level 1, the lowest acceptable level of Unicode support.
NFKD($s) =~ / (?= a \p{Grapheme_Extend}* \N{COMBINING CIRCUMFLEX ACCENT} ) \X /ix. How am I to do that in Ruby? Neither Ruby nor Python can even come close to meeting the MINIMAL requirements of UTS#18 on Unicode Regexes. See now?
I like the /pattern/ syntax in Ruby, inspired by Perl, for regular expressions. Python's re.compile("pattern") is not really elegant for me. The syntactic sugar in Ruby and the fact that regular expressions live in a separate re module in Python make me lean towards Ruby when it comes to regular expressions.
Apart from this, I don't see much of a difference from a normal regular expression programming perspective. Both languages have pretty comprehensive and mostly similar RE support. There might be performance differences (Python has traditionally had better performance), and Python also has greater Unicode regular expression support.
\X or perhaps through \p{Grapheme_Base}\p{Grapheme_Extend}*? Does it do full 1:many Unicode case folding for case insensitive matches? Can you reliably use any possible Unicode code point, or are you still hamstrung by that BMP restriction (which Unicode forbids, ahem)? BTW, I’m just ribbing you, don’t take it too seriously.
If the question is only about regexes: neither. Use Perl.
You should choose between those languages based on the other, non-regex problems that you are trying to solve and on the community support each language has in your field of endeavor.
If you are truly only picking a language based on regex support -- choose Perl...
Ruby's Regexp#match method is equivalent to Python's re.search(), not re.match(). re.search() and Regexp#match look for the first match anywhere in a string. re.match() looks for a match only at the beginning of a string.
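To perform the equivalent of re.match(), a Ruby regular expression needs to start with \A, which anchors the match at the beginning of the string. In Ruby, ^ matches at the start of any line, so it is only equivalent for single-line input.
To perform the equivalent of Regexp#match in Python, use re.search(). Prefixing the pattern with .* and calling re.match() only approximates this: the match then includes the leading characters, and . does not match newlines unless re.DOTALL is set.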
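A short Python sketch of the distinction, using only the standard `re` module, may help. It also shows why a leading `.*` is only a rough stand-in for `re.search()`: the match span changes, and `.` does not cross newlines unless `re.DOTALL` is set. The sample text and variable names are made up for illustration.

```python
import re

text = "first line\nsecond line"

# re.match() only tries at the very beginning of the string.
anchored = re.match(r"second", text)

# re.search() scans forward, like Ruby's Regexp#match.
anywhere = re.search(r"second", text)

# A leading .* does not help here, because . stops at the newline...
padded = re.match(r".*second", text)

# ...unless re.DOTALL is given, and then the match covers the prefix too.
padded_dotall = re.match(r".*second", text, re.DOTALL)
```

Here `anchored` and `padded` are both `None`, while `anywhere` covers just the word "second" and `padded_dotall` covers everything from the start of the string up to the end of "second".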
The regular expression libraries for Ruby and Python are developed by two completely independent teams. Even if they are identical now (and I wouldn't be certain they are), there's no guarantee that they won't diverge sometime in the future.
The safest position is to assume they're different now, and assume they will continue to be different in the future.