Ruby Regex vs Python Regex

Question

Are there any real differences between Ruby regex and Python regex?

I've been unable to find any differences in the two, but may have missed something.

hmm? what are you trying to "find"? regex itself is a language, so the library might have a bit different flags but overall the syntax is the same between everything that supports it. — OneOfOne
– OneOfOne, Commented Apr 15, 2011 at 2:07
Rather, Ruby1.9 and PHP5 should be the same because they adopt the same oniguruma engine. — sawa
– sawa, Commented Apr 15, 2011 at 2:16
I'm pretty sure that neither has good regex debugging support. — tchrist
– tchrist, Commented Apr 15, 2011 at 3:44

dawg · Accepted Answer · 2019-10-13 18:50:14Z

8

The last time I checked, they differed substantially in their Unicode support. Ruby in 1.9 at least has some very limited Unicode support. I believe one or two Unicode properties might be supported by now. Probably the general categories and maybe the scripts were the two I'm thinking of.

Python has less and more Unicode support at the same time. Python does seem to make it possible to meet the requirements of RL1.2a "Compatibility Properties" from UTS#18 on Unicode Regular Expressions.
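As a small illustration of the compatibility-property point, here is a sketch assuming Python 3, where `str` patterns are Unicode-aware by default and `re.ASCII` switches the shorthand classes back to ASCII-only behaviour. The sample string is made up for the example.

```python
import re

# "naive" with i-diaeresis, "cafe" with e-acute, then two Arabic-Indic digits.
sample = "na\u00efve caf\u00e9 \u0663\u0664"

# Default for str patterns in Python 3: \w and \d follow Unicode.
unicode_words = re.findall(r"\w+", sample)
unicode_digits = re.findall(r"\d+", sample)

# re.ASCII restricts \w and \d to [a-zA-Z0-9_] and [0-9].
ascii_words = re.findall(r"\w+", sample, flags=re.ASCII)
ascii_digits = re.findall(r"\d+", sample, flags=re.ASCII)

print(unicode_words, unicode_digits)
print(ascii_words, ascii_digits)
```

This only covers the shorthand classes; it says nothing about the named properties discussed further down.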

That said, there is a really rather nice Python library out there by Matthew Barnett (mrab) that finally adds a couple of Unicode properties to Python regexes. He supports the two most important ones: the general categories, and the script properties. It has some other intriguing features as well. It deserves some good publicity.
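A minimal sketch of what that library offers, assuming the third-party `regex` package is installed (`pip install regex`); it is not part of the standard library, so the import is guarded. The helper name `describe` and the sample text are hypothetical.

```python
try:
    import regex  # third-party package, not the stdlib "re" module
except ImportError:
    regex = None


def describe(text):
    """Return a few Unicode-aware matches, or None if `regex` is missing."""
    if regex is None:
        return None
    return {
        # General_Category property: uppercase letters.
        "uppercase": regex.findall(r"\p{Lu}", text),
        # Script property: runs of Greek characters.
        "greek": regex.findall(r"\p{Script=Greek}+", text),
        # \X matches one extended grapheme cluster.
        "graphemes": regex.findall(r"\X", text),
    }


# "Ab", Greek alpha and beta, then "e" followed by a combining acute accent.
result = describe("Ab \u03b1\u03b2 e\u0301")
print(result)
```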

I don't think either Ruby or Python supports Unicode all that well, although more and more gets done every day. In particular, however, neither meets even the barebones Level 1 requirement for Unicode Regular Expressions cited above. For example, RL1.2 requires that at least 11 properties be supported: General_Category, Script, Alphabetic, Uppercase, Lowercase, White_Space, Noncharacter_Code_Point, Default_Ignorable_Code_Point, ANY, ASCII, and ASSIGNED.

I think Python only lets you get to some of those, and only in a roundabout way. Of course, there are many, many other properties beyond these 11.
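One such roundabout route, sketched with only the standard library: the `re` module has no `\p{...}` syntax, but `unicodedata.category()` exposes General_Category, so a character class can be built by hand. The helper name `category_class` is hypothetical, and this covers only General_Category, not the other properties listed above.

```python
import re
import sys
import unicodedata


def category_class(prefix):
    """Build a regex character class of every code point whose
    General_Category starts with `prefix` (e.g. "Lu" or "L")."""
    chars = (
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)).startswith(prefix)
    )
    return "[" + "".join(re.escape(c) for c in chars) + "]"


# Emulates \p{Lu}+ by brute force; building the class scans all code points.
UPPER_RUN = re.compile(category_class("Lu") + "+")

# E-acute "cole", Greek capital omega + "MEGA", and lowercase "ete" with accents.
upper_runs = UPPER_RUN.findall("\u00c9cole \u03a9MEGA \u00e9t\u00e9")
print(upper_runs)
```

The class is large and slow to build, which is part of why this counts as roundabout.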

When you’re looking for Unicode support, there's more than just UTS#18 on Regular Expressions of course, although that is the one that matters most to this question, and neither Ruby nor Python is Level 1 compliant. Other very important aspects of Unicode include UAX#15, UAX#14, UTS#10, UAX#11, UAX#29, and of course the crucial UAX#44. Python has libraries for at least a couple of those, I know. I don't know that they're standard.

But when it comes to regular expression support, um, there are richer alternatives than just those two, you know. :)

edited Oct 13, 2019 at 18:50

dawg

answered Apr 15, 2011 at 3:29

tchrist

5 Comments

steenslag Over a year ago

I think ruby regex support has become much more powerful since you last checked: github.com/ruby/ruby/blob/trunk/doc/re.rdoc

tchrist Over a year ago

@steenslag No, Ruby regexes still suck at Unicode. Charclass abbreviations are still pitifully out of step with RL1.2a, stuck in the ASCII sands of yesteryear. Same with the POSIX props. And things like \p{lower} are in radical conflict with the Unicode Standard, which says it must be all lowercase, not just letters. Beyond that, only two properties are supported: General_Category and Script properties. There’s no support for grapheme clusters via \X or equiv. There’s no \N{NAME} support. It’s missing the rest of the stuff for Level 1, the lowest acceptable level of Unicode support.

tchrist Over a year ago

@steenslag: Consider this totally reasonable, and indeed very commonly needed, pattern for matching a grapheme cluster (a user-perceived character) that has "a" and a circumflex, but where you do not know the normalization form first, where you want fullwidth "a"’s and such to match, and where other marks can fall between them: NFKD($s) =~ / (?= a \p{Grapheme_Extend}* \N{COMBINING CIRCUMFLEX ACCENT} ) \X /ix. How am I to do that in Ruby? Neither Ruby nor Python can even come close to meeting the MINIMAL requirements of UTS#18 on Unicode Regexes. See now?
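For comparison, a rough approximation of that Perl pattern using only the Python standard library. It is a sketch under a loud assumption: characters with a nonzero canonical combining class stand in for \p{Grapheme_Extend}, which is not the same set, and no regex is involved at all. The function name is hypothetical.

```python
import unicodedata

CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT


def a_circumflex_clusters(text):
    """Return NFKD clusters that are an "a" (any case) carrying a circumflex,
    possibly with other combining marks in between."""
    decomposed = unicodedata.normalize("NFKD", text)
    clusters = []
    i, n = 0, len(decomposed)
    while i < n:
        # Extend the cluster over following combining marks
        # (approximation of Grapheme_Extend, see lead-in).
        j = i + 1
        while j < n and unicodedata.combining(decomposed[j]) != 0:
            j += 1
        cluster = decomposed[i:j]
        if cluster[0].lower() == "a" and CIRCUMFLEX in cluster[1:]:
            clusters.append(cluster)
        i = j
    return clusters


# Precomposed a-circumflex, fullwidth a + circumflex, a with circumflex and
# dot below, e + circumflex, and a bare "a".
found_clusters = a_circumflex_clusters("\u00e2 \uff41\u0302 \u1ead e\u0302 a")
print([c.encode("unicode_escape").decode() for c in found_clusters])
```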

steenslag Over a year ago

I'm not a good discussion partner in this case- I had to wikipedia most of your key words. But what would you advise the OP, ruby or python ?

tchrist Over a year ago

@steenslag: Ruby or Python for what? Regular expressions? Both require what are to me unacceptable compromises. I have to be able to work with Unicode.

manojlds · Accepted Answer · 2011-04-15 03:21:59Z

5

I like the /pattern/ literal syntax in Ruby, inspired by Perl, for regular expressions. Python's re.compile("pattern") is not really elegant to me. The syntactic sugar in Ruby, and the fact that regular expressions live in a separate re module in Python, make me lean towards Ruby when it comes to regular expressions.
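For readers who have not used the Python side, a minimal sketch of the library-style API being described; the date pattern and sample string are made up for illustration. In Ruby the pattern would be written as a literal between slashes instead.

```python
import re

# Patterns are ordinary strings (raw strings avoid doubling backslashes),
# compiled through the re module rather than written as a literal.
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

m = DATE.search("released 2011-04-15")
year, month, day = m.groups()

# Module-level functions accept the pattern string directly.
same = re.search(r"(\d{4})-(\d{2})-(\d{2})", "released 2011-04-15").groups()

print(year, month, day)
```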

Apart from this, I don't see much of a difference from a normal regular expression programming perspective. Both languages have pretty comprehensive and mostly similar RE support. There might be performance differences (Python has traditionally had better performance), and Python also has greater Unicode regular expression support.

answered Apr 15, 2011 at 3:21

manojlds

2 Comments

tchrist Over a year ago

How many of the standard Unicode properties does Python support? Also, how is Python’s support for proper grapheme clusters coming along, like via \X or perhaps through \p{Grapheme_Base}\p{Grapheme_Extend}*? Does it do full 1:many Unicode case folding for case insensitive matches? Can you reliably use any possible Unicode code point, or are you still hamstrung by that BMP restriction (which Unicode forbids, ahem)? BTW, I’m just ribbing you, don’t take it too seriously.
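Two of those questions can be probed directly. The sketch below assumes CPython 3.3 or later, where strings are indexed by code point; the sample strings are made up.

```python
import re

# Full 1:many case folding exists on str, via casefold() ...
full_fold_equal = "STRASSE".casefold() == "stra\u00dfe".casefold()

# ... but re.IGNORECASE does not apply it: sharp s does not match "ss".
regex_fold = re.fullmatch("strasse", "stra\u00dfe", flags=re.IGNORECASE)

# A code point outside the BMP is a single character to the regex engine.
astral = "\U0001F600"
astral_dot = re.fullmatch(".", astral)
astral_range = re.fullmatch("[\U0001F600-\U0001F64F]", astral)

print(full_fold_equal, regex_fold, len(astral))
```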

tchrist Over a year ago

I strongly agree with you that having regexes tightly coupled to the core language instead of nailed on the side with a library makes a really big difference in usability.

dawg · Accepted Answer · 2011-04-16 06:31:32Z

3

If the question is only about regexes: neither. Use Perl.

You should choose between those languages based on the other, non-regex problems you are trying to solve and on how much community support each language has in your field.

If you are truly only picking a language based on regex support -- choose Perl...

answered Apr 16, 2011 at 6:31

dawg

Milind S. Pandit · Accepted Answer · 2014-11-06 02:07:44Z

2

Ruby's Regexp#match method is equivalent to Python's re.search(), not re.match(). re.search() and Regexp#match look for the first match anywhere in a string. re.match() looks for a match only at the beginning of a string.

To perform the equivalent of re.match(), a Ruby regular expression needs to start with \A, which anchors the match at the beginning of the string. (In Ruby, ^ matches at the beginning of every line, not only at the beginning of the string.)

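To perform the equivalent of Regexp#match in Python, call re.search(). Prefixing the pattern with .* and calling re.match() also finds a match further into the string, but the leading .* becomes part of the matched text and, without re.DOTALL, cannot cross a newline.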
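The Python half of this comparison can be checked with a short sketch; the sample strings are made up.

```python
import re

text = "say hello"

# re.match only tries the pattern at position 0.
anchored = re.match(r"hello", text)

# re.search scans forward, like Ruby's Regexp#match.
found = re.search(r"hello", text)

# A leading .* lets re.match succeed, but changes what is matched.
prefixed = re.match(r".*hello", text)

# Without re.DOTALL, "." stops at a newline, so the prefix trick fails here
# while re.search still finds the word.
multiline = "say\nhello"
prefixed_multiline = re.match(r".*hello", multiline)
found_multiline = re.search(r"hello", multiline)

print(anchored, found.span(), prefixed.group())
```

Where the whole string must match, re.fullmatch() is the stricter alternative to re.match().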

answered Nov 6, 2014 at 2:07

Milind S. Pandit

Greg Hewgill · Accepted Answer · 2011-04-15 02:08:09Z

1

The regular expression libraries for Ruby and Python are developed by two completely independent teams. Even if they are identical now (and I wouldn't be certain they are), there's no guarantee that they won't diverge sometime in the future.

The safest position is to assume they're different now, and assume they will continue to be different in the future.

answered Apr 15, 2011 at 2:08

Greg Hewgill
