Ruby: incompatible encoding regexp match

Question

Ruby refuses to split a string that contains a certain Latin character (and would presumably have problems with others). I know there are many posts about this specific error, but none of the answers have worked for me.

I've boiled the problem down to the following script, shown here in its entirety. The script file itself is saved as UTF-8.

#!/usr/bin/ruby
str = 'é'
arr = str.split(/x/sm)

The character in the second line is the Latin small letter e with acute. (Yes, I know that because the string doesn't contain an 'x' there isn't much splitting to do; this is just a minimal example that produces the error.)

Here's the error message, word wrapped for your safety and comfort:

./dev.rb:3:in `split': incompatible encoding regexp match
(Windows-31J regexp with UTF-8 string) (Encoding::CompatibilityError)
    from ./dev.rb:3:in `<main>'

I've tried reencoding the string to no avail. Neither of the following lines helps:

str = str.force_encoding('iso-8859-1').encode('utf-8')

or

str = str.force_encoding(Encoding::UTF_8)

Here's the version of Ruby I'm using:

ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu]

Any help is appreciated.

mdesantis · Accepted Answer · 2017-07-24 12:32:23Z

3

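The trailing s in /x/sm is not a Perl-style "dot matches newline" modifier. In Ruby it is an encoding option that fixes the regex to Windows-31J, which is why re-encoding the string doesn't help; the m option on its own already makes . match newlines. Just encode the regex in UTF-8 by using u instead of s (dropping the s altogether also works):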

str = 'é'
arr = str.split(/x/mu)
#=> ["é"]

Documentation: https://ruby-doc.org/core-2.3.1/Regexp.html#class-Regexp-label-Encoding
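
To see where the Windows-31J in the error message comes from, it can help to inspect the encoding each regexp literal ends up with. The following is a minimal sketch, not part of the original answer: try_split is a hypothetical helper introduced only for illustration, and the string is written as "\u00E9" so that it is UTF-8 regardless of how the file is saved.

```ruby
# Regexp literals with and without an encoding option.
PLAIN = /x/m     # no encoding option: encoding is not fixed
UTF8  = /x/mu    # u: fixed to UTF-8
SJIS  = /x/sm    # s: fixed to Windows-31J (the regexp from the question)

# Hypothetical helper (an assumption for this sketch): returns the split
# result, or the error class if Ruby rejects the encoding combination.
def try_split(str, pattern)
  str.split(pattern)
rescue Encoding::CompatibilityError => e
  e.class
end

str = "\u00E9"   # é, always a UTF-8 string

[PLAIN, UTF8, SJIS].each do |re|
  puts "#{re.inspect}: encoding=#{re.encoding}, " \
       "fixed=#{re.fixed_encoding?}, split=#{try_split(str, re).inspect}"
end
```

Only the regexp with the s option should report a fixed Windows-31J encoding and fail against the non-ASCII UTF-8 string. A pure-ASCII string such as 'abc' still splits with it, which is probably why the problem only shows up once a character like é appears.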
