
Ruby refuses to split a string that contains a certain accented Latin character (and presumably would have the same problem with others). I know there are many posts about this specific error, but none of their answers has worked for me.

I've boiled down the problem to the following example. Here is the entirety of a script that produces the problem. The script itself is in UTF-8.

#!/usr/bin/ruby
str = 'é'
arr = str.split(/x/sm)

The character on the second line is é, LATIN SMALL LETTER E WITH ACUTE (U+00E9). (Yes, I know that because the string doesn't contain an 'x' there isn't much splitting to do; this is just an example to produce the error.)

Here's the error message, word wrapped for your safety and comfort:

./dev.rb:3:in `split': incompatible encoding regexp match
(Windows-31J regexp with UTF-8 string) (Encoding::CompatibilityError)
    from ./dev.rb:3:in `<main>'
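A quick way to see which side of the match carries the Windows-31J encoding is to ask the string and the regexp for their encodings. This is a minimal diagnostic sketch, assuming Ruby 2.x or later; the character is written as the escape "\u00e9" so the snippet does not depend on the source file's encoding.

```ruby
str = "\u00e9"   # same character as in the script (é), as a UTF-8 escape
re  = /x/sm      # same regexp literal as in the script

puts str.encoding          # the string's encoding
puts re.encoding           # the regexp's encoding: the "Windows-31J regexp" in the error
puts re.fixed_encoding?    # a fixed-encoding regexp will not adapt to the string

begin
  str.split(re)
rescue Encoding::CompatibilityError => e
  puts "#{e.class}: #{e.message}"
end
```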

I've tried re-encoding the string to no avail. Neither of the following lines helps:

str = str.force_encoding('iso-8859-1').encode('utf-8')

or

str = str.force_encoding(Encoding::UTF_8)
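Both attempts change only the string, and both leave it UTF-8 and non-ASCII, so a regexp fixed to Windows-31J still rejects it; the first attempt also garbles the text. A sketch reproducing the two attempts, assuming Ruby 2.x or later (the variable names are illustrative):

```ruby
original = "\u00e9"  # é as a UTF-8 string

# Attempt 1: relabel the UTF-8 bytes (C3 A9) as Latin-1, then transcode to UTF-8.
# The bytes are read as two Latin-1 characters, so é becomes "Ã©".
attempt1 = original.dup.force_encoding('iso-8859-1').encode('utf-8')

# Attempt 2: relabel as UTF-8, which it already was, so nothing changes.
attempt2 = original.dup.force_encoding(Encoding::UTF_8)

[attempt1, attempt2].each do |s|
  p [s, s.encoding, s.ascii_only?]
  begin
    s.split(/x/sm)
  rescue Encoding::CompatibilityError => e
    puts "still fails: #{e.message}"
  end
end
```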

Here's the version of Ruby I'm using:

ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu]

Any help is appreciated.

1 Answer


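Just encode the regex in UTF-8. In Ruby, the 's' after a regexp literal is not a single-line option; it is an encoding flag that makes the regexp Windows-31J, which is the "Windows-31J regexp" the error complains about. Replace it with 'u', which makes the regexp UTF-8: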

str = 'é'
arr = str.split(/x/mu)
#=> ["é"]

Documentation: https://ruby-doc.org/core-2.3.1/Regexp.html#class-Regexp-label-Encoding
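A slightly fuller sketch of the fix, assuming Ruby 2.x or later; the sample string is made up for illustration. It also shows that leaving the encoding flag off entirely works, because an ASCII-only regexp with no encoding flag adapts to the string's encoding, and that Ruby's 'm' option is what Perl calls 's' (dot matches newline), in case that is what the 's' was meant to do.

```ruby
text = "caf\u00e9xbar"  # hypothetical sample: a non-ASCII character plus an "x" to split on

utf8_re  = /x/mu  # the answer's regexp: explicit UTF-8 encoding, multiline mode
plain_re = /x/m   # no encoding flag: adapts to the string's encoding when matching

p utf8_re.encoding         # UTF-8
p text.split(utf8_re)      # ["café", "bar"]
p text.split(plain_re)     # ["café", "bar"]

# Ruby's /m makes "." match a newline, like Perl's /s.
p("a\nb" =~ /a.b/m)        # 0 (match at index 0)
p("a\nb" =~ /a.b/)         # nil (no match without /m)
```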
