So, I'm bored, and I've found what appears to be a strange inconsistency I was hoping to find more information on. This deals with string concatenation in Ruby, particularly appending what the string documentation refers to as a "codepoint".

Here're some examples:

'' << 233 #=> "é"
'' << 256 #=> "Ā"

Now, the curious thing is that in IRB both of those examples work. However, if you put the same code in a Ruby class in a file, load the file, and call the method, it blows up. See the following example:

class MyConcatenationTest
  def self.test
    '' << 233
    '' << 256
  end
end

And then in IRB:

load 'my_concatenation_test.rb'  #=> true
MyConcatenationTest.test         #=> RangeError: 256 out of char range

So, my question is this: Why does this work in IRB, but not when I load a script that runs the same line of code?

Some other things to notice, if you alter the class:

class MyConcatenationTest
  def self.test
    '' << 233
    #'' << 256
  end
end

... and then reload/run the method, it returns the \x escaped value for 233 instead of the "é" from before:

load 'my_concatenation_test.rb'
MyConcatenationTest.test          #=> "\xE9"

So... what's up with that? Both strings have the same encoding (UTF-8), and changing it to ASCII doesn't seem to make any difference.

EDIT: I should mention that I used 256 in the example above because that's the lowest number it blows up on. It's pretty obvious that it's freaking out because it can't properly deal with anything higher than "\xFF". To clarify my question, I'm curious to know why this limitation exists when the code exists in a loaded ruby file, but not in IRB.

  • No, it works fine for me; I didn't get any error. Which Ruby version are you on? Commented Sep 12, 2013 at 20:07
  • @ArupRakshit It fails in Ruby 1.9.3-p429, but works in 2.0.0 for me. Commented Sep 12, 2013 at 20:09
  • Then, the question is why it works in irb. Commented Sep 12, 2013 at 20:11
  • Maybe IRB runs a different Ruby version than your system. Add puts RUBY_VERSION to check that. Commented Sep 12, 2013 at 20:12
  • @mbratch Yes I tested on Ruby2.0.0-p0, but forgot to mention that.. sorry. :) Commented Sep 12, 2013 at 20:13

1 Answer

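Which Ruby version do you use? It's probably because UTF-8 is not the default source encoding in Ruby 1.9, which parses source files as US-ASCII unless told otherwise. String literals take on the source encoding, so '' in your file is a US-ASCII string, and appending 256 to it is out of range.

Adding a magic comment to the top of your file, as follows, tells Ruby to parse the file as UTF-8.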

# ~coding: utf-8
class MyConcatenationTest
  def self.test
    '' << 233
    '' << 256
  end
end

If you execute the file in Ruby 2.0, it works as expected without the magic comment, because UTF-8 is the default source encoding in Ruby 2.0.
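To see that the receiver's encoding is what decides the outcome, independent of where the code lives, here is a minimal sketch. The append_codepoint helper is a hypothetical name of mine, not something from the question, and String.new with the encoding: keyword assumes Ruby 2.3 or later (on 1.9.3 you would use ''.force_encoding instead).

```ruby
# Hypothetical helper (not from the question): append an integer codepoint
# to an empty string tagged with the given encoding. Returns the resulting
# string, or the RangeError if the codepoint does not fit that encoding.
def append_codepoint(encoding, codepoint)
  String.new('', encoding: encoding) << codepoint
rescue RangeError => e
  e
end

utf8_small  = append_codepoint(Encoding::UTF_8, 233)
utf8_large  = append_codepoint(Encoding::UTF_8, 256)
ascii_small = append_codepoint(Encoding::US_ASCII, 233)
ascii_large = append_codepoint(Encoding::US_ASCII, 256)

p utf8_small.bytes   # multi-byte UTF-8 sequence for U+00E9
p utf8_large.bytes   # multi-byte UTF-8 sequence for U+0100
p ascii_small.bytes  # a single raw byte, which inspects as "\xE9"
p ascii_large.class  # RangeError, as in the question
```

The UTF-8 receiver accepts both codepoints, while the US-ASCII receiver stores 233 as one raw byte and rejects 256, which matches the two behaviours described in the question.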

Why does it work in irb (even with ruby 1.9.3)?

irb uses your locale (typically taken from the $LANG environment variable) to determine which encoding it should use. My (and maybe your?) $LANG is set to en_US.UTF-8, which makes irb use UTF-8 encoding.

You may start your irb with irb -EISO-8859-1 (or some other encoding) to change that.

$ irb -EISO-8859-1 # start irb with ISO-8859-1 encoding
irb(main):001:0> "".encoding
=> #<Encoding:ISO-8859-1>
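If you want to check which encodings are in play in a given context, a small sketch like the following may help. The encoding_report method is a hypothetical helper name of mine; the calls it makes (__ENCODING__, Encoding.default_external, Encoding.find('locale')) are standard Ruby. Run it once from a loaded file and once pasted into irb and compare the results.

```ruby
# Hypothetical helper: collect the encodings that influence string literals
# and I/O in the context where this method is defined.
def encoding_report
  {
    source:   __ENCODING__,              # encoding used to parse this code
    literal:  ''.encoding,               # encoding of a plain string literal
    external: Encoding.default_external, # default for data read from outside
    locale:   Encoding.find('locale')    # encoding derived from the locale
  }
end

encoding_report.each do |name, encoding|
  puts "#{name}: #{encoding}"
end
```

The source and literal entries should always agree, since an empty string literal takes the source encoding; that is the value that differs between a 1.9.3 file without a magic comment and an irb session under a UTF-8 locale.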

7 Comments

It fails on Ruby 1.9.3-p429
@mbratch with or without the magic comment? It works for me with the comment in 1.9.3-p392 (oh, I should update my dusty 1.9.3 :)
I have seen #encoding 'utf-8' but not the magic one # ~coding: utf-8. Documentation please for myself..:P
How very interesting. I didn't think about the encoding used for the code-file at all, and only checked that the string encoding was UTF-8. Thanks much for the fast answer.
@ArupRakshit you find the mri implementation of the magic comment here: github.com/ruby/ruby/blob/…