Ruby string encoding in Ruby 1.8.7

Question

I am creating a Ruby string using the Ruby C API (from Objective-C), and it happens to hold Finnish characters.

Once the string is in Ruby, I call a gem that truncates it, but the multi-byte characters get cut improperly, very much like in this question:

How to get a Ruby substring of a Unicode string?

An example string is H pääsee syvemmälle A elämään. The umlauts show up as escaped byte pairs such as \303\244 (ä), but when the string is truncated this can end up as a lone \303, which is a problem.
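To see why a lone \303 appears, here is a minimal C sketch of what a byte-oriented truncation does to the UTF-8 bytes of "pääsee". The helper names is_continuation, utf8_char_count and cut_splits_character are hypothetical, and the code assumes well-formed UTF-8 input.

```c
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
int is_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

/* Counts characters in a UTF-8 buffer by counting every byte that is
   not a continuation byte (assumes well-formed input). */
size_t utf8_char_count(const char *s, size_t len) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (!is_continuation((unsigned char)s[i])) {
            count++;
        }
    }
    return count;
}

/* Returns 1 if keeping only the first `cut` bytes of `s` (total length
   `len`) would split a multi-byte character, i.e. if the first dropped
   byte is a continuation byte. */
int cut_splits_character(const char *s, size_t len, size_t cut) {
    return cut < len && is_continuation((unsigned char)s[cut]);
}
```

For example, "pääsee" is the 8 bytes p\303\244\303\244see but only 6 characters. Cutting after 4 bytes keeps p, ä and a lone \303, so cut_splits_character returns 1 for a cut of 4, while a cut of 5 keeps "pää" intact.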

I don't want to hack the gem to get around this issue, because when I tested the same string opened directly in Ruby, it worked fine.

So I know I must be passing something to Ruby incorrectly.

Here is how I turn the NSString into a VALUE to be used in Ruby.

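// Method on NSString: copy the string's UTF-8 bytes into a new Ruby string.
- (VALUE) toRubyValue {
    // Number of bytes (not characters) needed to encode the string as UTF-8.
    size_t data_length = [self lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
    // One extra byte for the NUL terminator that getCString: writes.
    size_t buffer_length = data_length + 1;
    char buf[buffer_length]; // stack buffer; very large strings may need a heap buffer
    [self getCString:buf maxLength:buffer_length encoding:NSUTF8StringEncoding];
    // rb_str_new copies exactly data_length bytes; Ruby 1.8 attaches no encoding to them.
    return rb_str_new(buf, data_length);
}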
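One way to rule out the conversion itself is to check that the bytes written into buf are well-formed UTF-8 before calling rb_str_new. Below is a minimal sketch of such a check in plain C; is_valid_utf8 is a hypothetical helper, not part of the Ruby C API or Foundation.

```c
#include <stddef.h>

/* Returns 1 if s[0..len) is well-formed UTF-8, 0 otherwise.
   Rejects stray continuation bytes, truncated sequences, overlong
   encodings, UTF-16 surrogates and code points above U+10FFFF. */
int is_valid_utf8(const unsigned char *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char lead = s[i];
        size_t extra;            /* continuation bytes that must follow */
        unsigned long cp;        /* decoded code point */

        if (lead < 0x80) {       /* plain ASCII byte */
            i++;
            continue;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1; cp = lead & 0x1F;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            extra = 2; cp = lead & 0x0F;
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            extra = 3; cp = lead & 0x07;
        } else {
            return 0;            /* 0x80-0xC1 or 0xF5-0xFF cannot start a character */
        }

        if (len - i - 1 < extra) {
            return 0;            /* sequence cut off at the end of the buffer */
        }
        for (size_t k = 1; k <= extra; k++) {
            unsigned char b = s[i + k];
            if ((b & 0xC0) != 0x80) {
                return 0;        /* expected a continuation byte */
            }
            cp = (cp << 6) | (b & 0x3F);
        }

        if (extra == 2 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) {
            return 0;            /* overlong 3-byte form or surrogate */
        }
        if (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF)) {
            return 0;            /* overlong 4-byte form or beyond Unicode */
        }
        i += extra + 1;
    }
    return 1;
}
```

Inside toRubyValue it could be called as is_valid_utf8((const unsigned char *)buf, data_length). If it returns 1 for the Finnish strings, the bytes reaching Ruby are correct UTF-8, and any later split into \303 happens on the Ruby side.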

I'm on Ruby 1.8.7.

What is the best way to address this problem? I'm happy to do it in Ruby, C, or Objective-C, but I would rather not use any Ruby gems that have native C extensions.

I didn't find a solution to this, which meant I had to hack the gem using the solution in the SO link in the question. I will leave this question open, as perhaps someone will know the correct way of doing this. (petenelson, commented Apr 10, 2013 at 13:11)

kaspar · Accepted Answer · 2013-05-13 06:39:24Z


I don't think you're passing anything incorrectly to Ruby. You are creating a UTF-8 encoded Ruby 1.8 string, but Ruby 1.8 doesn't care about encodings and treats strings as arrays of bytes. This means that any Ruby code that truncates by byte count instead of by character can split a multi-byte character and produce the results you describe. 'Hacking' the gem is really your only option.

Upgrading to Ruby 1.9 or even 2.0, where strings carry an encoding and are indexed by character, is your best way out.
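If you do patch the gem, the fix amounts to truncating by characters instead of bytes; in Ruby 1.8 one common way is to split with a UTF-8 regexp, for example str.scan(/./mu). The same idea in C, as a minimal sketch assuming well-formed UTF-8 input (utf8_prefix_bytes is a hypothetical name), finds the byte offset that keeps the first max_chars characters, so the cut never lands inside a multi-byte sequence:

```c
#include <stddef.h>

/* Returns how many bytes of the UTF-8 buffer `s` (length `len`) make up
   its first `max_chars` characters. Cutting at the returned offset never
   splits a multi-byte sequence, assuming well-formed input. */
size_t utf8_prefix_bytes(const char *s, size_t len, size_t max_chars) {
    size_t chars = 0;
    size_t i = 0;
    while (i < len) {
        /* Any byte that is not 10xxxxxx starts a new character. */
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            if (chars == max_chars) {
                break; /* this byte would start character max_chars + 1 */
            }
            chars++;
        }
        i++;
    }
    return i;
}
```

For "pääsee" (the bytes p\303\244\303\244see), utf8_prefix_bytes(s, 8, 3) returns 5, so "pää" is kept whole. Counting lead bytes instead of decoding code points keeps the loop short; it does not validate the input.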
