I am creating a Ruby string through the Ruby C API (from Objective-C), and the string happens to contain Finnish characters.
Once in Ruby, I call a gem that manipulates the string and truncates it, but the multi-byte encoded characters get cut in the middle, very much like in this question:
How to get a Ruby substring of a Unicode string?
An example string is "H pääsee syvemmälle A elämään". The umlauts show up as escaped multi-byte sequences like \303\244, but after truncation the string can end in a lone \303, which is half a character and therefore a problem.
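To illustrate what I think is happening, here is a minimal sketch in plain Ruby. It assumes the gem truncates by byte count; I have not confirmed that in the gem's source. The helper names truncate_bytes and truncate_chars are hypothetical and exist only for this example. The second helper shows one possible character-aware approach using unpack("U*"), which treats the string as UTF-8 code points.

```ruby
# Hypothetical helper: cut after n BYTES, the way a byte-oriented
# String slice behaves on Ruby 1.8.
def truncate_bytes(str, n)
  str.unpack("C*")[0, n].pack("C*")
end

# Hypothetical helper: cut after n CHARACTERS by decoding the string
# as UTF-8 code points first, so a multi-byte character is never split.
def truncate_chars(str, n)
  str.unpack("U*")[0, n].pack("U*")
end

s = "H pääsee syvemmälle A elämään"

broken = truncate_bytes(s, 4)  # "H p" plus only the first byte of the umlaut
intact = truncate_chars(s, 4)  # "H p" plus the complete umlaut

# Print the raw bytes in octal so the dangling 303 is visible.
puts broken.unpack("C*").map { |b| "%o" % b }.join(" ")
puts intact.unpack("C*").map { |b| "%o" % b }.join(" ")
```

The first line printed ends in a dangling 303, while the second ends in the complete pair 303 244.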
I don't want to hack the gem to get around this issue, because I tested with the same string opened directly in Ruby and it worked fine.
So I know that I'm passing something to Ruby incorrectly.
Here is how I turn the NSString into a VALUE to be used in Ruby:
    - (VALUE) toRubyValue {
        size_t data_length = [self lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
        size_t buffer_length = data_length + 1;
        char buf[buffer_length];
        [self getCString:buf maxLength:buffer_length encoding:NSUTF8StringEncoding];
        return rb_str_new(buf, data_length);
    }
I'm on Ruby 1.8.7.
What is the best way to address this problem? I'm happy to do it in either Ruby or C (or Objective-C), but I would rather not use any Ruby gems that have native C extensions.