5

Can I convert directly between a Swift Character and its Unicode numeric value? That is:

var i:Int = ...  // A plain integer index.
var myCodeUnit:UInt16 = myString.utf16[i]
// Would like to say myChar = myCodeUnit as Character, or equivalent.

or...

var j:String.Index = ... // NOT an integer!
var myChar:Character = myString[j]
// Would like to say myCodeUnit = myChar as UInt16

I can say:

myCodeUnit = String(myChar).utf16[0]

but this creates a new String for each character. I am doing this thousands of times (parsing text), so that is a lot of new Strings that are immediately discarded.

3
  • 2
    Be aware that Unicode is not a 16-bit character set but a 21-bit one. Commented Jun 8, 2014 at 1:32
  • 1
    Can you do this operation in bulk? Say, read a String of 1024 characters, then loop through each character with for-in instead of allocating a String per character? Commented Jun 8, 2014 at 3:04
  • Yes, I am aware of the 21-bit issue. It is even more complicated than that, alas. Thanks SiLo, that is what I am doing. In fact, I keep an integer index and a String.Index going in parallel. But it seems a bit roundabout. Commented Jun 9, 2014 at 5:29

3 Answers

4

The type Character represents a "Unicode grapheme cluster", which can be multiple Unicode codepoints. If you want one Unicode codepoint, you should use the type UnicodeScalar instead.
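As a minimal sketch of that distinction, assuming a recent Swift toolchain where `Character` exposes a `unicodeScalars` view, the scalar values behind each `Character` can be read without building an intermediate String. The sample text and variable names are illustrative only:

```swift
// "é" written as e + combining acute accent (one Character, two scalars), then "a".
let text = "e\u{301}a"

var scalarValues: [[UInt32]] = []
for ch in text {                  // ch is a Character (one grapheme cluster)
    // A Character may hold several scalars, so collect all of them.
    scalarValues.append(ch.unicodeScalars.map { $0.value })
}
// scalarValues is [[101, 769], [97]]

// Going the other way: a numeric value becomes a Character via Unicode.Scalar.
// The initializer is failable because not every UInt32 is a valid scalar.
let fromValue: Character? = Unicode.Scalar(UInt32(97)).map { Character($0) }
```

Because a single Character can map to more than one scalar, there is no lossless one-to-one cast between Character and a single integer; working at the scalar level avoids that mismatch.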


4

As per the swift book:

String to Code Unit

To get the code point (ordinal) of each Unicode scalar in the String, you can do the following:

let yourSwiftString = "甲乙丙丁"
for scalar in yourSwiftString.unicodeScalars {
    // scalar.value is the code point as a UInt32; print them on one line
    print("\(scalar.value) ", terminator: "")
}
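To make the difference between code points and UTF-16 code units concrete, here is a small sketch comparing the `unicodeScalars` and `utf16` views of the same string. The sample string is chosen for illustration; it includes one character outside the 16-bit range:

```swift
let sample = "甲😀"

// One entry per Unicode scalar (code point).
let scalars = sample.unicodeScalars.map { $0.value }
// One entry per UTF-16 code unit; "😀" (U+1F600) needs a surrogate pair.
let units = Array(sample.utf16)

print(scalars)  // [30002, 128512]
print(units)    // [30002, 55357, 56832]
```

For text limited to the Basic Multilingual Plane the two views agree, which is why the difference is easy to miss when parsing.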

Code Unit to String

Because Swift currently does not have a way to convert ordinals/code points back to a string, the best way I have found is still to use NSString. That is, if you have integer ordinals (32-bit values representing the 21-bit code points), you can use the following to convert them to Unicode text:

import Foundation
var i: UInt32 = 22247  // exactly 4 bytes, matching length: 4 below
let unicode_str = NSString(bytes: &i, length: 4, encoding: NSUTF32LittleEndianStringEncoding)

Obviously, if you want to convert an array of ints, you'll need to pack them into an array of 32-bit values first and pass its total length in bytes.
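For comparison, a sketch of the same conversion without NSString, assuming a recent Swift standard library where the failable `Unicode.Scalar(UInt32)` initializer and `String.UnicodeScalarView` are available (this may not match what the toolchain offered when the answer was written). The helper name `string(fromCodePoints:)` is hypothetical:

```swift
// Hypothetical helper: build a String from raw code point values.
// Returns nil if any value is not a valid Unicode scalar (e.g. a surrogate).
func string(fromCodePoints values: [UInt32]) -> String? {
    var view = String.UnicodeScalarView()
    for value in values {
        guard let scalar = Unicode.Scalar(value) else { return nil }
        view.append(scalar)
    }
    return String(view)
}

let single = string(fromCodePoints: [22247])                        // U+56E7
let several = string(fromCodePoints: [30002, 20057, 19993, 19969])  // "甲乙丙丁"
let invalid = string(fromCodePoints: [0xD800])                      // nil: lone surrogate
```

Validating each value matters here because surrogate values and anything above U+10FFFF are not scalars, and the initializer rejects them rather than producing garbage.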


1

I spoke to an Apple engineer who is working on Unicode, and he says they have not completed the implementation of Unicode characters in strings. Are you looking to get a code unit or a full character? The only proper way to get at a full Unicode character is to use a for-in loop on a string, i.e.

for c in "hello" {
    // c is a unicode character of type Character
}

But, this is not implemented as of yet.

1 Comment

Thank you Kyle, it looks like the for-in iteration is working. I can't use that as-is because I need to break out of the loop and then back in, resuming where I left off, hundreds of times (tokenizing). BUT.. I see there is support for generators! I haven't figured out how to use them yet (perhaps not completely implemented) but that will be the way I go when I go. I do want to do this completely in Swift, just for consistency's sake. Otherwise I would get an NSData object.
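A sketch of the resumable iteration described in that comment, assuming a recent Swift version where generators are called iterators and are obtained with `makeIterator()`. The `nextWord` function and its space-delimited rule are hypothetical and deliberately simplistic; a real tokenizer would need its own rules:

```swift
let input = "ab 12"
var iterator = input.unicodeScalars.makeIterator()

// Hypothetical tokenizer step: read scalars up to the next space.
// Returns nil when no scalars were collected.
func nextWord(_ it: inout String.UnicodeScalarView.Iterator) -> String? {
    var word = String.UnicodeScalarView()
    while let scalar = it.next() {
        if scalar == " " { break }
        word.append(scalar)
    }
    return word.isEmpty ? nil : String(word)
}

let first = nextWord(&iterator)    // consumes "ab" and the space
let second = nextWord(&iterator)   // resumes where the first call stopped
let third = nextWord(&iterator)    // input exhausted
```

The iterator keeps its position between calls, so there is no need to maintain an integer index and a String.Index in parallel.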
