
We know that String.utf16 provides the code units and String.unicodeScalars provides the Unicode scalars.

If we manipulate the code units or the Unicode scalars, for example by removing some elements, is there a way to construct the resulting string from them?

3 Answers


Update for Swift 2.1:

You can create a String from an array of UTF-16 code units with the

public init(utf16CodeUnits: UnsafePointer<unichar>, count: Int)

initializer, which is available when Foundation is imported. Example:

import Foundation  // provides String(utf16CodeUnits:count:)

let str = "H€llo 😄"

// String to UTF16 array:
let utf16array = Array(str.utf16)
print(utf16array)
// Output: [72, 8364, 108, 108, 111, 32, 55357, 56836]

// UTF16 array to string:
let str2 = String(utf16CodeUnits: utf16array, count: utf16array.count)
print(str2)
// H€llo 😄

Previous answer:

There is nothing "built-in" (as far as I know), but you can use the UTF16 struct which provides a decode() method:

extension String {

    init?(utf16chars:[UInt16]) {
        var str = ""
        var generator = utf16chars.generate()
        var utf16 : UTF16 = UTF16()
        var done = false
        while !done {
            let r = utf16.decode(&generator)
            switch (r) {
            case .EmptyInput:
                done = true
            case let .Result(val):
                str.append(Character(val))
            case .Error:
                return nil
            }
        }
        self = str
    }
}

Example:

let str = "H€llo 😄"

// String to UTF16 array:
let utf16array = Array(str.utf16)
print(utf16array)
// Output: [72, 8364, 108, 108, 111, 32, 55357, 56836]

// UTF16 array to string:
if let str2 = String(utf16chars: utf16array) {
    print(str2)
    // Output: H€llo 😄
}
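
The extension above is written for Swift 2. As a rough sketch of the same decode loop in current Swift (assuming Swift 5, where generate() became makeIterator() and the result cases are .emptyInput, .scalarValue and .error), it could look like this:

```swift
extension String {
    /// Builds a string from UTF-16 code units, or returns nil if the
    /// input contains an unpaired surrogate.
    init?(utf16chars: [UInt16]) {
        var str = ""
        var iterator = utf16chars.makeIterator()
        var decoder = UTF16()
        var done = false
        while !done {
            switch decoder.decode(&iterator) {
            case .emptyInput:
                // No more code units to read.
                done = true
            case .scalarValue(let scalar):
                // Append the decoded scalar to the string's scalar view.
                str.unicodeScalars.append(scalar)
            case .error:
                // Ill-formed UTF-16, e.g. a lone surrogate.
                return nil
            }
        }
        self = str
    }
}

let roundTripped = String(utf16chars: Array("H€llo 😄".utf16))
print(roundTripped ?? "invalid UTF-16")
```

Appending to unicodeScalars instead of wrapping each scalar in a Character is a small design choice here; both produce the same string.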

More generically, you could define an initializer that creates a string from an array (or any sequence) of code units, using a given codec:

extension String {
    init?<S : SequenceType, C : UnicodeCodecType where S.Generator.Element == C.CodeUnit>
        (codeUnits : S, var codec : C) {
        var str = ""
        var generator = codeUnits.generate()
        var done = false
        while !done {
            let r = codec.decode(&generator)
            switch (r) {
            case .EmptyInput:
                done = true
            case let .Result(val):
                str.append(Character(val))
            case .Error:
                return nil
            }
        }
        self = str
    }
}

Then the conversion from UTF16 is done as

if let str2a = String(codeUnits: utf16array, codec: UTF16()) {
    print(str2a)
}
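
A possible port of the generic initializer to current Swift is sketched below. It assumes Swift 5 names (Sequence, UnicodeCodec, makeIterator()); since var parameters were removed from the language, the codec is copied into a local variable instead.

```swift
extension String {
    /// Decodes any sequence of code units with the given codec.
    /// Returns nil as soon as the codec reports an error.
    init?<S: Sequence, C: UnicodeCodec>(codeUnits: S, codec: C)
        where S.Element == C.CodeUnit
    {
        var str = ""
        var iterator = codeUnits.makeIterator()
        var codec = codec  // local mutable copy; decode() is mutating
        var done = false
        while !done {
            switch codec.decode(&iterator) {
            case .emptyInput:
                done = true
            case .scalarValue(let scalar):
                str.unicodeScalars.append(scalar)
            case .error:
                return nil
            }
        }
        self = str
    }
}

let sample = "H€llo 😄"
let viaUTF16 = String(codeUnits: Array(sample.utf16), codec: UTF16())
let viaUTF8 = String(codeUnits: Array(sample.utf8), codec: UTF8())
```

The same initializer then works for UTF8, UTF16 and UTF32, because all three conform to UnicodeCodec.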

Here is another possible solution. While the previous methods are "pure Swift", this one uses the Foundation framework and the automatic bridging between NSString and Swift String:

import Foundation

extension String {

    init?(utf16chars:[UInt16]) {
        let data = NSData(bytes: utf16chars, length: utf16chars.count * sizeof(UInt16))
        if let ns = NSString(data: data, encoding: NSUTF16LittleEndianStringEncoding) {
            self = ns as String
        } else {
            return nil
        }
    }
}
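
For current Swift, a Foundation-based variant might look like the sketch below. The label utf16charsViaFoundation is a made-up name for illustration, and the code assumes Data and String(data:encoding:) from Foundation. Each unit is converted to little-endian byte order first so that the bytes match the .utf16LittleEndian encoding regardless of the host.

```swift
import Foundation

extension String {
    /// Hypothetical initializer: decodes UTF-16 code units through Foundation.
    init?(utf16charsViaFoundation units: [UInt16]) {
        // Normalize to little-endian so the bytes match the encoding below.
        let littleEndianUnits = units.map { $0.littleEndian }
        let data = littleEndianUnits.withUnsafeBufferPointer { Data(buffer: $0) }
        // Delegates to Foundation's failable String(data:encoding:).
        self.init(data: data, encoding: .utf16LittleEndian)
    }
}

let bridged = String(utf16charsViaFoundation: Array("H€llo 😄".utf16))
print(bridged ?? "could not decode")
```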

1 Comment

The while !done part is one of the few times I’ve found labelled breaks useful in Swift i.e. end: while true … case .EmptyInput: break end
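
As a sketch of what this comment suggests, written for current Swift (assumed Swift 5 case names) and using a free function with the hypothetical name decodeUTF16, the labelled break replaces the done flag:

```swift
/// Hypothetical helper: decodes UTF-16 code units, nil on ill-formed input.
func decodeUTF16(_ units: [UInt16]) -> String? {
    var result = ""
    var iterator = units.makeIterator()
    var decoder = UTF16()
    end: while true {
        switch decoder.decode(&iterator) {
        case .emptyInput:
            // A plain `break` would only leave the switch,
            // so the label is needed to leave the loop.
            break end
        case .scalarValue(let scalar):
            result.unicodeScalars.append(scalar)
        case .error:
            return nil
        }
    }
    return result
}

let decoded = decodeUTF16([72, 8364, 108, 108, 111, 32, 55357, 56836])
```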

The answer is as simple as:

/// An array of the UTF-16 for "Hello, world!".
let a: [UTF16.CodeUnit] = Array("Hello, world!".utf16)

/// A string representation of a, interpreted as UTF-16
let s = String(decoding: a, as: UTF16.self) // <=== The API you want
print(s)
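
To tie this back to the question, here is a small sketch of removing elements and rebuilding the string. The choice of which elements to remove is arbitrary and only for illustration. Code units go through String(decoding:as:), and Unicode scalars are appended to an empty string's unicodeScalars view.

```swift
let original = "H€llo 😄"

// Remove the euro sign's code unit (8364 == 0x20AC), then decode the rest.
let keptUnits = original.utf16.filter { $0 != 0x20AC }
let fromUnits = String(decoding: keptUnits, as: UTF16.self)

// Remove every "l" scalar, then rebuild from the remaining scalars.
let keptScalars = original.unicodeScalars.filter { $0 != "l" }
var fromScalars = ""
fromScalars.unicodeScalars.append(contentsOf: keptScalars)

print(fromUnits)    // Hllo 😄
print(fromScalars)  // H€o 😄
```

Removing only one half of a surrogate pair leaves ill-formed UTF-16; String(decoding:as:) then substitutes U+FFFD for the unpaired unit rather than failing.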

3 Comments

or simply String(utf16CodeUnits: a, count: a.count)
That one requires Foundation, uses an unsafe API, and doesn't generalize to other collections and encodings
Well, the question is "Is there a way to create a String from utf16 array in swift?". Importing Foundation shouldn't be a problem in most situations.
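
To illustrate the point about generalizing, the sketch below feeds String(decoding:as:) a different encoding, a slice instead of an array, and an ill-formed input. The sample values are chosen only for demonstration.

```swift
let text = "H€llo 😄"

// Another encoding: UTF-8 bytes instead of UTF-16 code units.
let fromUTF8 = String(decoding: Array(text.utf8), as: UTF8.self)

// Another collection type: an ArraySlice of UTF-16 code units.
let units = Array(text.utf16)
let tail = String(decoding: units[2...], as: UTF16.self)

// Ill-formed input is repaired with U+FFFD instead of returning nil.
let repaired = String(decoding: [0x48, 0xD83D] as [UInt16], as: UTF16.self)

print(fromUTF8)  // H€llo 😄
print(tail)      // llo 😄
```

Because this initializer is not failable, code that must reject invalid input would need a separate validation step.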

Here it is.

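extension String {
    /// Decodes UTF-16 code units by combining surrogate pairs by hand.
    /// Returns nil if a surrogate is unpaired.
    static func fromUTF16Chars(_ utf16s: [UInt16]) -> String? {
        var str = ""
        var i = 0
        while i < utf16s.count {
            let hi = UInt32(utf16s[i])
            i += 1
            var value = hi
            if (0xD800...0xDBFF).contains(hi) {
                // A high surrogate must be followed by a low surrogate.
                guard i < utf16s.count,
                      (0xDC00...0xDFFF).contains(utf16s[i]) else { return nil }
                let lo = UInt32(utf16s[i])
                i += 1
                value = 0x10000 + (hi - 0xD800) * 0x400 + (lo - 0xDC00)
            }
            // Fails for a lone low surrogate, which is not a valid scalar.
            guard let scalar = Unicode.Scalar(value) else { return nil }
            str.unicodeScalars.append(scalar)
        }
        return str
    }
}

let str = "aαあ🐣aαあ🐣"
let utf16cs = Array(str.utf16)
if let str2 = String.fromUTF16Chars(utf16cs) {
    assert(str2 == str)
    print(str2)
}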
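
As a worked example of the surrogate-pair formula used in this answer, the sketch below combines the last two code units of "H€llo 😄" from the earlier output (55357 and 56836) into a single scalar. The constant names are illustrative.

```swift
let hi: UInt32 = 55357  // 0xD83D, high surrogate
let lo: UInt32 = 56836  // 0xDE04, low surrogate

// 0x10000 + 0x3D * 0x400 + 0x204 = 0x10000 + 0xF400 + 0x204 = 0x1F604
let value = 0x10000 + (hi - 0xD800) * 0x400 + (lo - 0xDC00)

// Unicode.Scalar(UInt32) is failable; 0x1F604 is a valid scalar value.
var combined = ""
if let scalar = Unicode.Scalar(value) {
    combined.unicodeScalars.append(scalar)
}
print(String(value, radix: 16), combined)  // 1f604 😄
```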
