
I'm loading a text file whose encoding is unknown, since it comes from other sources. The content arrives through macOS NSDocument's read method, which is forwarded to my model's read. The String(data:encoding:) initializer requires an encoding up front, and if you guess the wrong one you get nil. I've written a conditional cascade of likely encodings (which is what other people seem to be doing), but there has to be a better way to do this. Suggestions?

    override func read(from data: Data, ofType typeName: String) throws {
        model.read(from: data, ofType: typeName)
    }

In the model:

    func read(from data: Data, ofType typeName: String) {
        if let text = String(data: data, encoding: .utf8) {
            content = text
        } else if let text = String(data: data, encoding: .macOSRoman) {
            content = text
        } else if let text = String(data: data, encoding: .ascii) {
            content = text
        } else {
            content = "?????"
        }
    }
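(As an aside: if no encoding works, it may be cleaner to propagate an error from NSDocument's read than to substitute placeholder text. A minimal sketch, assuming the model's read is changed to return a Bool — the Model type and property names here are illustrative, not from the original post:)

```swift
import Cocoa

// Hypothetical model: read(from:ofType:) now reports success instead of
// silently storing placeholder text.
final class Model {
    var content = ""

    func read(from data: Data, ofType typeName: String) -> Bool {
        guard let text = String(data: data, encoding: .utf8) else { return false }
        content = text
        return true
    }
}

class Document: NSDocument {
    let model = Model()

    override func read(from data: Data, ofType typeName: String) throws {
        guard model.read(from: data, ofType: typeName) else {
            // Standard Cocoa error code for an undeterminable encoding;
            // NSDocument will present it to the user.
            throw CocoaError(.fileReadUnknownStringEncoding)
        }
    }
}
```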
  • If your text is coming from the web you can check this post stackoverflow.com/a/34687962/2303865 Commented Jan 21, 2020 at 1:31
  • Thanks @LeoDabus for the suggestion, unfortunately it does not come from the web. It's a regular text file on the file system, hence NSDocument. Commented Jan 21, 2020 at 14:02
  • There is a static method on NSString to guess the encoding Commented Jan 21, 2020 at 14:03
  • @LeoDabus unless I'm missing something, NSString also requires the encoding to be specified. Do you have a link to the documentation on the specific factory or constructor method? Commented Jan 21, 2020 at 14:17

2 Answers


You can extend Data with a stringEncoding property that tries to detect the string encoding. Most of the time the data is UTF-8, so we first try to decode as UTF-8 and only fall back to detection if that fails:

extension DataProtocol {
    var string: String? { .init(bytes: self, encoding: .utf8) }
}

extension Data {
    var stringEncoding: (
        string: String,
        encoding: String.Encoding
    )? {
        guard let string else {
            var nsString: NSString?
            let rawValue = NSString.stringEncoding(
                    for: self,
                    encodingOptions: nil,
                    convertedString: &nsString,
                    usedLossyConversion: nil
                )
            guard rawValue != 0, let string = nsString as? String
            else { return nil }
            return (
                string,
                .init(
                    rawValue: rawValue
                )
            )
        }
        return (string, .utf8)
    }
}

Then you can simply access the stringEncoding data property:

if let (string, encoding) = data.stringEncoding {
    print("string:", string, "encoding:", encoding.rawValue)
} else {
    print("encoding could not be determined")
}
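If you know which encodings are plausible for your documents, you can also bias the detection via the encodingOptions parameter. A sketch, assuming Foundation's StringEncodingDetectionOptionsKey constants; the helper name detectString(suggesting:) is my own:

```swift
import Foundation

extension Data {
    /// Hypothetical helper: detect the encoding, hinting at the
    /// encodings we expect to encounter most often.
    func detectString(suggesting encodings: [String.Encoding]) -> (String, String.Encoding)? {
        var nsString: NSString?
        let options: [StringEncodingDetectionOptionsKey: Any] = [
            // Encodings to try first, as NSNumber-wrapped raw values.
            .suggestedEncodingsKey: encodings.map { NSNumber(value: $0.rawValue) },
            // Fail rather than silently substituting characters.
            .allowLossyKey: false
        ]
        let rawValue = NSString.stringEncoding(
            for: self,
            encodingOptions: options,
            convertedString: &nsString,
            usedLossyConversion: nil
        )
        guard rawValue != 0, let string = nsString as String? else { return nil }
        return (string, String.Encoding(rawValue: rawValue))
    }
}
```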

5 Comments

I chose this answer as the right one because, according to the documentation, it is the right way to do it, but it somehow keeps coming back with incorrect encodings. So the issue is probably a bug in Apple's implementation.
Note that this also decodes the data into a string at the same time (into the nsString variable). So if this method succeeds, you can just use that as the decoded string (it can be toll-free bridged to String) instead of decoding it a second time elsewhere.
@stef yes that’s correct and a valid point. You can just change the return type to return a tuple instead. Note that the original post is about detecting the encoding and that’s what it does.
@LeoDabus Of course. I just wanted to mention the obvious, as the work is already being done by the method here to decode the string. And the cited reason for wanting to know the encoding in the original post was in order to decode the string in the first place.
@stef check the updated code.

Here is my approach based on @Leo Dabus's; it works well on Xcode 14.3:

extension Data {
    var stringEncoding: String.Encoding {
        var nsString: NSString?
        let rawValue = NSString.stringEncoding(for: self, encodingOptions: nil, convertedString: &nsString, usedLossyConversion: nil)
        return .init(rawValue: rawValue)
    }
}

1 Comment

From the docs: "Return Value is an NSStringEncoding value, or 0 if a string encoding could not be determined." Why would you ignore zero and return a non-optional value?
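(A minimal sketch of the optional-returning variant the comment above suggests; the property name detectedStringEncoding is my own:)

```swift
import Foundation

extension Data {
    /// Returns nil when detection fails, instead of a String.Encoding
    /// wrapping the sentinel raw value 0.
    var detectedStringEncoding: String.Encoding? {
        let rawValue = NSString.stringEncoding(
            for: self,
            encodingOptions: nil,
            convertedString: nil,   // we only want the encoding here
            usedLossyConversion: nil
        )
        return rawValue == 0 ? nil : String.Encoding(rawValue: rawValue)
    }
}
```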
