I'm trying to decode a zipped JSON file that is dragged and dropped into my Flutter web app, but I'm having trouble getting the correct text out of the file.

For example, the file contains the following line: "Cond\u00c3\u00a9 Nast"

This should be Condé Nast. I am somewhat aware of character encoding, but this has me stumped.

This is currently how I'm unzipping, utf8 decoding and finally json decoding the file. I am using the Archive package to do the unzipping.

ArchiveFile theFile = otherFiles.first;
final fileString = Utf8Decoder().convert(theFile.content);
Iterable l = json.decode(fileString);

How would I go about printing the correct character from this input JSON string? Is this an issue of incorrect encoding, or is it an issue with my implementation?

  • If you mean that the object you get back from json.decode is a String with "Cond\u00c3\u00a9 Nast", then I think it's likely that the JSON file was encoded incorrectly when it was written. Can you check the bytes of theFile.content? The UTF-8 code units for é are the sequence 0xC3, 0xA9, so it seems like your string somehow was double-encoded to UTF-8. Commented Apr 28, 2022 at 21:21
  • @jamesdlin I think you're correct on the incorrect encoding - even the json file itself contains "Cond\u00c3\u00a9 Nast" as a value. Are you suggesting that I try to get the bytes from theFile.content and then try to UTF-8 decode it? I believe that theFile.content are the bytes. Commented Apr 29, 2022 at 7:33
  • I meant that you should check if the incorrect bytes are already present in theFile.content. The correct sequence of UTF-8 bytes would be 0xC3, 0xA9. An incorrect sequence from double-encoding to UTF-8 would be 0xC3, 0x83, 0xC2, 0xA9. Commented Apr 29, 2022 at 8:21
  • I'm unsure how to go about converting the content to a list of UTF-8 bytes, as when I try to encode it I just get a list of integers. However, from what I've researched it does seem that the data was encoded incorrectly, as the file itself contains double-encoding sequences. Commented Apr 29, 2022 at 11:03

1 Answer

As discussed in comments, I suspect that theFile.content was incorrectly encoded at some earlier step and that it was sent through a UTF-8 encoder twice. You can verify that by opening the file in a hex editor and examining its bytes. Alternatively, in Dart you can do print(theFile.content); or, if you prefer seeing byte values in hexadecimal, print([for (var byte in theFile.content) byte.toRadixString(16)]);.
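To make that check concrete, here is a rough sketch of how the bytes could be inspected in Dart. The helpers hexDump and containsSequence are hypothetical names made up for this example, and the sample bytes are a stand-in for theFile.content, built by deliberately encoding 'é' to UTF-8 twice.

```dart
import 'dart:convert';

/// Hypothetical helper: formats bytes as two-digit hex values.
String hexDump(List<int> bytes) =>
    bytes.map((b) => b.toRadixString(16).padLeft(2, '0')).join(' ');

/// Hypothetical helper: true if [needle] occurs contiguously in [haystack].
bool containsSequence(List<int> haystack, List<int> needle) {
  for (var i = 0; i + needle.length <= haystack.length; i++) {
    var match = true;
    for (var j = 0; j < needle.length; j++) {
      if (haystack[i + j] != needle[j]) {
        match = false;
        break;
      }
    }
    if (match) return true;
  }
  return false;
}

void main() {
  // Stand-in for theFile.content: 'é' run through a UTF-8 encoder twice.
  final bytes = utf8.encode(latin1.decode(utf8.encode('é')));

  print(hexDump(bytes)); // c3 83 c2 a9

  // Double-encoded 'é' shows up as C3 83 C2 A9 instead of C3 A9.
  print(containsSequence(bytes, [0xC3, 0x83, 0xC2, 0xA9])); // true
}
```

If neither byte sequence appears and the file instead contains the literal ASCII text \u00c3\u00a9 (as the comments suggest), the damage is baked into the JSON escapes, so it only becomes visible after json.decode.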

If you don't have any control over whatever generated that file, you should complain to someone who does. In the meantime, you can try to undo the damage by feeding your content through a UTF-8 decoder twice. utf8.decode/Utf8Decoder().convert expect a List<int>, so you can't just feed them the mis-encoded String directly; you first need to turn its code points back into a list of byte values.

import 'dart:convert';

void main() {
  // Each rune here is in the range 0..255, so it can be treated as a byte.
  var badString = 'Cond\u00c3\u00a9 Nast';
  var goodString = utf8.decode(badString.runes.toList());
  print(goodString); // Prints: Condé Nast
}
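
Since the question's data comes out of json.decode, the same repair would have to be applied to every string in the decoded structure. Below is a minimal sketch of one way to do that, assuming that all affected strings were double-encoded exactly once. repairString and repairJson are hypothetical helper names, not part of dart:convert or the Archive package.

```dart
import 'dart:convert';

/// Hypothetical helper: undoes one layer of UTF-8 double-encoding.
/// Returns the input unchanged if it cannot be mis-decoded UTF-8 bytes.
String repairString(String s) {
  // Mis-decoded bytes always fit in 0..255; anything larger is real text.
  if (s.codeUnits.any((unit) => unit > 0xFF)) return s;
  try {
    return utf8.decode(s.codeUnits);
  } on FormatException {
    return s; // Not a valid UTF-8 byte sequence, so leave it alone.
  }
}

/// Hypothetical helper: applies [repairString] to every string (including
/// map keys) in a value returned by json.decode.
Object? repairJson(Object? value) {
  if (value is String) return repairString(value);
  if (value is List) return value.map(repairJson).toList();
  if (value is Map) {
    return value.map(
      (k, v) => MapEntry(repairString(k as String), repairJson(v)),
    );
  }
  return value; // Numbers, booleans and null pass through untouched.
}

void main() {
  // Raw string, so the \u escapes are interpreted by json.decode, not Dart.
  final decoded = json.decode(r'[{"publisher": "Cond\u00c3\u00a9 Nast"}]');
  print(repairJson(decoded)); // [{publisher: Condé Nast}]
}
```

One caveat with this approach: a correctly encoded string that happens to consist only of Latin-1 characters forming a valid UTF-8 sequence would also be rewritten, so this is a workaround for known-bad input rather than something to apply blindly.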