Android read file encoding issue

Question

I'm trying to read a file from the SD card, and I've been told it's in Unicode format. However, when I try to read the file, I get the following:

[Image: Encoded file]

This is the code I'm using to read the file:

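InputStreamReader fw = new InputStreamReader(new FileInputStream(root.getAbsolutePath() + "/Drive/sdk/cmd.62.out"), "UTF-8");
char[] buf = new char[255];
// read() returns the number of chars actually read, or -1 at end of stream
int n = fw.read(buf);
// Build the string from only the chars that were read, not the whole buffer
String readString = new String(buf, 0, Math.max(n, 0));
Log.d("courierread", readString);
fw.close();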

If I write that output to a file, this is what I get when I open it in a hex editor: [Image: Hex info]

Any thoughts on what I need to do to read the file correctly?

Community · Accepted Answer · 2017-05-23 12:07:01Z

2

~~Does the file have a byte-order mark? In that case look at Reading UTF-8 - BOM marker~~

EDIT (from comment): That looks like little-endian UTF-16 to me. Try the charset "UTF-16LE".
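As a rough sketch of that suggestion, the read could look something like the following. The class and method names are made up for illustration, and the sample bytes in `main` stand in for the real file; on the device you would pass a `FileInputStream` for the file instead. The BOM handling is an assumption on my part: Java's "UTF-16LE" decoder does not consume a leading byte-order mark, so the sketch drops U+FEFF if it shows up as the first character.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

class Utf16LeReader {

    // Decodes the whole stream as little-endian UTF-16 and returns the text.
    // (Hypothetical helper, not part of the original question's code.)
    static String readUtf16Le(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, "UTF-16LE")) {
            char[] buf = new char[255];
            int n;
            // read() may return fewer chars than the buffer holds, so loop until -1
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        // Assumption: if the file starts with a BOM (FF FE), it decodes to U+FEFF; drop it
        if (sb.length() > 0 && sb.charAt(0) == '\uFEFF') {
            sb.deleteCharAt(0);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Sample data: BOM (FF FE) followed by "Hi" in UTF-16LE
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x48, 0x00, 0x69, 0x00};
        System.out.println(readUtf16Le(new ByteArrayInputStream(sample)));
    }
}
```

Looping on `read()` also avoids the trailing empty chars you get from turning a partly filled 255-char buffer into a `String`.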

edited May 23, 2017 at 12:07

CommunityBot

answered Mar 28, 2011 at 10:25

RoToRa

7 Comments

RichW Over a year ago

Not sure, but I tried applying the BOM removal code and it seemed to make it worse! I suppose the easiest solution is to strip out all those weird A characters. Unfortunately, I don't know the Unicode char to do so.

RoToRa Over a year ago

Stripping out those characters wouldn't be solving the problem. Are you sure it's a UTF-8 file? Can you look at the file in a hex editor and post a screen shot or the hex codes of the first few bytes?

RichW Over a year ago

All I know is that it's Unicode. I tried UTF-16 and it was completely unreadable; it was just made up of lots of dodgy characters. As requested, I've output the hex codes for each character (see the original post). It appears that there is a 0 in between every character.

RoToRa Over a year ago

A single 0 doesn't make much sense between the characters. If there really were a 0 byte, it would be 00. The problem with your output is that it has already been processed by (possibly wrong) Java code, so a look at the file in an "independent" hex editor would be better...
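If no hex editor is at hand, something along these lines would print the raw bytes without running them through any charset decoding first. It is only a sketch: the class and method names are invented for illustration, and the sample bytes in `main` are a stand-in for the real file (on the device you would pass a `FileInputStream` for it).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class RawHexDump {

    // Returns up to `limit` leading bytes as two-digit hex pairs.
    // Reads bytes directly, so no charset decoding is involved.
    static String hexOfFirstBytes(InputStream in, int limit) throws IOException {
        StringBuilder sb = new StringBuilder();
        int count = 0;
        int b;
        while (count < limit && (b = in.read()) != -1) {
            if (count > 0) {
                sb.append(' ');
            }
            // in.read() yields 0..255, so %02X always gives exactly two digits
            sb.append(String.format("%02X", b));
            count++;
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Sample data standing in for the file's first bytes
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x48, 0x00, 0x69, 0x00};
        // prints "FF FE 48 00"
        System.out.println(hexOfFirstBytes(new ByteArrayInputStream(sample), 4));
    }
}
```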

RoToRa Over a year ago

Thanks. That looks like little-endian UTF-16 to me. Try the charset "UTF-16LE".

Joachim Sauer · Accepted Answer · 2011-03-28 12:28:49Z

1

The file you show in the hex editor is not UTF-8 encoded; it looks more like UTF-16. This means you must specify UTF-16 as the encoding in your code (probably the UTF-16LE variant).

If it were UTF-8 encoded, every character representable in ASCII would take just a single byte. In UTF-16, each of those characters takes two bytes, one of which is 00.
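To make the difference concrete, here is a small sketch that encodes the same ASCII text in each charset and prints the bytes. The class name, the `hex` helper, and the sample text "Hi" are illustrative choices, not anything from the question.

```java
import java.nio.charset.StandardCharsets;

class EncodingWidths {

    // Formats bytes as space-separated two-digit hex pairs (illustrative helper).
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values format as two digits
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Hi";
        // UTF-8: one byte per ASCII character -> 48 69
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_8)));
        // UTF-16LE: two bytes per character, low byte first -> 48 00 69 00
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16LE)));
        // UTF-16BE: two bytes per character, high byte first -> 00 48 00 69
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16BE)));
    }
}
```

Decoding the UTF-16LE bytes as UTF-8 turns each 00 byte into a NUL character, so the text comes out with a stray character after every letter.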

answered Mar 28, 2011 at 12:28

Joachim Sauer

1 Comment

RichW Over a year ago

Interesting tip, thanks for that. I'll try creating different files with different types of encoding. I guess that is the easiest way to learn the difference.
