Java UTF-8 text file reading bug?

Question

I have a simple text file holding my textual data. It has to be saved as UTF-8, since it contains some Unicode symbols.

I wrote it as a normal text file in Notepad and saved it as .txt with UTF-8 encoding.

But I seem to be getting some kind of weird character in front of the first line when I read it.

It's some kind of weird dot which can't even normally be pasted anywhere else. I could maybe try removing the first symbol, but I don't think that's a real solution, besides I'm not sure if it will always come up...

This is the code part:

    FileInputStream fstream = new FileInputStream(fileName);
    // Get the object of DataInputStream
    DataInputStream in = new DataInputStream(fstream);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    String values;

    //Read File Line By Line

    System.out.println("Generating queries from: " + fileName);
    String fields = br.readLine();
    System.out.println("The fields are: " + fields);

Has anyone come across this and knows a solution?

Thanks in advance.

Sorry, I should have marked it in red. It's right at the line that reads "The fields are: XLanguage_code..."; the symbol is the X there.
– Arturas M, Commented May 6, 2012 at 1:00
Are you sure it isn't just a screen artifact? Something that doesn't affect the code, but is just left there?
– Brendan Lesniak, Commented May 6, 2012 at 1:01
I should probably add that it's not only Notepad. It happened to me earlier today when Notepad++ was saving a .txt file as Unicode...
– Arturas M, Commented May 6, 2012 at 1:13

Stephen C · Accepted Answer · 2012-05-06 01:10:48Z

3

It is probably a Unicode Byte Order Mark (BOM). Some text editors (on Windows) start a UTF-8 text file with a BOM to flag that it is Unicode.

If you need to deal with this in Java, test whether the first Unicode code point you read from the file is 0xFEFF, and if it is, remove it.
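A minimal sketch of that check, assuming the file is decoded as UTF-8 (Java's UTF-8 decoder passes a leading BOM through as the character U+FEFF rather than dropping it). The class name `BomAwareReader` and the helpers `stripBom` and `readFirstLine` are made up for illustration and are not part of any library:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class BomAwareReader {
    // U+FEFF is the code point of the byte order mark.
    static final char BOM = '\uFEFF';

    // Returns the line without a leading BOM; other input is returned unchanged.
    static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    // Reads the first line of a file decoded explicitly as UTF-8,
    // dropping a leading BOM if the editor wrote one.
    static String readFirstLine(Path file) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return stripBom(br.readLine());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the file name as the first argument.
        String fields = readFirstLine(Paths.get(args[0]));
        System.out.println("The fields are: " + fields);
    }
}
```

Only the first line needs the check, since a BOM can only appear at the very start of the file. Passing the charset explicitly also avoids depending on the platform default encoding, which the `InputStreamReader(in)` constructor in the question uses.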


2 Comments

BillRobertson42 Over a year ago

I agree. UTF-8 is byte-order independent, but Microsoft adds a BOM anyway as an indicator that the file is UTF-8. en.wikipedia.org/wiki/Byte_order_mark#UTF-8

Matt Ball Over a year ago

It's definitely a BOM: stackoverflow.com/questions/10467241/… (0d65279 = 0xFEFF)
