I have a simple text file containing my text data. It needs to be saved as UTF-8 because it contains some Unicode symbols.

I wrote a plain text file in Notepad and saved it as .txt with UTF-8 encoding.

But I seem to be getting some kind of strange character at the front of the first line I read (screenshot not included).

It's some kind of weird dot that can't even be pasted normally anywhere else. I could try removing the first character, but I don't think that's a real solution, and I'm not sure it will always show up.

This is the code part:

FileInputStream fstream = new FileInputStream(fileName);
// Get the object of DataInputStream
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String values;

// Read the file line by line
System.out.println("Generating queries from: " + fileName);
String fields = br.readLine();
System.out.println("The fields are: " + fields);

Has anyone come across this and know of a solution?

Thanks in advance.

  • Sorry, I should have marked it in red. It's right at the line that reads "The fields are: XLanguage_code...", where the X stands in for the stray character. Commented May 6, 2012 at 1:00
  • Are you sure it isn't just a screen artifact? Something that doesn't affect the code, but is just left there? Commented May 6, 2012 at 1:01
  • What is the value of fields.codePointAt(0)? Commented May 6, 2012 at 1:04
  • Yeah, I'm pretty sure it's not a screen artifact. Commented May 6, 2012 at 1:10
  • I should probably add that it's not only Notepad; it happened to me earlier today when Notepad++ saved a txt file with Unicode... Commented May 6, 2012 at 1:13

1 Answer

It is probably a Unicode Byte Order Mark (BOM). Some text editors (on Windows) start a UTF-8 text file with a BOM to flag that it is Unicode.

If you need to deal with this in Java, test whether the first Unicode code point you read from the file is 0xFEFF, and remove it if so.
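As a rough illustration of that check, here is a minimal sketch. It assumes the file is decoded explicitly as UTF-8, so that the BOM bytes EF BB BF arrive as the single character U+FEFF. The class name BomSkipper and its two methods are hypothetical names made up for this example, not part of the asker's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; the name is an assumption for illustration only.
class BomSkipper {
    // U+FEFF is the character that the UTF-8 BOM bytes EF BB BF decode to.
    static final char BOM = '\uFEFF';

    // Returns the line without a leading BOM, or the line unchanged if it has none.
    static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    // Reads the first line of a UTF-8 file and drops a leading BOM if present.
    static String readFirstLine(Path file) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return stripBom(br.readLine());
        }
    }
}
```

Only the first line of the file needs this treatment, since a BOM is only meaningful at the very start. Naming the charset explicitly also avoids depending on the platform default encoding, which the reader in the question relies on.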

2 Comments

I agree. UTF-8 is byte-order independent, but Microsoft adds a BOM anyway as an indicator that the file is UTF-8. en.wikipedia.org/wiki/Byte_order_mark#UTF-8
It's definitely a BOM: stackoverflow.com/questions/10467241/… (0d65279 = 0xFEFF)