
I have a file that contains the letter ø. When I read it with File.ReadLines(filePath), I get a question mark instead of the ø.

When I pass an encoding, as in File.ReadLines(filePath, Encoding.GetEncoding(1252)), I get the ø character.

But the default encoding is already 1252: the property Encoding.Default.CodePage returns 1252.

So why do I have to specify encoding 1252 when reading, if the default is already 1252?

One more question: if the file is Unicode, will C# recognize its format, or do I have to specify the Unicode encoding?

  • File.ReadLines defaults to UTF-8 for encoding. Commented Mar 2, 2016 at 14:02
  • Using a legacy 8-bit codepage encoding, like 1252, is a practice from the previous century. You simply need to stop doing that, there is no remaining reason today to not use UTF-8. As you found out, File.ReadLines() defaults to Encoding.UTF8. Delete the file or re-save it with a text editor. Notepad is already good enough: use the Encoding combobox on the Save As dialog. Commented Mar 2, 2016 at 14:09
  • Is UTF-8 default also when writing to file? Commented Mar 2, 2016 at 14:09
  • @HansPassant But program receives files in ANSI, I cannot change those files manually to UTF-8. Commented Mar 2, 2016 at 14:12
  • Well, of course you can. After doing that about ten times, you'd consider solving the real problem. And pick up a telephone instead of asking people that have no way to actually do anything about it. Commented Mar 2, 2016 at 14:23

1 Answer

The reason is that the encoding used by default when reading text files is UTF-8.

Encoding.Default is not (despite its name) the default encoding used when reading files!

A much better name for Encoding.Default would have been Encoding.UsingCurrentCodePage, in my opinion. ;)
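The mismatch can be reproduced with a small sketch. It assumes C# 9 top-level statements on .NET Core or later, where the Windows code pages have to be registered before Encoding.GetEncoding(1252) works (on .NET Framework that registration line is not needed). The temp file name and the sample word are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Assumption: modern .NET. Code page 1252 is only available there after
// registering the code-pages provider; .NET Framework has it built in.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical sample file: "bør" stored as Windows-1252 bytes.
// In that code page "ø" is the single byte 0xF8, which is not valid UTF-8 on its own.
string path = Path.Combine(Path.GetTempPath(), "ansi-sample.txt");
File.WriteAllBytes(path, new byte[] { 0x62, 0xF8, 0x72 });

// No encoding argument: File.ReadLines decodes as UTF-8.
string asUtf8 = string.Concat(File.ReadLines(path));

// Explicit code page: the 0xF8 byte is decoded as "ø".
string as1252 = string.Concat(File.ReadLines(path, Encoding.GetEncoding(1252)));

// The invalid byte becomes U+FFFD (the replacement character),
// which many consoles display as a question mark.
Console.WriteLine(asUtf8.IndexOf('\uFFFD') >= 0);
Console.WriteLine(as1252 == "b\u00F8r");
```

The question mark in the question is most likely that replacement character: the byte is lost at decode time, so it cannot be recovered from the string afterwards.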

Also note that rather than using File.ReadLines(filePath, Encoding.GetEncoding(1252)) you could use File.ReadLines(filePath, Encoding.Default).

You would do that if your code needs to read files created in a code page other than 1252, where that code page is the current code page of the system the code is running on. (This applies to .NET Framework; on .NET Core and later, Encoding.Default is always UTF-8.)

The only reason you should be using code pages is if you are reading or writing legacy files.
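As for the second part of the question (Unicode files), here is a minimal sketch, assuming the file was saved with a byte order mark (BOM). The readers behind File.ReadLines and File.ReadAllText check for a BOM and switch away from UTF-8 when they find one; a UTF-16 file without a BOM would still need an explicit encoding. The file name and sample text are hypothetical, and the code again uses C# 9 top-level statements.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sample file written as UTF-16 little-endian ("Unicode" in .NET terms).
string path = Path.Combine(Path.GetTempPath(), "utf16-sample.txt");
File.WriteAllText(path, "b\u00F8r", Encoding.Unicode);

// WriteAllText emits the encoding's preamble, so the file starts with FF FE.
byte[] raw = File.ReadAllBytes(path);
bool hasBom = raw.Length >= 2 && raw[0] == 0xFF && raw[1] == 0xFE;

// No encoding argument: the reader starts as UTF-8 but follows the BOM it detects.
string text = string.Concat(File.ReadLines(path));

Console.WriteLine(hasBom);
Console.WriteLine(text == "b\u00F8r");
```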


4 Comments

Is UTF-8 also default when writing to file?
@Aleksa Yep. It's the standard for files nowadays.
After loads of testing, I found that the following code works better than Encoding.Default: var csvContent = System.IO.File.ReadAllText(import.File.LocalPath, Encoding.GetEncoding("Windows-1252"));
Amazing. Tried every value except 'Default' because why would I try that...!
