on my site I allow for direct text file uploads. These files are then stored on the server, and displayed on the website. I use UTF-8 on the site.
Now I run into trouble when people upload non-UTF-8 files which contain special chars, such as é.
I've been doing some testing. Made 2 text files, both containing the same word fiancée. One encoded UTF-8 and one encoded ISO 8859-2.
The UTF-8 one uploads fine, and shows the text correct, but the ISO 8859-2 shows as fianc�e.
Now I've tried to detect the uploaded file content with mb_detect_encoding, but whatever file I throw at it, it always detect UTF-8.
I noticed that I can use utf8_encode to convert the ISO 8859-2 files to valid UTF-8, but this only works on non-UTF files. And as I currently cannot detect non-UTF files, I cannot use the utf8_encode function, as it messes up valid UTF-8 files.
Hope this makes sense :)
So my question is, how can I detect files that are for sure not UTF-8 encoded to start with, so that I can use the utf8_encode function on them.