With cURL, I am saving files which have UTF-8 chars in the filename.

For example:

testšć.docx

When I used mb_detect_encoding(), it returned ASCII.

So I tried to convert it with iconv from ASCII to UTF-8 and from UTF-8 to UTF-8.

Neither option worked.

Does anyone have a suggestion on how I could keep UTF-8 file names?

Cheers!

  • Make sure the PHP file is encoded in UTF-8 if you are outputting the data, and try using utf8_encode to convert to the proper encoding (php.net/manual/en/function.utf8-encode.php). Commented Dec 17, 2011 at 2:33
  • Show the piece of code that calls mb_detect_encoding. What are you calling mb_detect_encoding on exactly? Also note that mb_detect_encoding is not really an exact science and you hardly need it if you know what encoding you're dealing with. Commented Dec 17, 2011 at 2:45
  • @Dany Khalife I fetch the file name, which has UTF-8 chars, from a website, like this: $h1 = $html->find('h1',0)->plaintext; Then I just use file_put_contents, but it doesn't save properly. deceze, I only used it once to see what encoding the parsed text is in. Commented Dec 17, 2011 at 13:07
  • OK, I see what you mean. Any chance we can have a look at the piece of code that fetches your data and saves it? (That way we can try it ourselves.) Commented Dec 17, 2011 at 14:45
  • @DanyKhalife Sure: $html = str_get_html($data); $h1 = $html->find('h1',0)->plaintext; $dir = 'C:\xampp\htdocs\project\\'.$h1; mkdir($dir); // It's almost identical to what I posted above. When I try to create the directory, it has strange chars in its name. Any suggestions? By the way, for your own testing you can fetch a char from utf8-chartable.de and create a folder with it. Commented Dec 17, 2011 at 15:48

2 Answers

Your file system (and operating system) must support UTF-8 encoded file names in order to retain files that use UTF-8 in the file name. If either one does not support that, the best option is to either transliterate the characters into a known ASCII form or replace the characters that cannot be converted.
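As a rough illustration of the fallback described above, here is a minimal sketch that transliterates with an explicit map and then replaces whatever is left. The function name `transliterate_filename`, the map contents, and the whitelist of allowed characters are all assumptions made for this example; a real map would need to cover every character you expect to see.

```php
<?php
// Hypothetical helper (not from the original post): make a file name ASCII-safe.
function transliterate_filename(string $name, array $map, string $replacement = '_'): string
{
    // Step 1: apply the explicit transliteration map (works on multi-byte keys).
    $name = strtr($name, $map);

    // Step 2: replace each run of bytes outside a conservative whitelist.
    // The whitelist (letters, digits, dot, underscore, hyphen) is an assumption.
    return preg_replace('/[^A-Za-z0-9._-]+/', $replacement, $name);
}

// Example map for the two characters in the question; extend as needed.
$map = ['š' => 's', 'ć' => 'c'];

echo transliterate_filename('testšć.docx', $map), PHP_EOL; // prints "testsc.docx"
```

An explicit map keeps the result predictable across platforms, at the cost of having to maintain the map yourself.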

4 Comments

I'm curious to know which FS or OS doesn't support UTF-8.
Here is a list on Wikipedia of filesystem limitations.
In short, not many. I could be entirely misunderstanding your question though.
@JoshuaK My filesystem does support it, because if I type the name with my keyboard it works OK. The problem appears once I parse a UTF-8 filename from a website like this: $h1 = $html->find('h1',0)->plaintext;

By analogy with MySQL: when your MySQL data is encoded in UTF-8, PHP should read it over a UTF-8 connection. Since your HTML data is in UTF-8, I think your problem is that you are not reading it as UTF-8 (though I don't have all of your code, so I can't be sure).

Try adding this option to your cURL config:

curl_setopt( $ch, CURLOPT_ENCODING, "UTF-8" );  

I don't know if this is what you are missing; if it isn't, let me know and I'll update my answer.
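A note on the option above: CURLOPT_ENCODING sets the Accept-Encoding request header (content codings such as gzip), not the character set, so it may not change how the title is decoded. Another possibility, offered as an assumption rather than a diagnosis: if mb_detect_encoding() reports ASCII for a title that displays as testšć.docx, the string can only contain ASCII bytes, so the accented letters may be arriving as HTML entities such as `&scaron;`. The sketch below decodes entities into UTF-8 before the name is used. The helper name `title_to_utf8` and the sample string are hypothetical, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper (not from the original post): decode a scraped title to UTF-8.
function title_to_utf8(string $raw): string
{
    // Turn named and numeric HTML entities into real UTF-8 characters.
    $decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Refuse to continue if the result is not valid UTF-8.
    if (!mb_check_encoding($decoded, 'UTF-8')) {
        throw new UnexpectedValueException('Title is not valid UTF-8');
    }
    return $decoded;
}

// Assumed sample input: an ASCII-only string that carries entities.
$raw  = 'test&scaron;&#263;.docx';
$name = title_to_utf8($raw);

var_dump(mb_check_encoding($raw, 'ASCII')); // the raw string is pure ASCII
var_dump($name === 'testšć.docx');          // compares against the UTF-8 literal
```

Even with a correctly decoded name, PHP versions before 7.1 on Windows are generally reported to pass file names to the system in the ANSI code page, so the directory name may still look wrong there; in that case converting the name with iconv to the system code page, or upgrading PHP, would be the next thing to try.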
