
I have a file, test.HIO, with this content:

 11/08/2015 00:05:50»ЦО Ворота выход»Дверь не открыта»24001695»Бахром Суннатуллоевич Тургунов»99»»»
 11/08/2015 00:05:54»ЦО Ворота выход»Верный доступ»24001215»Шохрух Джохонгирович Исламов»99»»»

If I run the Linux command file -i test.HIO, I get this:

test.HI0: text/plain; charset=iso-8859-1

If I convert this file using the PHP function iconv or mb_convert_encoding:

$file_content = file( "test.HIO" );

// for example, take one line from the file
$str = iconv( "ISO-8859-1", "UTF-8", $file_content[2] );
var_dump( $str );

$str2 = mb_convert_encoding( $file_content[2], "UTF-8", "ISO-8859-1" );
var_dump( $str2 );

I get the same result:

 string(159) " 11/08/2015 00:05:45»ÖÎ Âîðîòà âûõîä»Âåðíûé äîñòóï»24001695»Áàõðîì Ñóííàòóëëîåâè÷ Òóðãóíîâ»99»»» "

If I just output the file content in the browser:

echo '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />';
$file_content = file( "test.HI0" );

echo $file_content[2];

I see this:

11/08/2015 00:07:17��� 2 ����������� �������24001066��������� ���������� �������99���

How do I correctly display or save this text in UTF-8?
Thanks in advance.

UPD.

Thanks to all. I found another solution; it looks ugly, but it works.

$file_content = file( "test.HIO" );

$str = iconv( "ISO-8859-1", "UTF-8", $file_content[2] );

// OR

$str = mb_convert_encoding( $file_content[2], "UTF-8", "ISO-8859-1" );

// undo the wrong ISO-8859-1 decoding, then decode the bytes as windows-1251
$str = iconv( 'utf-8', 'windows-1252', $str );
$str = iconv( 'windows-1251', 'utf-8', $str );

var_dump( $str );
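
As far as I can tell, this works because the second step turns the mojibake back into the original bytes, which are then decoded with the real encoding. A minimal sketch of that round trip, assuming a hypothetical sample string (the word "Верный" written as raw windows-1251 bytes) instead of my actual file:

```php
<?php
// Hypothetical sample: "Верный" as raw windows-1251 bytes (not read from the real file).
$cp1251 = "\xC2\xE5\xF0\xED\xFB\xE9";

// Step 1: the mistaken ISO-8859-1 -> UTF-8 conversion produces mojibake.
$mojibake = iconv('ISO-8859-1', 'UTF-8', $cp1251);

// Step 2: converting back to a single-byte Latin encoding restores the original bytes.
$restored = iconv('UTF-8', 'windows-1252', $mojibake);

// Step 3: decode the restored bytes with the real source encoding.
$roundTrip = iconv('windows-1251', 'UTF-8', $restored);

// The direct conversion, for comparison.
$direct = iconv('windows-1251', 'UTF-8', $cp1251);

var_dump($restored === $cp1251);   // are the original bytes back?
var_dump($roundTrip === $direct);  // does the detour match the direct conversion?
echo $roundTrip, PHP_EOL;
```

Bytes in the 0x80-0x9F range are mapped differently by ISO-8859-1 and windows-1252, so they are not guaranteed to survive this detour. Converting directly from the real source encoding avoids that risk.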


UPD 2.

I chose the wrong tool: file -i is not a reliable way to detect a file's encoding.
As it turned out, my file is encoded in windows-1251:

chardet /home/file.HI0
/home/file.HI0: windows-1251 (confidence: 0.75)

Or, as @yangsunny advised, use enca:

enca -L ru /home/file.HI0
MS-Windows code page 1251

In the end, this code can be used:

$file_content = file( "test.HIO" );

$str2 = mb_convert_encoding( $file_content[2], "UTF-8", "windows-1251" );
var_dump( $str2 );
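
To save the whole file as UTF-8 rather than convert one line, something like the following sketch might do. The function name convertFileToUtf8 and its parameters are my own illustration, and the default source encoding assumes the file really is windows-1251:

```php
<?php
// Hypothetical helper: read a file in a known single-byte encoding
// and write a UTF-8 copy of it. The source encoding is an assumption
// passed in by the caller; nothing here detects it.
function convertFileToUtf8(string $source, string $target, string $from = 'Windows-1251'): void
{
    $raw = file_get_contents($source);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $source");
    }

    // Convert the raw bytes from the assumed source encoding to UTF-8.
    $utf8 = mb_convert_encoding($raw, 'UTF-8', $from);

    if (file_put_contents($target, $utf8) === false) {
        throw new RuntimeException("Cannot write $target");
    }
}
```

It could be called as convertFileToUtf8( "test.HIO", "test.utf8.txt" ), where the target name is only an example.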

Thanks to all for the help.

  • Are you sure you have to convert this text into utf-8? I think it's already utf-8 text. Commented Mar 11, 2016 at 11:17
  • The encoding detection result you get ("iso-8859-1") is obviously wrong. There are no cyrillic characters in that char set. The issue here is that an automatic encoding detection simply is an impossible thing, especially for 8bit encodings. Neither is it reliable, nor does it deliver correct results. That simply is because it is impossible from a theoretical point of view. Commented Mar 11, 2016 at 11:17
  • OK, then why do I see this when I output the text without converting it: 11/08/2015 00:07:17��� 2 ����������� �������24001066��������� ���������� �������99��� Commented Mar 11, 2016 at 11:20
  • If the original data is really in ISO-8859-1, you can use utf8_encode to convert it to UTF-8. Commented Mar 11, 2016 at 11:34
  • You could try the PHP function mb_detect_encoding() with strict mode. Sadly, it doesn't always give the right result. Commented Mar 11, 2016 at 11:41

1 Answer


You are doing conversions the right way. The problem is that you don't know the source encoding. For example, think of currency conversion: you can convert £100 or ¥100 to US dollars. But you can't convert just "100".

From Wikipedia (emphasis mine):

ISO/IEC 8859-1:1998 [...] is generally intended for Western European languages (see below for a list).

It's clear that Cyrillic text (Russian, Ukrainian or whatever) cannot be ISO-8859-1, an encoding that only has characters from the Latin alphabet.

Correct text encoding detection is a manual task. If you know for sure the text is Cyrillic, you need to do some research to find out which encodings support Cyrillic and then figure out which one best matches your data. You might need to look at the actual hexadecimal values. Even then, there's still room for error. For instance, there might be encodings that are identical for 99% of characters but differ for the remaining 1%.
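
As a rough illustration of that manual process, the sketch below dumps the hexadecimal values of a sample and decodes the same bytes under several candidate encodings so you can judge by eye which one reads as real text. The helper name previewCandidates, the sample bytes, and the candidate list are assumptions for illustration, not something taken from the question's file:

```php
<?php
// Hypothetical helper: decode the same raw bytes under each candidate
// encoding and return the UTF-8 results keyed by encoding name.
function previewCandidates(string $bytes, array $candidates): array
{
    $previews = [];
    foreach ($candidates as $encoding) {
        $previews[$encoding] = mb_convert_encoding($bytes, 'UTF-8', $encoding);
    }
    return $previews;
}

// Assumed sample: six bytes taken from a line of the file.
$sample = "\xC2\xE5\xF0\xED\xFB\xE9";

// Show the actual byte values first.
echo bin2hex($sample), PHP_EOL;

// A few common single-byte encodings that support Cyrillic.
$candidates = ['Windows-1251', 'KOI8-R', 'ISO-8859-5', 'CP866'];

foreach (previewCandidates($sample, $candidates) as $encoding => $text) {
    echo str_pad($encoding, 14), $text, PHP_EOL;
}
```

Only a human reader (or a dictionary check) can tell which line is meaningful, which is why this stays a manual step.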
