1

In my db I have a field value looking like this:

ΜΑΚΑΡΙΟΥ Γ\'

I think these are Greek characters that were inserted before I had set UTF-8 for my db (I think I was using the default Latin 1).

Is there a way to get the actual characters?

Thank you

2
  • If this is UTF-8 that was stored inside a latin1 column, you could try utf8_decode() to bring the original encoding back. Commented Jan 15, 2013 at 8:41
  • @Jack I don't remember anymore. I think my db was in latin1, and 99% of the data inserted is Greek characters. None of my attempts to convert this back has produced any result. Commented Jan 15, 2013 at 9:32

2 Answers

2

Not sure, but try this:

$str = "ΜΑΚΑΡΙΟΥ Γ\'";
$val = iconv(mb_detect_encoding($str), "UTF-8", $str);
echo $val;

2 Comments

I get back exactly the same string
This fixed my problem. Since UTF-8 wasn't working, I tried all the charsets one by one until I hit the right one. I don't know how it happened, but instead of UTF-8 the charset was "windows-1252". Thank you
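The comment above suggests the stored text was UTF-8 that had been read as windows-1252. A minimal sketch of undoing that, written in Python rather than the PHP used in the answer. `fix_mojibake` is a hypothetical helper name, and the sample string is built inside the code, not read from the asker's database:

```python
def fix_mojibake(garbled: str, wrong_charset: str = "cp1252") -> str:
    """Undo one layer of 'UTF-8 bytes read as wrong_charset' damage."""
    # Re-encoding with the wrong charset recovers the original bytes,
    # which are then decoded as the UTF-8 they were all along.
    return garbled.encode(wrong_charset).decode("utf-8")


# Simulate the damage: take Greek text, encode it as UTF-8,
# then misread those bytes as windows-1252 (Python name: cp1252).
original = "ΜΑΚΑΡΙΟΥ Γ"
garbled = original.encode("utf-8").decode("cp1252")

print(garbled)                # the garbled form
print(fix_mojibake(garbled))  # the Greek text again
```

This strict version will not cover every case: Python's cp1252 codec leaves a few bytes undefined (0x9D among them), so a letter such as Ν, whose UTF-8 bytes are CE 9D, would raise an error here. In PHP the corresponding call would presumably be `iconv("UTF-8", "Windows-1252", $str)`, assuming the garbled value reaches PHP as UTF-8.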
0

Try saving the data into a text file and opening the text file in a hex editor (there are a bunch of good free ones). That could show you the underlying code values of the letters, which you could then match against published encodings.

For example, this page lists Unicode values for Polytonic Greek characters (not sure you were using Polytonic, though): http://leb.net/reader/text/standards/unicode/old/MappingTables/NewTables/Polytonic_Greek.txt.

Looking at the text with a hex editor will help you to get code values to look up in lookup tables like this.
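If no hex editor is at hand, the same code values can be listed with a few lines of code. A small Python sketch (Python rather than the PHP used elsewhere on this page). `describe` is a hypothetical helper name, and the two sample characters are an assumption about how the garbled value starts, written as escapes so they cannot be mangled in transit:

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' string per character in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# Assumed first two characters of the garbled field value.
sample = "\u00ce\u0153"

for line in describe(sample):
    print(line)
```

The printed names can then be matched against the charts or mapping tables, the same way as values read from a hex editor.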

5 Comments

Tried but the HEX I got is: FFFECE005301CE001820CE006101CE00A100CE002221CE007801CE00A5002000CE001C205C002700 How do I convert this?
So those first two bytes, FFFE, mean that your file is encoded in little-endian UTF-16. See the discussion here: en.wikipedia.org/wiki/Byte_order_mark. The next two bytes are CE00, which in little endian is the code value 00CE. When I look at the standard Unicode Greek chart, unicode.org/charts/PDF/U0370.pdf, CE doesn't seem to fit into the Greek range. So perhaps this reflects a different original code page, such as Windows-1253?
BTW, 00CE is quite regular in this stream. Interesting. Seems like it's every other two-byte pair!
Looking up 00CE in the Unicode charts by code at unicode.org/charts steers us to unicode.org/charts/PDF/U0080.pdf, in which 00CE really is an I with a circumflex, which is showing up in your snippet of text. But what was the significance of CE in your original encoding? Did you originally encode in, perhaps, Windows-1253 (en.wikipedia.org/wiki/Windows-1253), in which CE is a capital Xi? Or something else? We'll have to figure that out.
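One way to probe the question in the comment above, what CE meant in the original encoding, is to decode the same byte under several candidate encodings. A rough Python sketch (Python rather than PHP). The list of candidate codecs is my own assumption, not something established in this thread:

```python
# The byte under discussion, taken from the hex dump in the comments.
suspect = bytes([0xCE])

# Single-byte code pages: each maps 0xCE to exactly one character.
for codec in ("cp1252", "cp1253", "iso8859_7"):
    print(codec, "->", suspect.decode(codec))

# In UTF-8, 0xCE is only a lead byte and needs a continuation byte.
# 0x9C is the cp1252 byte for U+0153, the character that follows
# the first U+00CE in the dump.
pair = bytes([0xCE, 0x9C])
print("utf-8 ->", pair.decode("utf-8"))
```

If the UTF-8 reading is the right one, it would also explain why 00CE recurs at every other position: 0xCE is the UTF-8 lead byte for U+0380 through U+03BF, a range that contains all the Greek capital letters.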
Setting aside the significance of 00CE, what are the intervening byte pairs, the first three of which are 0153, 2018, 0161, etc.? Let's look those up at unicode.org/charts . . . I'm out of time for now, but maybe you'll find something interesting.
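The lookups proposed in these comments can also be done in code. A sketch in Python (not the PHP used in the first answer) that decodes the hex dump exactly as posted, names each code value, and then tests one hypothesis: that the text is UTF-8 which was misread as windows-1252. That hypothesis is an assumption, taken from the comment under the other answer:

```python
import codecs
import unicodedata

# Hex dump exactly as posted in the comment above.
HEX_DUMP = "FFFECE005301CE001820CE006101CE00A100CE002221CE007801CE00A5002000CE001C205C002700"

raw = bytes.fromhex(HEX_DUMP)

# FF FE at the start is the little-endian UTF-16 byte order mark.
assert raw.startswith(codecs.BOM_UTF16_LE)

# The "utf-16" codec reads the BOM, picks the byte order, drops the BOM.
garbled = raw.decode("utf-16")

# Name every code value instead of looking each one up by hand.
for ch in garbled:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Hypothesis (an assumption): UTF-8 bytes misread as windows-1252.
# Mapping the characters back to cp1252 bytes should give valid UTF-8.
original_bytes = garbled.encode("cp1252")
print(original_bytes.hex(" "))
repaired = original_bytes.decode("utf-8")
print(repaired)
```

For the dump as posted, the last line should print ΜΑΚΡΙΟΥ Γ\' which supports the windows-1252 explanation. The trailing backslash and apostrophe are ordinary ASCII and pass through unchanged.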
