0

I am parsing a huge xml file and encoding of file is to be said
< ? xml version="1.0" encoding="ISO-8859-1" ?>**bold

The db encoding is utf8 and I am running this query before anything is saved to db
$sql='SET NAMES "utf8" COLLATE "utf8_swedish_ci"';

What the problem is that sometimes some non standard characters comes in the xml file like
Lycka™ : roman
I know that trademark symbol is from windows-1252 encoding.

Im using php. I have tried utf8_encode.

here is saved in db alt text and

here is the output in browser alt text

I want it to converted to utf, that's it

2 Answers 2

0

Did you try encoding the string in utf8 before saving to db ? For php there is utf8_encode() function, there might be similar functions in the language you are using.

Sign up to request clarification or add additional context in comments.

2 Comments

Im using php. Ya, I have tried utf8_encode. Here what is saved in db "Lycka : roman" and when I try to decode it, it is shown as "Lycka : roman "
I think you will have to use multibyte functions to encode. use mb_convert_encoding() php.net/manual/en/function.mb-convert-encoding.php
0

I used this code and worked fine

function cp1252_to_utf8($str) 
{

        $cp1252_map = array(
                "\xc2\x80" => "\xe2\x82\xac", /* EURO SIGN */
                "\xc2\x82" => "\xe2\x80\x9a", /* SINGLE LOW-9 QUOTATION MARK */
                "\xc2\x83" => "\xc6\x92",     /* LATIN SMALL LETTER F WITH HOOK */
                "\xc2\x84" => "\xe2\x80\x9e", /* DOUBLE LOW-9 QUOTATION MARK */
                "\xc2\x85" => "\xe2\x80\xa6", /* HORIZONTAL ELLIPSIS */
                "\xc2\x86" => "\xe2\x80\xa0", /* DAGGER */
                "\xc2\x87" => "\xe2\x80\xa1", /* DOUBLE DAGGER */
                "\xc2\x88" => "\xcb\x86",     /* MODIFIER LETTER CIRCUMFLEX ACCENT */
                "\xc2\x89" => "\xe2\x80\xb0", /* PER MILLE SIGN */
                "\xc2\x8a" => "\xc5\xa0",     /* LATIN CAPITAL LETTER S WITH CARON */
                "\xc2\x8b" => "\xe2\x80\xb9", /* SINGLE LEFT-POINTING ANGLE QUOTATION */
                "\xc2\x8c" => "\xc5\x92",     /* LATIN CAPITAL LIGATURE OE */
                "\xc2\x8e" => "\xc5\xbd",     /* LATIN CAPITAL LETTER Z WITH CARON */
                "\xc2\x91" => "\xe2\x80\x98", /* LEFT SINGLE QUOTATION MARK */
                "\xc2\x92" => "\xe2\x80\x99", /* RIGHT SINGLE QUOTATION MARK */
                "\xc2\x93" => "\xe2\x80\x9c", /* LEFT DOUBLE QUOTATION MARK */
                "\xc2\x94" => "\xe2\x80\x9d", /* RIGHT DOUBLE QUOTATION MARK */
                "\xc2\x95" => "\xe2\x80\xa2", /* BULLET */
                "\xc2\x96" => "\xe2\x80\x93", /* EN DASH */
                "\xc2\x97" => "\xe2\x80\x94", /* EM DASH */

                "\xc2\x98" => "\xcb\x9c",     /* SMALL TILDE */
                "\xc2\x99" => "\xe2\x84\xa2", /* TRADE MARK SIGN */
                "\xc2\x9a" => "\xc5\xa1",     /* LATIN SMALL LETTER S WITH CARON */
                "\xc2\x9b" => "\xe2\x80\xba", /* SINGLE RIGHT-POINTING ANGLE QUOTATION*/
                "\xc2\x9c" => "\xc5\x93",     /* LATIN SMALL LIGATURE OE */
                "\xc2\x9e" => "\xc5\xbe",     /* LATIN SMALL LETTER Z WITH CARON */
                "\xc2\x9f" => "\xc5\xb8"      /* LATIN CAPITAL LETTER Y WITH DIAERESIS*/
        );

        return  strtr(utf8_encode($str), $cp1252_map);
}


$str = cp1252_to_utf8( iconv("UTF-8", "ISO-8859-1//TRANSLIT", $str));

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.