When I use substr() I get a strange character at the end
$articleText = substr($articleText,0,500);
I have an output of 500 chars and � <--
How can I fix this? Is it an encoding problem? My language is Greek.
substr() counts bytes, not characters.
Greek text probably means you are using a multi-byte encoding such as UTF-8, and cutting by byte count can split a character in the middle.
Using mb_substr() should help here: the mb_* functions were created specifically for multi-byte encodings.
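To make the byte versus character difference concrete, here is a minimal sketch. The sample word is my own illustration, and the snippet assumes the source file is saved as UTF-8 and the mbstring extension is loaded.

```php
<?php
// Illustrative sample (an assumption, not from the question):
// each Greek letter here takes 2 bytes in UTF-8.
$text = 'Καλημέρα';

$byteLength = strlen($text);              // counts bytes
$charLength = mb_strlen($text, 'UTF-8');  // counts characters

// Cutting at an odd byte offset splits a 2-byte character in half.
$byteCut = substr($text, 0, 5);
// Cutting by characters always ends on a character boundary.
$charCut = mb_substr($text, 0, 5, 'UTF-8');

var_dump($byteLength, $charLength);
var_dump(mb_check_encoding($byteCut, 'UTF-8')); // is the byte cut still valid UTF-8?
var_dump(mb_check_encoding($charCut, 'UTF-8')); // is the character cut valid UTF-8?
```

The byte-based cut ends in a lone lead byte, which is what browsers render as the replacement character.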
Call mb_internal_encoding("UTF-8"); before using mb_* functions. Without adding it I still see squares.
Alternatively, pass the encoding directly: mb_substr($short, 0, 75, 'utf-8'). Then you don't need to call mb_internal_encoding before mb_substr.
Use mb_substr instead; it can handle multi-byte encodings, not only single-byte strings as substr does:
$articleText = mb_substr($articleText,0,500,'UTF-8');
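The two ways of telling mb_substr which encoding to use can be compared side by side. This is only a sketch: the sample text is a hypothetical stand-in for the real article, and it assumes the file is saved as UTF-8.

```php
<?php
// Hypothetical sample text: 800 characters of Greek letters and spaces.
$articleText = str_repeat('αβγ ', 200);

// Option 1: pass the encoding explicitly on each call.
$explicit = mb_substr($articleText, 0, 500, 'UTF-8');

// Option 2: set the internal encoding once, then omit the fourth argument.
mb_internal_encoding('UTF-8');
$implicit = mb_substr($articleText, 0, 500);

var_dump($explicit === $implicit);
var_dump(mb_strlen($explicit, 'UTF-8'));
```

Passing the encoding explicitly keeps the call independent of whatever the server configuration happens to be.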
Call mb_internal_encoding('utf-8') before any mb_* command.
Looks like you're slicing a Unicode character in half there. Use mb_substr instead for Unicode-safe string slicing.
Call mb_internal_encoding('utf-8') first, or pass 'utf-8' as the fourth parameter of mb_substr. The documentation says the parameter is optional and that the internal character encoding is used when it is omitted, but the thing is (explained elsewhere in the PHP documentation) that PHP's "internal encoding" is nearly always something other than your page encoding. So for slicing a UTF-8 string, either the fourth parameter or a call to mb_internal_encoding('utf-8') becomes required.
Use this function; it worked for me:
function substr_unicode($str, $s, $l = null) {
    // Split into an array of UTF-8 characters, slice it, and join it back.
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}
Credits: http://php.net/manual/en/function.mb-substr.php#107698
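As a quick sanity check, the helper can be compared against mb_substr. The function is repeated here only so that the snippet runs on its own, and the sample string is my own assumption.

```php
<?php
// Repeated from the answer above so this snippet is self-contained.
function substr_unicode($str, $s, $l = null) {
    return implode('', array_slice(
        preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}

// Hypothetical sample mixing Greek and ASCII characters.
$sample = 'Ελληνικά text';

$viaRegex = substr_unicode($sample, 0, 5);
$viaMb    = mb_substr($sample, 0, 5, 'UTF-8');

var_dump($viaRegex === $viaMb);
```

One caveat: preg_split with the /u modifier returns false when the input is not valid UTF-8, so this approach is mainly useful when mbstring is unavailable and the input is known to be well formed.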
mb_substr() also works very well for removing strange trailing line breaks, which I was having trouble with after parsing HTML code. The problem was NOT handled by:
trim()
or:
var_dump(preg_match('/^\n|\n$/', $variable));
or:
str_replace (array('\r\n', '\n', '\r'), ' ', $text)
None of these caught it.
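Alternative solution for UTF-8 encoded strings: this converts the UTF-8 text to single-byte ISO-8859-1 before cutting the substring. Note that utf8_decode() replaces any character that does not exist in ISO-8859-1, including Greek letters, with '?', and that utf8_decode() and utf8_encode() are deprecated as of PHP 8.2.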
$articleText = substr(utf8_decode($articleText),0,500);
To get the articleText string back to UTF-8, an extra operation will be needed:
$articleText = utf8_encode( substr(utf8_decode($articleText),0,500) );
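A caveat worth checking before relying on this approach: a round trip through ISO-8859-1 only preserves characters that exist in that character set, and Greek letters do not. The sketch below illustrates this with mb_convert_encoding as a stand-in for utf8_decode() and utf8_encode(), which are deprecated as of PHP 8.2. The helper name latin1_round_trip and the sample strings are hypothetical.

```php
<?php
// Hypothetical helper: round-trip a UTF-8 string through ISO-8859-1,
// roughly what utf8_decode() followed by utf8_encode() does.
function latin1_round_trip(string $utf8): string {
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

$french = 'café';     // every character exists in ISO-8859-1
$greek  = 'Καλημέρα'; // no Greek letter exists in ISO-8859-1

var_dump(latin1_round_trip($french) === $french); // does Latin-1 text survive?
var_dump(latin1_round_trip($greek) === $greek);   // does Greek text survive?
```

For Greek text, mb_substr with an explicit 'UTF-8' argument avoids the lossy conversion altogether.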
You are cutting a Unicode character in half. Instead of substr(), try mb_substr() in PHP.
substr()
substr ( string $string , int $start [, int $length ] )
mb_substr()
mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )
For more information on substr(), see the linked reference (credits).