PHP function substr() error

Question

When I use substr(), I get a strange character at the end:

$articleText = substr($articleText,0,500);

The output is 500 characters long and ends with �.

How can I fix this? Is it an encoding problem? My language is Greek.

I have seen the same thing in (UK) English.

– alimack, commented Aug 25, 2014 at 11:03

Pascal MARTIN · Accepted Answer · 2009-12-29 09:08:48Z

61

substr() counts bytes, not characters.

Greek text probably means you are using a multi-byte encoding such as UTF-8, and counting bytes does not work well for those: in UTF-8 every Greek letter takes two bytes, so a byte-based cut can split a character in half.

Using mb_substr() should help here: the mb_* functions were created specifically for multi-byte encodings.
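The following minimal sketch shows the byte/character difference described above. It assumes the file is saved as UTF-8 and the mbstring extension is enabled, and the six-letter Greek sample string is purely illustrative. The incomplete trailing byte left by substr() is what a browser typically renders as �.

```php
<?php
// Six Greek letters; in UTF-8 each one is encoded with two bytes.
$text = "αβγδεζ";

var_dump(strlen($text));             // int(12): strlen() and substr() count bytes
var_dump(mb_strlen($text, 'UTF-8')); // int(6): mb_* functions count characters

// Cutting at byte 5 keeps "αβ" plus only the first byte of "γ".
$broken = substr($text, 0, 5);
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false): last character is incomplete

// mb_substr() cuts after the 5th character instead.
$safe = mb_substr($text, 0, 5, 'UTF-8');
echo $safe, PHP_EOL; // αβγδε
```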

answered Dec 29, 2009 at 9:08

Pascal MARTIN



3 Comments

Boris Delormas Over a year ago

Learning more and more every single day... Thank you, Stack Overflow!

ivkremer Over a year ago

Thank you very much. For me, though, the key was adding mb_internal_encoding("UTF-8"); before calling any mb_* function. Without it, I still see squares.

trejder Over a year ago

@Kremchik You won't see squares if you use mb_substr($short, 0, 75, 'utf-8'); then you don't need to call mb_internal_encoding before mb_substr.
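Here are the two approaches from these comments, side by side. This is a minimal sketch that assumes mbstring is enabled, and $short is placeholder text built only for illustration. On PHP 5.6 and later, the internal encoding typically defaults to UTF-8 (taken from default_charset), so the second approach mostly matters on older or differently configured setups.

```php
<?php
// Placeholder text: 100 Greek characters (200 bytes in UTF-8).
$short = str_repeat("αβγδε", 20);

// Option 1: pass the encoding explicitly as the 4th argument.
$a = mb_substr($short, 0, 75, 'UTF-8');

// Option 2: set the internal encoding once; later mb_* calls can omit it.
mb_internal_encoding('UTF-8');
$b = mb_substr($short, 0, 75);

var_dump($a === $b);              // bool(true)
var_dump(mb_strlen($a, 'UTF-8')); // int(75)
```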

hakre · 2012-01-29 14:11:16Z

20

Use mb_substr() instead. Unlike substr(), which only handles single-byte strings correctly, it can deal with multi-byte encodings:

$articleText = mb_substr($articleText,0,500,'UTF-8');

edited Jan 29, 2012 at 14:11

hakre


answered Jan 29, 2012 at 13:30

Uğur Özpınar


3 Comments

user993683 Over a year ago

The "UTF-8" part was important for me. Don't forget it, peeps!

Kent Munthe Caspersen Over a year ago

Passing "UTF-8" as the optional parameter worked for me. Keep in mind that you may also want to use mb_strlen() if you use the string length to decide whether the text must be cut.

trejder Over a year ago

An alternative is to call mb_internal_encoding('utf-8') before any mb_* function.
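Kent Munthe Caspersen's advice about mb_strlen() can be sketched with a hypothetical helper, truncate_text(). It is not a PHP built-in. The helper checks the length with mb_strlen() before cutting with mb_substr(), so both steps count characters. It assumes UTF-8 input and the mbstring extension, and the '...' suffix is an arbitrary choice.

```php
<?php
// Hypothetical helper (not part of PHP): cut $text to at most $limit characters.
function truncate_text(string $text, int $limit, string $suffix = '...'): string
{
    // Compare in characters, not bytes; strlen() would report 12 for "αβγδεζ".
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text; // short enough, leave it untouched
    }
    return mb_substr($text, 0, $limit, 'UTF-8') . $suffix;
}

echo truncate_text("αβγδεζ", 10), PHP_EOL; // αβγδεζ
echo truncate_text("αβγδεζ", 4), PHP_EOL;  // αβγδ...
```

With strlen() in the check instead, the first call would have appended the suffix to a string that is only six characters long.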

deceze · 2009-12-29 09:10:06Z

6

It looks like you're slicing a Unicode character in half there. Use mb_substr() instead for Unicode-safe string slicing.

answered Dec 29, 2009 at 9:10

deceze♦


1 Comment

trejder Over a year ago

...either by calling mb_internal_encoding('utf-8') first or by passing 'utf-8' as the fourth parameter of mb_substr. The documentation says the parameter is optional and that, when it is omitted, the internal character encoding is used. The catch (explained elsewhere in the PHP documentation) is that PHP's "internal encoding" is often something other than your page encoding. So for slicing a UTF-8 string, the fourth parameter or a call to mb_internal_encoding('utf-8') becomes necessary.

Kerem · 2015-02-14 02:53:53Z

1

Use this function; it worked for me:

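function substr_unicode($str, $s, $l = null) {
    // Split the UTF-8 string into single characters (empty pattern, /u = UTF-8 mode),
    // take the requested slice of that array, and join the characters back together.
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}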

Credits: http://php.net/manual/en/function.mb-substr.php#107698
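Below is a short usage sketch for the function above, repeated here so the snippet runs on its own. It relies only on PCRE's UTF-8 mode, not on mbstring. One caveat: preg_split() returns false for invalid UTF-8 input, and this helper does not handle that case.

```php
<?php
// Same helper as above: slice a UTF-8 string by characters instead of bytes.
function substr_unicode($str, $s, $l = null) {
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}

$text = "αβγδεζ";
echo substr_unicode($text, 0, 3), PHP_EOL; // αβγ
echo substr_unicode($text, 2), PHP_EOL;    // γδεζ (no length: slice to the end)
echo substr_unicode($text, -2), PHP_EOL;   // εζ (negative start counts from the end)
```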

edited Feb 14, 2015 at 2:53

Kerem


answered May 7, 2013 at 21:19

Moussawi7



Dr Nick Engerer · 2012-08-18 00:59:30Z

0

mb_substr() also works well for removing strange trailing line breaks, which I had trouble with after parsing HTML code. The problem was NOT handled by:

 trim()

or:

 var_dump(preg_match('/^\n|\n$/', $variable));

or:

str_replace (array('\r\n', '\n', '\r'), ' ', $text)

None of these caught it.

answered Aug 18, 2012 at 0:59

Dr Nick Engerer



Kristoffer Bohmann · 2013-03-30 17:15:43Z

0

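An alternative solution for UTF-8 encoded strings: utf8_decode() converts the UTF-8 string to ISO-8859-1 (one byte per character) before the substring is cut. This only works for text whose characters all exist in ISO-8859-1; anything else, Greek letters included, is replaced with ?.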

$articleText = substr(utf8_decode($articleText),0,500);

To convert $articleText back to UTF-8, an extra step is needed:

$articleText = utf8_encode( substr(utf8_decode($articleText),0,500) );

answered Mar 30, 2013 at 17:15

Kristoffer Bohmann


1 Comment

gre_gor Over a year ago

This doesn't work at all.

GowriShankar · 2014-10-27 12:52:24Z

0

You are cutting a Unicode character in half, so instead of substr(), try mb_substr() in PHP.

substr()

substr ( string $string , int $start [, int $length ] )

mb_substr()

mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )

For more information on substr(), check the PHP manual.

answered Oct 27, 2014 at 12:52

GowriShankar

