73

I am doing a real estate feed for a portal and it is telling me the max length of a string should be 20,000 bytes (20kb), but I have never run across this before.

How can I measure the byte size of a varchar string, so that I can then use a while loop to trim it down?

2
  • There shouldn't be any problem getting a string to that length, should there? What is it telling you? What errors are you seeing? Commented Sep 27, 2011 at 12:30
  • Byte size: strlen(), e.g. strlen('a₹') -> 4. Character count: mb_strlen(), e.g. mb_strlen('a₹', "UTF-8") -> 2. Note: mb_strlen() requires the mbstring extension, which is not enabled by default in PHP. Commented Jun 24, 2021 at 20:58

5 Answers

104

You can use mb_strlen() to get the byte length by passing an encoding in which every character is a single byte, so you don't have to worry about whether the string is multibyte or single-byte. For example, as drake127 says in a comment on the mb_strlen manual page, you can use the '8bit' encoding:

<?php
    $string = 'Cién cañones por banda';
    echo mb_strlen($string, '8bit');
?>

You can run into problems with strlen() because PHP has an option to overload strlen() so that it actually calls mb_strlen(). See http://php.net/manual/en/mbstring.overload.php for more information.

To trim the string by byte length without splitting a multibyte character in the middle, you can use:

mb_strcut(string $str, int $start [, int $length [, string $encoding ]] )

Update for PHP 8.0 or greater: function overloading was removed in PHP 8.0, so you can always use strlen() to check the length in bytes.
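As a rough sketch of how the two functions above could be combined for the 20,000-byte limit in the question: the helper name trimToBytes() and the sample data are made up for illustration, and the text is assumed to be valid UTF-8 with the mbstring extension loaded.

```php
<?php
// Hypothetical helper (not part of PHP): trims $text to at most $maxBytes
// bytes, assuming UTF-8 input, without cutting a multibyte character in half.
function trimToBytes(string $text, int $maxBytes = 20000): string
{
    // strlen() counts bytes on PHP 8.0+ (or whenever overloading is off).
    if (strlen($text) <= $maxBytes) {
        return $text;
    }
    // mb_strcut() takes byte offsets but backs up to a character boundary.
    return mb_strcut($text, 0, $maxBytes, 'UTF-8');
}

// Sample data: 15,000 two-byte characters, i.e. 30,000 bytes.
$description = str_repeat("\u{010D}", 15000);
$trimmed = trimToBytes($description);

echo strlen($trimmed), "\n";              // byte length after trimming
echo mb_strlen($trimmed, 'UTF-8'), "\n";  // character count after trimming
```

Because mb_strcut() already respects character boundaries, no while loop is needed for the trimming itself.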


2 Comments

The mbstring function overloading has been removed in PHP 8.0, so it's safe to just use strlen now.
@matronator, thanks for the update. I have added the info in the answer.
31

You have to figure out whether the string is ASCII-encoded or uses a multi-byte encoding.

In the former case, you can just use strlen.

In the latter case you need to find the number of bytes per character.

The strlen documentation gives an example of how to do it: http://www.php.net/manual/en/function.strlen.php#72274

4 Comments

strlen is not mb-safe function and actually returns number of bytes, not of characters. If you want number of characters in multi-byte encoding, you have to use mb_strlen.
@Darhazer it is possible to overload str*() into mb_str*(), so calling strlen will indeed call mb_strlen. To see if this is enabled, check mbstring.func_overload in php.ini. Also see php.net/manual/en/mbstring.overload.php
If you're looking for the number of bytes (which is what you asked for - not the number of characters) the correct answer was posted by @PhoneixS below; as pointed out by @Carlos strlen() isn't safe because it may be overloaded on some PHP installations.
@CarlosCampderrós Function overloading was deprecated in PHP 7.2.0 and removed in PHP 8.0.0. php.net/manual/en/mbstring.overload.php
28

Do you mean byte size or string length?

Byte size is measured with strlen(), whereas string length is queried using mb_strlen(). You can use substr() to trim a string to X bytes (note that this will break the string if it has a multi-byte encoding - as pointed out by Darhazer in the comments) and mb_substr() to trim it to X characters in the encoding of the string.
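A small sketch of the difference, assuming UTF-8 text, the mbstring extension, and no function overloading (the variable names are only illustrative):

```php
<?php
// "boršč" written with escapes so the file encoding does not matter:
// 5 characters, 7 bytes in UTF-8.
$word = "bor\u{0161}\u{010D}";

echo strlen($word), "\n";               // 7 (bytes)
echo mb_strlen($word, 'UTF-8'), "\n";   // 5 (characters)

// substr() counts bytes, so 4 bytes ends in the middle of "š".
$byBytes = substr($word, 0, 4);
// mb_substr() counts characters, so this keeps "borš" intact.
$byChars = mb_substr($word, 0, 4, 'UTF-8');

var_dump(mb_check_encoding($byBytes, 'UTF-8')); // bool(false): broken character
var_dump(mb_check_encoding($byChars, 'UTF-8')); // bool(true)
```

Note that mb_substr() limits the character count, not the byte count, so on its own it does not guarantee that the result fits a byte limit.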

8 Comments

strlen doesn't give you byte size.
@N.B. It gives you exactly the number of bytes... that's why there is mb_strlen() in the mbstring extension. Try strlen on a multi-byte character to test...
@soulmerge as Carlos Campderrós said in other answer, it is possible to overload str*() into mb_str*(), so calling strlen will indeed call mb_strlen. To see if this is enabled, check mbstring.func_overload in php.ini. Also see php.net/manual/en/mbstring.overload.php
There is now a note on the PHP manual page for strlen(): "strlen() returns the number of bytes rather than the number of characters in a string." Not sure if that was there before, but it confirms that this answer is correct.
@PhoneixS Luckily, the function overloading "feature" has been removed as of PHP 8.0.0. Deprecated in 7.2.0. So you can now rely on strlen to return byte length of a string.
5

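PHP's strlen() function returns the number of bytes, which matches the number of characters only when the string contains nothing but single-byte (ASCII) characters.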

strlen('borsc') -> 5 (bytes)

strlen('boršč') -> 7 (bytes)

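$limit_in_bytes = 20000; // maximum size of each part, in bytes

$pointer = 0; // index of the current part
// Loop while more than one full part remains after the current offset.
while(strlen($your_string) > (($pointer + 1) * $limit_in_bytes)){
    // substr() cuts on byte offsets, so a part may end in the middle of a multibyte character
    $str_to_handle = substr($your_string, ($pointer * $limit_in_bytes), $limit_in_bytes);
    // here you can handle (0 - n) parts of string
    $pointer++;
}

$str_to_handle = substr($your_string, ($pointer * $limit_in_bytes), $limit_in_bytes);
// here you can handle last part of string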

.. or you can use a function like this:

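// Splits $string into parts of at most $limit_in_bytes bytes each.
// substr() cuts on byte offsets, so a part may end in the middle of a multibyte character.
function parseStrToArr($string, $limit_in_bytes){
    $ret = array();

    $pointer = 0; // index of the current part
    while(strlen($string) > (($pointer + 1) * $limit_in_bytes)){
        $ret[] = substr($string, ($pointer * $limit_in_bytes), $limit_in_bytes);
        $pointer++;
    }

    // the last part (shorter than or equal to the limit)
    $ret[] = substr($string, ($pointer * $limit_in_bytes), $limit_in_bytes);

    return $ret;
}

$arr = parseStrToArr($your_string, 20000);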
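If the parts must never split a multibyte character, one possible variation is to cut each part with mb_strcut() instead of substr(). This is only a sketch: splitByBytes() is a made-up name, the input is assumed to be valid UTF-8, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical variant of parseStrToArr(): each part is at most $maxBytes
// bytes long and always ends on a UTF-8 character boundary.
function splitByBytes(string $text, int $maxBytes): array
{
    if ($maxBytes < 1) {
        throw new InvalidArgumentException('maxBytes must be at least 1');
    }

    $parts = [];
    while ($text !== '') {
        // Take as many whole characters as fit into $maxBytes bytes.
        $part = mb_strcut($text, 0, $maxBytes, 'UTF-8');
        if ($part === '') {
            // The next character alone is longer than $maxBytes;
            // stop instead of looping forever.
            throw new InvalidArgumentException('maxBytes is too small for this text');
        }
        $parts[] = $part;
        // Drop the bytes just consumed; the cast covers PHP 7, where
        // substr() returns false once nothing is left.
        $text = (string) substr($text, strlen($part));
    }
    return $parts;
}

// "boršč" is 7 bytes; with a 4-byte limit the first part stops before "š".
$parts = splitByBytes("bor\u{0161}\u{010D}", 4);
```

Unlike parseStrToArr(), this sketch returns an empty array for an empty string, and the parts can be shorter than the limit whenever a character would otherwise straddle the boundary.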


4

Further to PhoneixS's answer on getting the correct length of a string in bytes: since mb_strlen() is slower than strlen(), for best performance one can check the "mbstring.func_overload" ini setting so that mb_strlen() is used only when it is really required:

$content_length = ini_get('mbstring.func_overload') ? mb_strlen($content , '8bit') : strlen($content);

1 Comment

Thankfully, this check is no longer needed as of PHP 8.0.0. The function overloading "feature" has been removed as of PHP 8.0.0, and deprecated in 7.2.0. So you can now rely on strlen to return byte length of a string.
