How can I sort an array of UTF-8 strings in PHP?

Question

I need help sorting words that contain UTF-8 characters. For example, we have five cities from Belgium:

$array = array('Borgloon','Thuin','Lennik','Éghezée','Aubel');
sort($array); // Expected: Aubel, Borgloon, Éghezée, Lennik, Thuin
              // Actual: Aubel, Borgloon, Lennik, Thuin, Éghezée

The city Éghezée should be third. Is it possible to set some kind of UTF-8-aware sort order, or to define my own character order?

I just wanted to point out for future reference that natcasesort doesn't work out of the box: codepad.org/QgdF5DUY
– middus, Commented Oct 28, 2011 at 13:25
Looks like there was a similar question before: stackoverflow.com/questions/120334/…
– user1012851, Commented Oct 28, 2011 at 13:33
Added a comment to reduce confusion as to what you're looking for versus what you get.
– Billy ONeal, Commented Oct 28, 2011 at 14:18

Thai · Accepted Answer · 2011-10-28 14:28:25Z

50

The intl extension has been bundled with PHP since PHP 5.3, and it only supports UTF-8.

You can use a Collator in this case:

$array = array('Borgloon','Thuin','Lennik','Éghezée','Aubel');
$collator = new Collator('en_US');
$collator->sort($array);
print_r($array);

Output:

Array
(
    [0] => Aubel
    [1] => Borgloon
    [2] => Éghezée
    [3] => Lennik
    [4] => Thuin
)

answered Oct 28, 2011 at 14:28

Thai

4 Comments

RandomNoobName Over a year ago

How can I sort it in reverse order?

Renrhaf Over a year ago

Just use the array_reverse function on the result: php.net/manual/fr/function.array-reverse.php

Radek Pech Over a year ago

Sort by key: uksort($array, static fn($a, $b) => $collator->compare($a, $b));

mickmackusa Over a year ago

@Radek the callback for uksort() can be written as an array containing the class object, then the method name. See "How can I sort an array by accented keys in PHP?" ... If you need descending order, then $a and $b are useful.
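
The two points raised in these comments (reverse order and sorting by key) can be sketched as follows. This is only a minimal illustration, assuming the intl extension is loaded and PHP 7.4+ for the arrow function; the `$population` array and its values are made-up placeholders.

```php
<?php
// Assumes the intl extension is available.
$collator = new Collator('en_US');

// Descending order: swap the arguments passed to compare().
$cities = array('Borgloon', 'Thuin', 'Lennik', 'Éghezée', 'Aubel');
usort($cities, fn($a, $b) => $collator->compare($b, $a));
print_r($cities);

// Sort by key: pass the object and the method name as the callback.
$population = array('Thuin' => 1, 'Éghezée' => 2, 'Aubel' => 3); // placeholder values
uksort($population, array($collator, 'compare'));
print_r(array_keys($population));
```

Swapping the arguments avoids a second pass with array_reverse, and the array-style callback avoids wrapping compare() in a closure.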

Community · Accepted Answer · 2023-11-17 19:32:19Z

12

I think you can use strcoll:

setlocale(LC_COLLATE, 'nl_BE.utf8');
$array = array('Borgloon','Thuin','Lennik','Éghezée','Aubel');
usort($array, 'strcoll'); 
print_r($array);

Result:

Array
(
    [0] => Aubel
    [1] => Borgloon
    [2] => Éghezée
    [3] => Lennik
    [4] => Thuin
)

You need the nl_BE.utf8 locale on your system:

fy@Heisenberg:~$ locale -a | grep nl_BE.utf8
nl_BE.utf8

If you are using Debian, you can use dpkg-reconfigure locales to add locales.
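
Because setlocale() returns false when none of the requested locales is installed, it may be worth checking its result before relying on strcoll. The following is a rough sketch, not a definitive implementation; `sortWithLocale` is a hypothetical helper name, and the locale names passed to it are assumptions about what the system provides.

```php
<?php
// Hypothetical helper: sorts with strcoll() only if one of $locales is installed.
function sortWithLocale(array $items, array $locales): ?array
{
    // Passing "0" returns the current locale without changing it.
    $previous = setlocale(LC_COLLATE, '0');

    if (setlocale(LC_COLLATE, $locales) === false) {
        return null; // none of the requested locales is available
    }

    usort($items, 'strcoll');

    // Restore the previous collation locale.
    if ($previous !== false) {
        setlocale(LC_COLLATE, $previous);
    }
    return $items;
}

$array = array('Borgloon', 'Thuin', 'Lennik', 'Éghezée', 'Aubel');
$result = sortWithLocale($array, array('nl_BE.utf8', 'nl_BE.UTF-8'));
if ($result === null) {
    echo "Locale not installed; check `locale -a`.\n";
} else {
    print_r($result);
}
```

Note that the locale is process-wide, so switching it and restoring it like this carries the thread-safety risk mentioned in the comments on this answer.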

edited Nov 17, 2023 at 19:32

CommunityBot

answered Oct 28, 2011 at 14:15

Fy-

3 Comments

Fy- Over a year ago

Thai's solution for PHP 5.3 seems clean too

Enyby Over a year ago

strcoll doesn't work on Windows with UTF-8, due to the buggy CRT implementation.

user1768761 Over a year ago

Note that setlocale is not thread safe, so setting it back and forth might involve some risk of bad results.

sectus · Accepted Answer · 2016-07-20 08:43:41Z

8

This script should solve it with a custom order. I hope it helps. Note the mb_strtolower function: you need it to make the comparison case-insensitive. I didn't use strtolower because it does not work well with special characters.

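<?php

function customSort($a, $b) {
    // Custom alphabet: a character's position in this list defines its sort order.
    static $charOrder = array('a', 'b', 'c', 'd', 'e', 'é',
                              'f', 'g', 'h', 'i', 'j',
                              'k', 'l', 'm', 'n', 'o',
                              'p', 'q', 'r', 's', 't',
                              'u', 'v', 'w', 'x', 'y', 'z');

    // Compare case-insensitively; mb_strtolower handles multibyte characters.
    $a = mb_strtolower($a);
    $b = mb_strtolower($b);
    // Compute the lengths once instead of on every loop iteration.
    $lenA = mb_strlen($a);
    $lenB = mb_strlen($b);

    for ($i = 0; $i < $lenA && $i < $lenB; $i++) {
        $chA = mb_substr($a, $i, 1);
        $chB = mb_substr($b, $i, 1);
        // array_search returns false for characters not in $charOrder.
        $valA = array_search($chA, $charOrder);
        $valB = array_search($chB, $charOrder);
        if ($valA == $valB) continue;
        if ($valA > $valB) return 1;
        return -1;
    }

    // One string is a prefix of the other: the shorter one sorts first.
    if ($lenA == $lenB) return 0;
    if ($lenA > $lenB)  return 1;
    return -1;
}

$array = array('Borgloon','Thuin','Lennik','Éghezée','Aubel');
usort($array, 'customSort');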

EDIT: Sorry, I made many mistakes in the previous code. It is now tested.

EDIT {2}: Everything now uses multibyte functions.

edited Jul 20, 2016 at 8:43

sectus

answered Oct 28, 2011 at 13:20

Jaison Erick

4 Comments

Leonid Shevtsov Over a year ago

Unfortunately this won't work, as $a[$i] will return a single byte from the string, not a single char.

Jaison Erick Over a year ago

Before, yes, you were right. I changed the algorithm a few minutes ago. Using str_split will work.

Leonid Shevtsov Over a year ago

str_split doesn't handle multibyte strings either. :) See php.net/manual/en/function.mb-split.php#99851

hakre Over a year ago

Don't call the strlen function that often; you only need to call it once up front for each string, and you can then take the minimum of the two lengths.

Amir Djaminov · Accepted Answer · 2017-09-05 19:55:42Z

7

If you want a native solution, I can propose this one:

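function compare($a, $b)
{
        $alphabet = 'aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż'; // Polish alphabet
        $a = mb_strtolower($a);
        $b = mb_strtolower($b);
        $lenA = mb_strlen($a);
        $lenB = mb_strlen($b);

        for ($i = 0; $i < $lenA; $i++) {
            // $b ran out first, so $b is a prefix of $a and sorts before it
            if ($i >= $lenB) {
                return 1;
            }
            $chA = mb_substr($a, $i, 1);
            $chB = mb_substr($b, $i, 1);
            if ($chA == $chB) {
                continue;
            }
            // the first differing character decides, by its position in $alphabet
            if (mb_strpos($alphabet, $chA) > mb_strpos($alphabet, $chB)) {
                return 1;
            } else {
                return -1;
            }
        }
        // the strings are equal, or $a is a prefix of $b and sorts first
        return ($lenA == $lenB) ? 0 : -1;
}

usort($needed_array, 'compare');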

I'm not sure that this is the best solution, but it works for me =)

answered Sep 5, 2017 at 19:55

Amir Djaminov

2 Comments

Amir Djaminov Over a year ago

A small update for PHP 7 and its new "spaceship" operator: you can use <=> to return 1 or -1 in the last condition.

Flower7C3 Over a year ago

You are missing the ś character, so it should be: $alphabet = 'aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż'; And if you want to keep array keys, just use the uksort function.
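
The suggestions in this thread (compute lengths once, use the spaceship operator, keep the alphabet complete) could be combined roughly as below. This is a sketch under assumptions, not the answer author's code: `makeAlphabetComparator` is a hypothetical name, mb_str_split needs PHP 7.4+ with mbstring, and placing unknown characters after all known ones is an arbitrary choice.

```php
<?php
// Hypothetical factory: returns a usort() callback that orders strings by $alphabet.
function makeAlphabetComparator(string $alphabet): callable
{
    // Map each character to its rank once, instead of searching on every comparison.
    $rank = array_flip(mb_str_split($alphabet, 1, 'UTF-8'));

    return function (string $a, string $b) use ($rank): int {
        $charsA = mb_str_split(mb_strtolower($a, 'UTF-8'), 1, 'UTF-8');
        $charsB = mb_str_split(mb_strtolower($b, 'UTF-8'), 1, 'UTF-8');
        $common = min(count($charsA), count($charsB));

        for ($i = 0; $i < $common; $i++) {
            if ($charsA[$i] === $charsB[$i]) {
                continue;
            }
            // Assumption: characters outside the alphabet sort after all known ones.
            $rankA = $rank[$charsA[$i]] ?? PHP_INT_MAX;
            $rankB = $rank[$charsB[$i]] ?? PHP_INT_MAX;
            if ($rankA !== $rankB) {
                return $rankA <=> $rankB;
            }
            // Both characters are unknown: fall back to byte order.
            return strcmp($charsA[$i], $charsB[$i]) <=> 0;
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return count($charsA) <=> count($charsB);
    };
}

$words = array('żaba', 'zebra', 'ćma', 'cena', 'łódź', 'lato');
usort($words, makeAlphabetComparator('aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż'));
print_r($words);
```

Building the rank table once keeps each comparison to array lookups, which matters because usort calls the callback many times.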

rodneyrehm · Accepted Answer · 2011-10-28 14:32:32Z

As for strcoll, I guess it was a nice idea, but it doesn't seem to work:

<?php

// Some test strings
$strings = array('Alpha', 'Älpha', 'Bravo');
// make it German: A, Ä, B
setlocale(LC_COLLATE, 'de_DE.UTF8', 'de.UTF8', 'de_DE.UTF-8', 'de.UTF-8');
usort($strings, 'strcoll');
var_dump($strings);
// as you can see, Ä is last, so this didn't work

A while back I wrote a UTF-8 to ASCII tool that would convert "älph#bla" to "aelph-bla". You could use this to "normalize" your input to make it sortable. It's basically a replacement similar to what @Nick said.

You should use a separate array for sorting, as calling urlify() in a usort() callback would waste a lot of resources. Try:

<?php
// data to sort
$array = array('Borgloon','Thuin','Lennik','Éghezée','Aubel');
// container for modified strings
$_array = array();
foreach ($array as $k => $v) {
    // "normalize" utf8 to ascii
    $_array[$k] = urlify($v);
}
// sort the ASCII stuff (while preserving indexes)
asort($_array);
foreach ($_array as $key => &$v) {
    // copy the original value of the ASCIIfied element
    $v = $array[$key];
}
// break the reference left over from the loop
unset($v);
var_dump($_array);

If you have PHP5.3 or the intl PECL compiled, try @Thai's solution, seems sweet!

Heath Dutton · Accepted Answer · 2020-01-28 19:53:24Z

2

There are great answers here, but this is a dead simple solution for most situations.

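// $array is taken by reference so that the caller's array is sorted in place
function globalsort(array &$array, $in = 'UTF-8', $out = 'ASCII//TRANSLIT//IGNORE')
{
    return usort($array, function ($a, $b) use ($in, $out) {
        // transliterate both strings to ASCII before comparing them
        $a = @iconv($in, $out, $a);
        $b = @iconv($in, $out, $b);
        return strnatcasecmp($a, $b);
    });
}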

And use it like so:

globalsort($array);

answered Jan 28, 2020 at 19:53

Heath Dutton

1 Comment

ESP32 Over a year ago

This does not work for me with letters like Ö, which should be sorted after O

Nick · Accepted Answer · 2011-10-28 13:59:54Z

1

I'd be tempted to loop through the array and convert to English characters before sorting. E.g.

<?php
  $array = array('Borgloon','Thuin','Lennik','Éghezée','Aubel');

  setlocale(LC_CTYPE, 'nl_BE.utf8');

  $newarray = array();
  foreach($array as $k => $v) {
    $newarray[$k] = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $v);
  }

  sort($newarray);
  print_r($newarray);
?>

Probably not the best in terms of processing speed/resources used. But sure does make it easier to understand the code.

Edit:

Thinking about it now, you might be better using some kind of lookup table, something like this:

<?php
  $accentedCharacters = array ( 'à', 'á', 'â', 'ã', 'ä', 'å', 'ç', 'è', 'é', 'ê', 'ë', 'ì', 'í', 'î', 'ï', 'ñ', 'ò', 'ó', 'ô', 'õ', 'ö', 'ø', 'ù', 'ú', 'û', 'ü', 'ý', 'ÿ', 'Š', 'Ž', 'š', 'ž', 'Ÿ', 'À', 'Á', 'Â', 'Ã', 'Ä', 'Å', 'Ç', 'È', 'É', 'Ê', 'Ë', 'Ì', 'Í', 'Î', 'Ï', 'Ñ', 'Ò', 'Ó', 'Ô', 'Õ', 'Ö', 'Ø', 'Ù', 'Ú', 'Û', 'Ü', 'Ý' ); 

  $replacementCharacters = array ( 'a', 'a', 'a', 'a', 'a', 'a', 'c', 'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i', 'n', 'o', 'o', 'o', 'o', 'o', 'o', 'u', 'u', 'u', 'u', 'y', 'y', 'S', 'Z', 's', 'z', 'Y', 'A', 'A', 'A', 'A', 'A', 'A', 'C', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I', 'N', 'O', 'O', 'O', 'O', 'O', 'O', 'U', 'U', 'U', 'U', 'Y' );

  $array = array('Borgloon','Thuin','Lennik','Éghezée','Aubel');

  $newarray = array();
  foreach($array as $k => $v) {
    $newarray[$k] = str_replace($accentedCharacters,$replacementCharacters,$v);
  }

  sort($newarray);
  print_r($newarray);
?>
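
Both snippets above sort and print the converted strings, so the accents are lost in the output. One way to keep the original spellings, sketched here under assumptions, is to sort the converted strings as keys and then read the originals back in that order. `sortByAsciiKey` is a hypothetical helper name and the `$map` table is deliberately abbreviated.

```php
<?php
// Hypothetical helper: sorts $items by an accent-stripped key, keeping the original strings.
function sortByAsciiKey(array $items, array $map): array
{
    $keys = array();
    foreach ($items as $k => $v) {
        // strtr() with an array replaces whole multibyte sequences.
        $keys[$k] = strtolower(strtr($v, $map));
    }

    // asort() keeps the index association, so each key still points at its original.
    asort($keys, SORT_STRING);

    $sorted = array();
    foreach (array_keys($keys) as $k) {
        $sorted[] = $items[$k];
    }
    return $sorted;
}

// Abbreviated lookup table; a real one would cover every accented letter in the data.
$map = array('é' => 'e', 'É' => 'E', 'è' => 'e', 'ö' => 'o', 'Ö' => 'O');
$array = array('Borgloon', 'Thuin', 'Lennik', 'Éghezée', 'Aubel');
$sorted = sortByAsciiKey($array, $map);
print_r($sorted);
```

The same pattern works with iconv in place of strtr, although the result of //TRANSLIT depends on the platform and locale.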

edited Oct 28, 2011 at 13:59

answered Oct 28, 2011 at 13:45

Nick

2 Comments

middus Over a year ago

Why do you propose nl_BE? (Dutch as spoken/written in Belgium)

Nick Over a year ago

Honestly, it was the first locale that came to mind that would work given that dataset. Thinking about it now, he might be better using a conversion lookup table instead if the dataset is going to use other abnormal characters.
