1

I'm attempting to modify the OrderedImportsFixer class in php-cs-fixer so I can clean up my files the way I want. What I want is to order my imports in a fashion similar to what you'd see in a filesystem listing, with "directories" listed before "files".

So, given this array:

$indexes = [
    26 => ["namespace" => "X\\Y\\Zed"],
    9 =>  ["namespace" => "A\\B\\See"],
    3 =>  ["namespace" => "A\\B\\Bee"],
    38 => ["namespace" => "A\\B\\C\\Dee"],
    51 => ["namespace" => "X\\Wye"],
    16 => ["namespace" => "A\\Sea"],
    12 => ["namespace" => "A\\Bees"],
    31 => ["namespace" => "M"],
];

I'd like this output:

$sorted = [
    38 => ["namespace" => "A\\B\\C\\Dee"],
    3 =>  ["namespace" => "A\\B\\Bee"],
    9 =>  ["namespace" => "A\\B\\See"],
    12 => ["namespace" => "A\\Bees"],
    16 => ["namespace" => "A\\Sea"],
    26 => ["namespace" => "X\\Y\\Zed"],
    51 => ["namespace" => "X\\Wye"],
    31 => ["namespace" => "M"],
];

As in a typical filesystem listing:

screenshot of a directory listing

I've been going at uasort for a while (key association must be maintained) and have come close. Admittedly, this is due more to desperate flailing than any sort of rigorous methodology. Not really having a sense of how uasort works is kind of limiting me here.

// get the maximum number of namespace components in the list
$ns_counts = array_map(function($val){
    return count(explode("\\", $val["namespace"]));
}, $indexes);
$limit = max($ns_counts);

for ($depth = 0; $depth <= $limit; $depth++) {
    uasort($indexes, function($first, $second) use ($depth, $limit) {
        $fexp = explode("\\", $first["namespace"]);
        $sexp = explode("\\", $second["namespace"]);
        if ($depth === $limit) {
            // why does this help?
            array_pop($fexp);
            array_pop($sexp);
        }
        $fexp = array_slice($fexp, 0, $depth + 1, true);
        $sexp = array_slice($sexp, 0, $depth + 1, true);
        $fimp = implode(" ", $fexp);
        $simp = implode(" ", $sexp);
        //echo "$depth: $fimp <-> $simp\n";
        return strnatcmp($fimp, $simp);
    });
}
echo json_encode($indexes, JSON_PRETTY_PRINT);

This gives me properly sorted output, but with deeper namespaces on the bottom instead of the top:

{
    "31": {
        "namespace": "M"
    },
    "12": {
        "namespace": "A\\Bees"
    },
    "16": {
        "namespace": "A\\Sea"
    },
    "3": {
        "namespace": "A\\B\\Bee"
    },
    "9": {
        "namespace": "A\\B\\See"
    },
    "38": {
        "namespace": "A\\B\\C\\Dee"
    },
    "51": {
        "namespace": "X\\Wye"
    },
    "26": {
        "namespace": "X\\Y\\Zed"
    }
}

I'm thinking I may have to build a separate array for each level of namespace and sort it separately, but have drawn a blank on how I might do that. Any suggestions for getting the last step of this working, or something completely different that doesn't involve so many loops?

4 Answers 4

2

We divide this into 4 steps.

Step 1: Create hierarchical structure from the dataset.

function createHierarchicalStructure($indexes){
    $data = [];
    foreach($indexes as $d){
        $temp = &$data;
        foreach(explode("\\",$d['namespace']) as $namespace){
            if(!isset($temp[$namespace])){
                $temp[$namespace] = [];
            }
            $temp = &$temp[$namespace];
        }
    }
    
    return $data;
}

Split the namespaces by \\ and maintain a $data variable. Use & address reference to keep editing the same copy of the array.

Step 2: Sort the hierarchy in first folders then files fashion.

function fileSystemSorting(&$indexes){
    foreach($indexes as $key => &$value){
        fileSystemSorting($value);
    }
    
    uksort($indexes,function($key1,$key2) use ($indexes){
        if(count($indexes[$key1]) == 0 && count($indexes[$key2]) > 0) return 1;
        if(count($indexes[$key2]) == 0 && count($indexes[$key1]) > 0) return -1;
        return strnatcmp($key1,$key2);
    });
}

Sort the subordinate folders and use uksort for the current level of folders. Vice-versa would also work. If both 2 folders in comparison have subfolders, compare them as strings, else if one is a folder and another is a file, make folders come above.

Step 3: Flatten the hierarchical structure now that they are in order.

function flattenFileSystemResults($hierarchical_data){
    $result = [];
    foreach($hierarchical_data as $key => $value){
        if(count($value) > 0){
            $sub_result = flattenFileSystemResults($value);
            foreach($sub_result as $r){
                $result[] = $key . "\\" . $r;
            }   
        }else{
            $result[] = $key;
        }
    }
    
    return $result;
}

Step 4: Restore the initial data keys back and return the result.

function associateKeys($data,$indexes){
    $map = array_combine(array_column($indexes,'namespace'),array_keys($indexes));
    $result = [];
    foreach($data as $val){
        $result[ $map[$val] ] = ['namespace' => $val];
    }
    return $result;
}

Driver code:

function foldersBeforeFiles($indexes){
   $hierarchical_data = createHierarchicalStructure($indexes);
   fileSystemSorting($hierarchical_data);
   return associateKeys(flattenFileSystemResults($hierarchical_data),$indexes);
}

print_r(foldersBeforeFiles($indexes));

Demo: https://3v4l.org/cvoB2

Sign up to request clarification or add additional context in comments.

12 Comments

Depends on the project. I know some of the core Laravel files have > 20 import statements, with namespaces up to 6 or 7 levels deep.
And of course, there are always outliers. ;)
Nice trick with the references to build the hierarchy, that's what I was trying to do originally but drew a blank. I appreciate the effort but unfortunately this doesn't preserve the array's keys. php-cs-fixer needs the keys to reconstruct the document after fixing it.
It seems to, yes
Thanks for the effort, I marked Jeto's answer accepted as it was easier to integrate into the existing code.
|
1

I believe the following should work:

uasort($indexes, static function (array $entry1, array $entry2): int {  
    $ns1Parts = explode('\\', $entry1['namespace']);
    $ns2Parts = explode('\\', $entry2['namespace']);

    $ns1Length = count($ns1Parts);
    $ns2Length = count($ns2Parts);

    for ($i = 0; $i < $ns1Length && isset($ns2Parts[$i]); $i++) {
        $isLastPartForNs1 = $i === $ns1Length - 1;
        $isLastPartForNs2 = $i === $ns2Length - 1;

        if ($isLastPartForNs1 !== $isLastPartForNs2) {
            return $isLastPartForNs1 <=> $isLastPartForNs2;
        }

        $nsComparison = $ns1Parts[$i] <=> $ns2Parts[$i];

        if ($nsComparison !== 0) {
            return $nsComparison;
        }
    }

    return 0;
});

What it does is:

  • split namespaces into parts,
  • compare each part starting from the first one, and:
    • if we're at the last part for one and not the other, prioritize the one with the most parts,
    • otherwise, if the respective parts are different, prioritize the one that is before the other one alphabetically.

Demo

3 Comments

When I tried this one live, it had trouble with single-element namespaces (e.g. Throwable). Kept putting them at the bottom of the list.
Yeah I just realized you probably can't do it with a single/basic sort pass, as two entries might be sorted differently depending on other ones. For instance, if A and X\Y\Zed were the only two entries, X\Y\Zed should come first (folders first), right? Whereas it's the other way around in your example's case.
Actually yes, you're right I should expect those entries to be at the bottom of the list. Going to test with some more live data and see how it goes.
1

Here's another version that breaks the steps down further that, although it might not be the most optimal, definitely helps my brain think about it. See the comments for more details on what is going on:

uasort(
    $indexes,
    static function (array $a, array $b) {

        $aPath = $a['namespace'];
        $bPath = $b['namespace'];

        // Just in case there are duplicates
        if ($aPath === $bPath) {
            return 0;
        }

        // Break into parts
        $aParts = explode('\\', $aPath);
        $bParts = explode('\\', $bPath);

        // If we only have a single thing then it is a root-level, just compare the item
        if (1 === count($aParts) && 1 === count($bParts)) {
            return $aPath <=> $bPath;
        }

        // Get the class and namespace (file and folder) parts
        $aClass = array_pop($aParts);
        $bClass = array_pop($bParts);

        $aNamespace = implode('\\', $aParts);
        $bNamespace = implode('\\', $bParts);

        // If the namespaces are the same, sort by class name
        if ($aNamespace === $bNamespace) {
            return $aClass <=> $bClass;
        }

        // If the first namespace _starts_ with the second namespace, sort it first
        if (0 === mb_strpos($aNamespace, $bNamespace)) {
            return -1;
        }

        // Same as above but the other way
        if (0 === mb_strpos($bNamespace, $aNamespace)) {
            return 1;
        }

        // Just only by namespace
        return $aNamespace <=> $bNamespace;
    }
);

Online demo

4 Comments

I should note, this uses the spaceship operator, <=> but your code uses strnatcmp which handles numbers in strings and might be better suited.
Numbers are pretty rare in namespaces, though I do have some old code imported from PEAR2\\Net. I will see how it works...
I liked this approach best because of no loops, but it didn't work once I fed some real data to it. In my sample data, replacing A\B\Bee and A\B\See with Am\B\Bee and Am\B\See results in those items being ordered before A\Bees.
@miken32, thanks for more sample data, I'll see if I can improve this!
1

I find no fault with Jeto's algorithmic design, but I decided to implement it more concisely. My snippet avoids iterated function calls and arithmetic in the for() loop, uses a single spaceship operator, and a single return. My snippet is greater than 50% shorter and I generally find it easier to read, but then everybody thinks their own baby is cute, right?

Code: (Demo)

uasort($indexes, function($a, $b) {  
    $aParts = explode('\\', $a['namespace']);
    $bParts = explode('\\', $b['namespace']);
    $aLast = count($aParts) - 1;
    $bLast = count($bParts) - 1;

    for ($cmp = 0, $i = 0; $i <= $aLast && !$cmp; ++$i) {
        $cmp = [$i === $aLast, $aParts[$i]] <=> [$i === $bLast, $bParts[$i]];
    }
    return $cmp;
});

Like Jeto's answer, it iterates each array simultaneously and

  • checks if either element is the last element of the array, if so it is bumped down the list (because we want longer paths to win tiebreakers);
  • if neither element is the last in its array, then compare the elements' current string alphabetically.
  • The process repeats until a non-zero evaluation is generated.
  • Since duplicate entries are not expected to occur, the return value should always be -1 or 1 (never 0)

Note, I am halting the for() loop with two conditions.

  1. If $i is greater than the number of elements in $aParts (only) -- because if $bParts has fewer elements than $aParts, then $cmp will generate a non-zero value before a Notice is triggered.
  2. If $cmp is a non-zero value.

Finally, to explain the array syntax on either side of the spaceship operator...
The spaceship operator will compare the arrays from left to right, so it will behave like:

leftside[0] <=> rightside[0] then leftside[1] <=> rightside[1]

Making multiple comparisons in this way does not impact performance because there are no function calls on either side of the <=>. If there were function calls involved, it would be more performant to make individual comparisons in a fallback manner like:

fun($a1) <=> fun($b1) ?: fun($a2) <=> fun($b2)

This way subsequent function calls are only actually made if a tiebreak is necessary.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.