I'm running

'S-tst','ssrst','srst2','s-zaa','s-a','s-zf' | Sort-Object

Shouldn't I have gotten a return of

s-a
S-tst
s-zaa
s-zf
srst2
ssrst

but instead I get the following:

s-a
srst2
ssrst
S-tst
s-zaa
s-zf

How is this possible? Does Sort-Object only look at letters when sorting? Is there any way to make it take special characters into account as well?

2 Answers

5

This behaviour is by design, but it is not always what people want or expect. If you want strings compared character by character in character-code (ordinal) order, ignoring case, use this:

Add-Type @"
    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Globalization;

    public class SimpleStringComparer: IComparer, IComparer<string>
    {

        private static readonly CompareInfo compareInfo = CompareInfo.GetCompareInfo(CultureInfo.InvariantCulture.Name);

        public int Compare(object x, object y)
        {
            return Compare(x as string, y as string);
        }
        public int Compare(string x, string y)
        {
            return compareInfo.Compare(x, y, CompareOptions.OrdinalIgnoreCase);
        }
        public SimpleStringComparer() {}
    }
"@


[string[]]$myList = 's-a','s-a1','s''a','s''a1', 'sa','sa1','s^a','S-a','S-a1','S''a','S''a1', 'Sa','Sa1','S^a'

[System.Collections.Generic.List[string]]$list = [System.Collections.Generic.List[string]]::new()
$list.AddRange($myList)
[SimpleStringComparer]$comparer = [SimpleStringComparer]::new()
$list.Sort($comparer)
$list

Outputs:

s'a
S'a
s'a1
S'a1
s-a
S-a
s-a1
S-a1
sa
Sa
sa1
Sa1
s^a
S^a
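
If all you need is an ordinal, case-insensitive sort, a shorter route that avoids compiling C# may be to hand one of .NET's built-in comparers to [Array]::Sort. This is a minimal sketch using the strings from the question; the variable name $names is only for illustration.

```powershell
# Strings from the question; the [string[]] cast gives Array.Sort a typed array to sort in place.
[string[]]$names = 'S-tst','ssrst','srst2','s-zaa','s-a','s-zf'

# OrdinalIgnoreCase compares character codes after case folding,
# so '-' (0x2D) sorts before every letter.
[Array]::Sort($names, [System.StringComparer]::OrdinalIgnoreCase)
$names
```

This should give the order the question expected: s-a, S-tst, s-zaa, s-zf, srst2, ssrst. Swapping in [System.StringComparer]::Ordinal makes the comparison case-sensitive, which puts uppercase S ahead of lowercase s.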

More Info

Per @TessellatingHeckler in the comments, sorting on each string cast to a char array may look like a way to get character-code (ordinal) order. It is not a fix, though, but a demonstration of the problem: hyphens and apostrophes are still handled in a potentially unexpected way (these characters are ignored):

$myList = 's-a','s-a1','s''a','s''a1', 'sa','sa1','s^a','S-a','S-a1','S''a','S''a1', 'Sa','Sa1','S^a'
$myList | Sort-Object -Property { [char[]] $_ }
s'a
S'a
s-a
S-a
s'a1
S'a1
s-a1
S-a1
s^a
S^a
sa
Sa
sa1
Sa1

The current sorting behaviour is by design. It appears that PowerShell implements a "Word Sort". This is documented here: https://msdn.microsoft.com/en-us/library/windows/desktop/dd318144(v=vs.85).aspx#SortingFunctions

In addition to ignoring hyphens and apostrophes (except when comparing otherwise identical strings), this sort also treats punctuation characters as coming before alphanumerics, and handles accented letters alongside their counterparts. A simple demo of this can be seen like so:

32..255 | %{[string][char][byte]$_} | sort
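
To see the difference on a single pair of strings, something like the following sketch may help. It compares two of the question's strings once with a culture-sensitive comparison and once ordinally. The variable names are illustrative, and the culture-sensitive result can vary with the PowerShell/.NET version and platform, so only the ordinal result is stated.

```powershell
$a = 's-zaa'
$b = 'srst2'

# Culture-sensitive comparison using the invariant culture.
# The sign of the result depends on the collation rules in use.
$wordSort = [string]::Compare($a, $b, $false, [cultureinfo]::InvariantCulture)

# Ordinal comparison: compares UTF-16 code units, so '-' (45) precedes 'r' (114)
# and the result is negative.
$ordinal = [string]::CompareOrdinal($a, $b)

"culture-sensitive: $wordSort; ordinal: $ordinal"
```

A negative number means the first string sorts first, a positive number means the second does.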

To define other sorting behaviours, you currently need to dip into .NET, like so:

Add-Type @"
    using System;
    using System.Runtime.InteropServices;
    using System.Collections;
    public class NumericStringComparer: IComparer
    {
        //https://msdn.microsoft.com/en-us/library/windows/desktop/bb759947%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396
        // StrCmpLogicalW takes wide (UTF-16) strings, so they must be marshalled as Unicode.
        [DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
        public static extern int StrCmpLogicalW(string psz1, string psz2);
        public int Compare(object x, object y)
        {
            return Compare(x as string, y as string);
        }
        public int Compare(string x, string y)
        {
            return StrCmpLogicalW(x, y);
        }
        public NumericStringComparer() {}
    }
"@

[System.Collections.ArrayList]$myList = 's-a','s-a1','s''a','s''a1', 'sa','sa1','s^a','S-a','S-a1','S''a','S''a1', 'Sa','Sa1','S^a', '100a','1a','001a','2a','20a'
$myList.Sort([NumericStringComparer]::new())
$myList -join ', '

The above sorts strings the way Windows Explorer would (i.e. treating runs of digits as numeric values), so 2a sorts before 20a, which in turn sorts before 100a.

I've submitted a feature suggestion to provide more PS friendly solutions on Sort-Object. See https://github.com/PowerShell/PowerShell/issues/4098

3 Comments

Here is a Jon Skeet explanation of the sorting behavior. As far as I can tell, Sort-Object accepts a -Culture parameter, but there is no culture I can find with an Ordinal sort, and creating a new custom culture requires admin rights and registering it system-wide before it can be used, so that leaves PS a bit stuck.
$a = [System.Collections.ArrayList]@('srs', 's-a', 's-z'); $a.Sort([System.StringComparer]::Ordinal) - from stackoverflow.com/q/18543842/478656 (possibly makes this Q a duplicate)
Can you clarify that Sort-Object -Property { [char[]] $_ } is not a fix, but a demonstration of the problem?
1

You can achieve ASCII-style order by sorting on each string's hex representation:

'S-tst','ssrst','srst2','s-zaa','s-a','s-zf' | Sort-Object {Format-Hex -InputObject $_}

If you need the sort to be case-insensitive, you can lowercase the string first:

'S-tst','ssrst','srst2','s-zaa','s-a','s-zf' | Sort-Object {Format-Hex -InputObject $_.ToLower()}
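
The idea behind sorting on a hex representation can also be made explicit by building the sort key by hand. This is a sketch under the assumption that the strings should be compared by their UTF-8 bytes; ConvertTo-HexKey is a hypothetical helper name, not a built-in cmdlet.

```powershell
# Hypothetical helper: turns a string into a fixed-width hex key.
function ConvertTo-HexKey {
    param([string]$Text)
    # Two uppercase hex digits per UTF-8 byte, e.g. 's-a' -> '732D61'.
    $bytes = [System.Text.Encoding]::UTF8.GetBytes($Text)
    -join ($bytes | ForEach-Object { $_.ToString('X2') })
}

# Sort the question's strings on the hex key instead of on the text itself.
'S-tst','ssrst','srst2','s-zaa','s-a','s-zf' |
    Sort-Object { ConvertTo-HexKey $_ }
```

Because every byte becomes exactly two hex digits, comparing the keys follows byte order. With this key, uppercase S (0x53) sorts before lowercase s (0x73); passing $_.ToLower() to the helper makes the key case-insensitive.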
