
The following list does not sort properly (IMHO):

$a = @( 'ABCZ', 'ABC_', 'ABCA' )
$a | sort
ABC_
ABCA
ABCZ

My handy ASCII chart and Unicode C0 Controls and Basic Latin chart have the underscore (low line) with an ordinal of 95 (U+005F). This is a higher number than the capital letters A-Z. Sort should have put the string ending with an underscore last.

Get-Culture is en-US

The next set of commands does what I expect:

$a = @( 'ABCZ', 'ABC_', 'ABCA' )
[System.Collections.ArrayList] $al = $a
$al.Sort( [System.StringComparer]::Ordinal )
$al
ABCA
ABCZ
ABC_

Now I create an ANSI encoded file containing those same 3 strings:

Get-Content -Encoding Byte data.txt
65 66 67 90 13 10  65 66 67 95 13 10  65 66 67 65 13 10
$a = Get-Content data.txt
[System.Collections.ArrayList] $al = $a
$al.Sort( [System.StringComparer]::Ordinal )
$al
ABC_
ABCA
ABCZ

Once more the string containing the underscore/lowline is not sorted correctly. What am I missing?


Edit:

Let's call this example #4:

'A' -lt '_'
False
[char] 'A' -lt [char] '_'
True

Seems like both statements should be False or both should be True. I'm comparing strings in the first statement, and then comparing the Char type. A string is merely a collection of Char types so I think the two comparison operations should be equivalent.

And now for example #5:

Get-Content -Encoding Byte data.txt
65 66 67 90 13 10  65 66 67 95 13 10  65 66 67 65 13 10
$a = Get-Content data.txt
$b = @( 'ABCZ', 'ABC_', 'ABCA' )
$a[0] -eq $b[0]; $a[1] -eq $b[1]; $a[2] -eq $b[2];
True
True
True
[System.Collections.ArrayList] $al = $a
[System.Collections.ArrayList] $bl = $b
$al[0] -eq $bl[0]; $al[1] -eq $bl[1]; $al[2] -eq $bl[2];
True
True
True
$al.Sort( [System.StringComparer]::Ordinal )
$bl.Sort( [System.StringComparer]::Ordinal )
$al
ABC_
ABCA
ABCZ
$bl
ABCA
ABCZ
ABC_

The two ArrayLists contain the same strings but are sorted differently. Why?

  • I think what you are missing is that you are expecting non-standard responses from Windows. It has always prioritized symbols before letters; just look at the file system. Make files with those names, sort by name, and it will sort them the same way, with ABC_ first. Commented Sep 8, 2014 at 23:02
  • String sorting is not done by ASCII code any more. Commented Sep 9, 2014 at 3:59
  • Also, as far as I can tell, the weirdness in the second part has something to do with ArrayList. Using a strongly typed System.Collections.Generic.List[string] sorts as expected. Also, a string[] sorts as expected with Array::Sort, but an object[] does not. Commented Sep 9, 2014 at 4:39
  • You'll also have to confirm what Get-Content data.txt actually returns. Commented Sep 10, 2014 at 3:58
  • Posted to Microsoft Connect as a bug. See connect.microsoft.com/PowerShell/feedbackdetail/view/974422 Commented Sep 23, 2014 at 20:53

5 Answers

3

PowerShell often wraps objects in PSObject and unwraps them again. This usually happens transparently, so you do not even notice it, but in your case it is what causes the trouble.

$a='ABCZ', 'ABC_', 'ABCA'
$a|Set-Content data.txt
$b=Get-Content data.txt

[Type]::GetTypeArray($a).FullName
# System.String
# System.String
# System.String
[Type]::GetTypeArray($b).FullName
# System.Management.Automation.PSObject
# System.Management.Automation.PSObject
# System.Management.Automation.PSObject

As you can see, the objects returned from Get-Content are wrapped in PSObject, which prevents StringComparer from seeing the underlying strings and comparing them properly. A strongly typed string collection cannot store PSObjects, so PowerShell unwraps the strings before storing them in it, which lets StringComparer see the strings and compare them properly.
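A minimal sketch of the strongly typed route, under a few assumptions: Write-Output stands in for Get-Content so that the example needs no data file, and the explicit [string[]] cast is one way (not necessarily the only one) to get plain strings into the list.

```powershell
# Assumption: Write-Output stands in for Get-Content data.txt here,
# so the example does not depend on a file on disk.
$lines = Write-Output 'ABCZ', 'ABC_', 'ABCA'

# Cast to [string[]] first, then store the plain strings
# in a strongly typed generic list.
[System.Collections.Generic.List[string]] $list = [string[]] $lines

# The comparer now receives real strings and sorts them by code point.
$list.Sort([System.StringComparer]::Ordinal)
$list   # ABCA, ABCZ, ABC_
```

Because the list can only hold System.String, the comparer never has to fall back to anything other than a string-to-string comparison.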

Edit:

First of all, when you write $a[1].GetType() or $b[1].GetType(), you are not calling .NET methods but PowerShell methods, which normally call the .NET methods on the wrapped object. Thus you cannot get the real type of an object this way. Moreover, these methods can be overridden; consider this code:

$c='String'|Add-Member -Type ScriptMethod -Name GetType -Value {[int]} -Force -PassThru
$c.GetType().FullName
# System.Int32

Let us call the .NET methods through reflection:

$GetType=[Object].GetMethod('GetType')
$GetType.Invoke($c,$null).FullName
# System.String
$GetType.Invoke($a[1],$null).FullName
# System.String
$GetType.Invoke($b[1],$null).FullName
# System.String

Now we get the real type for $c, but it says that the type of $b[1] is String, not PSObject. As I said, in most cases unwrapping is done transparently, so you see the wrapped String and not the PSObject itself. One particular case where this does not happen is when you pass an array: the array elements are not unwrapped. So let us add an additional level of indirection here:

$Invoke=[Reflection.MethodInfo].GetMethod('Invoke',[Type[]]([Object],[Object[]]))
$Invoke.Invoke($GetType,($a[1],$null)).FullName
# System.String
$Invoke.Invoke($GetType,($b[1],$null)).FullName
# System.Management.Automation.PSObject

Now that we pass $b[1] as part of an array, we can see its real type: PSObject. That said, I prefer to use [Type]::GetTypeArray instead.

About StringComparer: as you can see, when the two compared objects are not both strings, StringComparer relies on IComparable.CompareTo for the comparison. PSObject implements the IComparable interface, so the sorting is done according to PSObject's IComparable implementation.


3 Comments

I think you're onto something. But the sort does rearrange the PSObject items, just not as I would expect. $a[1].GetType().Name and $b[1].GetType().Name both return "String". Can you point me to documentation with more details on arrays, PSObjects, and how the StringComparer might work when presented with PSObjects? Thanks.
@bretth I updated my answer. Sorry, I cannot point you to good documentation about that. A large part of my PowerShell knowledge was obtained through experimentation and digging with ILSpy. IMHO, PowerShell really lacks documentation on many of its internals.
1

Many moons later, let me attempt a comprehensive summary:

By design:

  • PowerShell's Sort-Object does not perform ordinal sorting of strings (the latter being based strictly on the Unicode code points (sometimes still loosely, but incorrectly referred to as "ASCII values") of the characters in the string).

  • Instead, like the underlying .NET runtime, it performs culture-specific sorting based on linguistic rules by default.

    • Note: In PowerShell (Core) v7.1+ / .NET 5+ the sorting rules changed, due to the latter's move to the ICU libraries even on Windows (which had previously used NLS) - see .NET globalization and ICU

      • The resulting changes, notably due to sorting _ before ., spurred complaints, because it changes long-established behavior in the context of file-system-related operations - see GitHub issue #14757.
    • Also, a notable difference is that PowerShell, unlike .NET, is generally case-insensitive by default, requiring the -CaseSensitive switch as an opt-in to case-sensitivity.

    • Unfortunately, as of PowerShell (Core) 7.4.x, unlike with direct use of .NET APIs, you can NOT make Sort-Object perform ordinal sorting.
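One possible way around this limitation is a small pipeline function that collects its input as strings and sorts them with the .NET API directly. This is only a sketch: Sort-Ordinal is a hypothetical name, not a built-in cmdlet, and the ::new() syntax assumes PowerShell 5 or later.

```powershell
# Hypothetical helper (not part of PowerShell): ordinal sort of pipeline input.
function Sort-Ordinal {
    param(
        # Binding to a [string] parameter hands the function plain strings.
        [Parameter(ValueFromPipeline)]
        [string] $InputString
    )
    begin   { $items = [System.Collections.Generic.List[string]]::new() }
    process { $items.Add($InputString) }
    end {
        # Sort strictly by code point, then emit the elements one by one.
        $items.Sort([System.StringComparer]::Ordinal)
        $items
    }
}

'ABCZ', 'ABC_', 'ABCA' | Sort-Ordinal   # ABCA, ABCZ, ABC_
```

Unlike Sort-Object, this sketch handles only strings and has no -Property, -Descending, or -Unique support.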


Unexpectedly:

  • As for why sorting the array resulting from $a = Get-Content data.txt vs. an array literal behaves differently - user4003407's helpful answer explains the technical underpinnings in detail, but let me attempt a pragmatic summary:

    • Behind the scenes, PowerShell situationally uses [psobject] wrappers around .NET objects, which are meant to be invisible helper objects, but, unfortunately, they aren't always invisible and sometimes cause unexpected behavior.

    • Notably, objects output from cmdlets (such as Get-Content) are invariably [psobject]-wrapped.

    • Thus, when such [psobject]-wrapped objects are implicitly collected in a regular PowerShell array - as happens when you capture the lines of a text file being streamed by Get-Content in a variable, for instance - you'll end up with an [object[]]-typed array whose elements happen to be [psobject] instances (which happen to wrap [string] instances).

    • By contrast, if you capture an array of string literals in a variable - e.g. $a = @( 'ABCZ', 'ABC_', 'ABCA' ) - the resulting [object[]]-typed array has [string] elements, i.e. no [psobject] wrappers are involved.

    • When an [object[]]-typed array or non-generic array-like type such as System.Collections.ArrayList is used, the above distinction matters:

      • Only if the original array's elements are [string]-typed does a sorting qualifier such as [System.StringComparer]::Ordinal matter; otherwise, this qualifier is ignored, and comparison for sorting is delegated to whatever type the elements happen to be:

      • [psobject] happens to implement the IComparable interface itself, and it effectively delegates to the [string] type's implementation behind the scenes, albeit using the latter's default behavior - which is culture-sensitive, case-sensitive sorting ([StringComparer]::CurrentCulture).
        This behavior is counter-intuitive, but cannot be changed for fear of breaking backward compatibility - see GitHub issue #14829 for discussion.

      • For a given array (list) $a, a simple test as to whether an element is [psobject]-wrapped is $a[0] -is [psobject], for instance.

    • To provide simplified examples:

      # == OK: Elements are NOT [psobject]-wrapped
      $a = '_', 'A'
      [Array]::Sort($a, [StringComparer]::Ordinal)
      $a # -> 'A', '_'  - ORDINAL sorting
         # (Code point of 'A' is 0x41 (65), of '_' is 0x5f (95))
      
      [System.Collections.ArrayList] $al = '_', 'A'
      $al.Sort([StringComparer]::Ordinal)
      $al # -> 'A', '_' - ORDINAL sorting
      
      # == !! BROKEN: Elements ARE [psobject]-wrapped, therefore
      #    !! [StringComparer]::Ordinal is *ignored*
      
      # !! Using Write-Output causes the array elements to be [psobject]-wrapped,
      # !! as Get-Content does.
      $a = Write-Output '_', 'A'
      # !! [StringComparer]::Ordinal is IGNORED, due to [psobject] wrappers.
      [Array]::Sort($a, [StringComparer]::Ordinal)
      $a # -> '_', 'A' - quiet fallback to the default, [StringComparer]::CurrentCulture
      
      # !! Using Write-Output causes the array elements to be [psobject]-wrapped
      [System.Collections.ArrayList] $al = Write-Output '_', 'A'
      # !! [StringComparer]::Ordinal is IGNORED, due to [psobject] wrappers.
      $al.Sort([StringComparer]::Ordinal)
      $al # -> '_', 'A' - quiet fallback to the default, [StringComparer]::CurrentCulture
      
    • Workaround:

      • Casting an [object[]] array whose elements are [psobject]-wrapped to a specific type - such as [string[]] in the case at hand - implicitly discards the [psobject] wrappers:

        # The [string[]] cast implicitly discards the [psobject] wrappers
        # in the process of creating a *strongly typed* array.
        [string[]] $a = Write-Output '_', 'A'
        [Array]::Sort($a, [StringComparer]::Ordinal)
        $a # -> 'A', '_'  - ORDINAL sorting, as requested.
        
  • As for PowerShell's comparison operators and the inconsistent [char] handling:

    • PowerShell's comparison operators such as -eq and -lt:

      • With [string] instances, like Sort-Object, comparison operators use linguistic rules, however based on the invariant culture rather than on the current culture (the latter being reflected in [cultureinfo]::CurrentCulture / Get-Culture, $PSCulture).

        • While the distinction between the current and the invariant culture will often not matter in practice, it definitely can.
        • See this answer for a comprehensive discussion of the contexts in which PowerShell uses one or the other.
      • By contrast, [char] instances are compared by their Unicode code points, albeit inconsistently, depending on the specific comparison operator used; note that PowerShell has no [char] literal representation, so a [char] cast is always required:

        • -eq (aka -ieq) and its case-sensitive variant, -ceq, as well as its negated variants (-ne (aka -ine) and -cne) do take case into account, as implied / requested: If case-insensitivity is required, the operands are first normalized via .ToUpperInvariant() behind the scenes:

          # Equivalent to:
          # [uint16] [char]::ToUpperInvariant('A') -eq [uint16] [char]::ToUpperInvariant('a')
          # Casting to [uint16] returns a [char] instance's Unicode code point.
          [char] 'A' -eq [char] 'a' # -> $true
          
        • By contrast, -lt (less than) and -gt (greater than) and their variants unexpectedly never normalize case, so even the nominally case-insensitive forms compare raw code points:

          # !! Unexpectedly ALSO $true
          # !! Equivalent to [uint16] [char] 'A' -lt [uint16] [char] 'a'
          # !! i.e. NO case normalization is performed.  
          [char] 'A' -lt [char] 'a' # ditto for -ilt and (as expected) for -clt
          

1 Comment

Wow! This will take a (very long) bit for me to parse. I very much appreciate all the detail you put into your response and hope it helps many people in the future.
0

Windows uses Unicode, not ASCII, so what you're seeing is the Unicode sort order for en-US. The general rules for sorting are:

  1. Special characters sort before numbers.
  2. Numbers come next, then lowercase and uppercase letters intermixed.

Extending your example,

$a = @( 'ABCZ', 'ABC_', 'ABCA', 'ABC4', 'abca' )

$a | sort-object
ABC_
ABC4
abca
ABCA
ABCZ

2 Comments

But the OP is explicitly asking for Ordinal order, and each individual Object in $a reports a type of String, but they don't fit in a String array. So, yes, we're getting the default Unicode ordering on Object instead of the requested Ordinal ordering. But why?
Unicode string sorting can be seen in practice here: minaret.info/test/sort.msp
0

I tried the following and the sort is as expected:

[System.Collections.ArrayList] $al = [String[]] $a


0

If you really want to do this... I will admit it's ugly, but it works. I would create a function if this is something you need to do on a regular basis.

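$a = @( 'ABCZ', 'ABC_', 'ABCA', 'ab1z' )
$ascii = @()

# Encode each string as its character codes, separated by semicolons.
foreach ($item in $a) {
    $string = ""
    for ($i = 0; $i -lt $item.Length; $i++) {
        # Zero-pad to five digits so codes of different widths (95 vs. 122) sort correctly as text.
        $char = '{0:D5}' -f [int] [char] $item[$i]
        $string += "$char;"
    }

    $ascii += $string
}

$b = @()

# Sort the encoded strings, then decode each one back into text.
foreach ($item in $ascii | Sort-Object) {
    $string = ""
    # Trim the trailing semicolon so Split() yields no empty last element.
    $array = $item.TrimEnd(";").Split(";")
    foreach ($char in $array) {
        $string += [char] [int] $char
    }

    $b += $string
}

$a
$b

ABCA
ABCZ
ABC_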
