Let's say I have a collection of strings like this, in no particular order:

"Filename (Text) (Comment)"
"Filename (Word) (1)"
"Filename (Word) (1) (Bad)"
"Filename (Text) (2)"
"Filename (Picture)"
"Filename (Misc)"
"Filename (Misc) (1)"
"Filename (Picture) (1)"
"Filename (Audio)"
"Filename (Audio) (Comment) (1)"

I want to sort them like this:

"Filename (Text) (2)"
"Filename (Word) (1)"
"Filename (Text) (Comment)"
"Filename (Audio) (Comment) (1)"
"Filename (Audio)"
"Filename (Picture) (1)"
"Filename (Picture)"
"Filename (Misc) (1)"
"Filename (Misc)"
"Filename (Word) (1) (Bad)"

In other words, I want to sort in this order: (Bad) goes to the bottom; (Text/Word), (Audio) and (Picture) come above the (Bad)s, in that order; and within each group I sort on the number (2, 1, [blank]).

Note that I want Text and Word clumped together and then sorted on the number in parentheses. I don't care about anything that doesn't match those, so I don't care about (Comment).

Currently I'm getting this:

"Filename (Word) (1)"
"Filename (Text) (2)"
"Filename (Text) (Comment)"
"Filename (Audio) (Comment) (1)"
"Filename (Audio)"
"Filename (Picture) (1)"
"Filename (Picture)"
"Filename (Misc) (1)"
"Filename (Misc)"
"Filename (Word) (1) (Bad)"

So I'm very close to getting what I want.

Here's what I'm doing:

$badExpression = { if ($_ -match '\((Bad)\)') { $matches[1] } }
$documentExpression = { if ($_ -match '\((Text|Word)\)') { $matches[1] } }
$soundExpression = { if ($_ -match '\((Audio)\)') { $matches[1] } }
$imageExpression = { if ($_ -match '\((Picture)\)') { $matches[1] } }
$numberExpression = { if ($_ -match '\((2|1)\)') { $matches[1] } }

"Filename (Text) (Comment)", "Filename (Word) (1)", "Filename (Word) (1) (Bad)", `
    "Filename (Text) (2)", "Filename (Picture)", "Filename (Misc)", "Filename (Misc) (1)", `
    "Filename (Picture) (1)", "Filename (Audio)", "Filename (Audio) (Comment) (1)" | Sort-Object `
@{Expression = $badExpression; Descending = $false}, `
@{Expression = $documentExpression; Descending = $true}, `
@{Expression = $soundExpression; Descending = $true}, `
@{Expression = $imageExpression; Descending = $true}, `
@{Expression = $numberExpression; Descending = $true}

I'm definitely misunderstanding how exactly my expressions are being applied to the sort. I have a hunch that maybe I have to do a sequence of Sort-Objects, but I can't really figure out what.

Just to draw the focus: The issue I have is that (Word) and (Text) are supposed to be sorted as if they were the same.

Edit: Okay I think I got the behavior I want now with this, a subtle change, see the -replace I added.

$badExpression = { if ($_ -match '\((Bad)\)') { $matches[1] } }
$documentExpression = { if ($_ -match '\((Text|Word)\)') { $matches[1] -replace ".*",'Document'} }
$soundExpression = { if ($_ -match '\((Audio)\)') { $matches[1] } }
$imageExpression = { if ($_ -match '\((Picture)\)') { $matches[1] } }
$numberExpression = { if ($_ -match '\((2|1)\)') { $matches[1] } }

"Filename (Text) (Comment)", "Filename (Word) (1)", "Filename (Word) (1) (Bad)", `
    "Filename (Text) (2)", "Filename (Picture)", "Filename (Misc)", "Filename (Misc) (1)", `
    "Filename (Picture) (1)", "Filename (Audio)", "Filename (Audio) (Comment) (1)" | Sort-Object `
@{Expression = $badExpression; Descending = $false}, `
@{Expression = $documentExpression; Descending = $true}, `
@{Expression = $soundExpression; Descending = $true}, `
@{Expression = $imageExpression; Descending = $true}, `
@{Expression = $numberExpression; Descending = $true}

So, for internet points: what is my script doing (besides working)? I understand what's happening loosely, but I'd like a better grasp of it. :)

2 Answers

It's not straightforward, but I would start by sorting the list into natural order.

Get-Content C:\folder\String.txt | Sort-Object $ToNatural

Then at least you will directly get output like this (sorted by Audio, Misc, Picture, Text, Word):

"Filename (Audio) (Comment) (1)"
"Filename (Audio)"
"Filename (Misc) (1)"
"Filename (Misc)"
"Filename (Picture) (1)"
"Filename (Picture)"
"Filename (Text) (2)"
"Filename (Text) (Comment)"
"Filename (Word) (1) (Bad)"
"Filename (Word) (1)"

After that you can use some regex to match according to your need. Let me know if this approach helps you.

1 Comment

Just stumbled over this answer: you forgot to define the variable $ToNatural, like this: $ToNatural = { [regex]::Replace($_, '\d+', { $args[0].Value.PadLeft(20,"0") }) } I have it defined in my profile too :-)
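
To see what that sort key does, here is a minimal sketch using the $ToNatural definition from the comment. The sample names (file10, file2, file1) are made up for illustration and are not from the question.

```powershell
# Pad every run of digits to 20 characters so numbers compare by value, not digit by digit.
$ToNatural = { [regex]::Replace($_, '\d+', { $args[0].Value.PadLeft(20, '0') }) }

# Hypothetical sample data, not taken from the question.
$names = 'file10', 'file2', 'file1'

$plain   = $names | Sort-Object              # plain string comparison
$natural = $names | Sort-Object $ToNatural   # compares the zero-padded keys instead

$natural
```

The original strings are returned unchanged; only the key used for comparison is padded.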

Six years late, and I got nerd-sniped by the Related sidebar, instead of doing real work...

("Filename (Text) (Comment)", "Filename (Word) (1)",                                 
 "Filename (Word) (1) (Bad)", "Filename (Text) (2)",                                                    
 "Filename (Picture)", "Filename (Misc)",
 "Filename (Misc) (1)", "Filename (Picture) (1)",                                                       
 "Filename (Audio)", "Filename (Audio) (Comment) (1)") `                                                
| Sort-Object -Property `                                                                               
    @{Descending=$true; Expression={ switch -Regex ($_) { # reverse-sort by a numerical code     
        '\(bad\)'         { -2; break }    # push (bad) to the bottom (lowest code val)                   
        '\((text|word)\)' { 80; break }    # (text) or (word) come first (highest code val)               
        '\(audio\)'       { 60; break }                                                                 
        '\(picture\)'     { 40; break }                                                                 
        '\(misc\)'        { 20; break }                                                                 
        default           { -1 }           # unknowns are penultimate (second-lowest code val)
     }}},                                                                                               
    @{Descending=$true; Expression={ switch -Regex ($_) { #...then reverse-sort by the first (num)   
        '\(([0-9]+)\)'    { [Int32]($Matches[1]) }        # find any run of digits and convert to integer
        default           { -1 }                          # assume no negative (nums)                       
    }}}

Filename (Text) (2)           
Filename (Word) (1)           
Filename (Text) (Comment)     
Filename (Audio) (Comment) (1)
Filename (Audio)              
Filename (Picture) (1)        
Filename (Picture)            
Filename (Misc) (1)           
Filename (Misc)               
Filename (Word) (1) (Bad)     

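The above assumes you are interested in the existence of (Bad) over anything else, then in your named (Class) values, then in the first (number). If a string carries several classes, the first matching case in the switch wins (Bad, then Text/Word, then Audio, and so on), regardless of where it sits in the string; if it carries several numbers, only the first one is used.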
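
A small sketch of how the switch treats strings with more than one class or number. The test strings and the $classCode, $audioAndText, $textAndBad and $firstNumber names are hypothetical and only here for illustration; the cases are copied from the code above.

```powershell
# Same cases as the class-code switch above, wrapped in a script block so it can be tried on its own.
$classCode = { switch -Regex ($_) {
    '\(bad\)'         { -2; break }
    '\((text|word)\)' { 80; break }
    '\(audio\)'       { 60; break }
    '\(picture\)'     { 40; break }
    '\(misc\)'        { 20; break }
    default           { -1 }
}}

# Cases are tested top to bottom and 'break' stops at the first hit,
# so the position of the class inside the string does not matter.
$audioAndText = 'Filename (Audio) (Text)' | ForEach-Object $classCode   # text|word is tested before audio
$textAndBad   = 'Filename (Text) (Bad)'   | ForEach-Object $classCode   # bad is tested before everything else

# -match stops at the first match in the string, so only the first number is captured.
$null = 'Filename (3) (7)' -match '\(([0-9]+)\)'
$firstNumber = [Int32]$Matches[1]
```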

You also didn't indicate if you wanted to sort on the actual Filename text itself so, in the above, it is also non-deterministic. If you did want to sort the Filename text as well, try...

("Filename (Text) (Comment)", "Filename (Word) (1)",                                 
 "Filename (Word) (1) (Bad)", "Filename (Text) (2)",                                                    
 "Filename (Picture)", "AAAAname (Misc)",                                                               
 "Filename (Misc) (1)", "Filename (Picture) (1)",                                                       
 "Filename (Audio)", "Filename (Audio) (Comment) (1)") `                                                
| Sort-Object -Property `                                                                               
    @{Expression={                                                                                      
        ($_ -split '\(')[0]                # first, sort by everything up to the first (                
     }},                                                                                                
    @{Descending=$true; Expression={ switch -Regex ($_) { #...then reverse-sort by a numerical code     
        '\(bad\)'         { -2; break }    # push (bad) to the bottom (lowest code val)                   
        '\((text|word)\)' { 80; break }    # (text) or (word) come first (highest code val)               
        '\(audio\)'       { 60; break }                                                                 
        '\(picture\)'     { 40; break }                                                                 
        '\(misc\)'        { 20; break }                                                                 
        default           { -1 }           # unknowns are penultimate (second-lowest code val)            
     }}},                                                                                               
    @{Descending=$true; Expression={ switch -Regex ($_) { #...finally reverse-sort by the first (num)   
        '\(([0-9]+)\)'    { [Int32]($Matches[1]) }        # find any run of digits and convert to integer
        default           { -1 }                          # assume no negative (nums)                       
    }}}

AAAAname (Misc)               
Filename (Text) (2)           
Filename (Word) (1)           
Filename (Text) (Comment)     
Filename (Audio) (Comment) (1)
Filename (Audio)              
Filename (Picture) (1)        
Filename (Picture)            
Filename (Misc) (1)           
Filename (Word) (1) (Bad)     

So for internet points: What is my script doing...

$badExpression = { if ($_ -match '\((Bad)\)') { $matches[1] } }
$documentExpression = { if ($_ -match '\((Text|Word)\)') { $matches[1] -replace ".*",'Document'} }
$soundExpression = { if ($_ -match '\((Audio)\)') { $matches[1] } }
$imageExpression = { if ($_ -match '\((Picture)\)') { $matches[1] } }
$numberExpression = { if ($_ -match '\((2|1)\)') { $matches[1] } }

"Filename (Text) (Comment)", "Filename (Word) (1)", "Filename (Word) (1) (Bad)", `
    "Filename (Text) (2)", "Filename (Picture)", "Filename (Misc)", "Filename (Misc) (1)", `
    "Filename (Picture) (1)", "Filename (Audio)", "Filename (Audio) (Comment) (1)" | Sort-Object `
@{Expression = $badExpression; Descending = $false}, `
@{Expression = $documentExpression; Descending = $true}, `
@{Expression = $soundExpression; Descending = $true}, `
@{Expression = $imageExpression; Descending = $true}, `
@{Expression = $numberExpression; Descending = $true}

Each of your $...Expression variables stores a script block that returns a specific word if that word is found in the current pipeline item; e.g. the script in $badExpression returns Bad if the pipeline item ($_) contains '(Bad)', and returns nothing if it does not. In an ascending sort, nothing comes before Bad.
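
A minimal sketch of that behaviour, using two of the strings from the question. The $ascending and $descending names are just for illustration.

```powershell
$badExpression = { if ($_ -match '\((Bad)\)') { $matches[1] } }

$pair = 'Filename (Word) (1) (Bad)', 'Filename (Misc)'

# The (Misc) item yields no key at all, so it sorts ahead of the item whose key is 'Bad'.
$ascending  = $pair | Sort-Object $badExpression
# Reversing the direction brings the 'Bad' key to the top instead.
$descending = $pair | Sort-Object $badExpression -Descending
```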

Your Sort-Object sorts the list on each of those expressions, in turn. For each of your strings, this extracts the following:

$_                                                                         : Filename (Text) (Comment)
 if ($_ -match '\((Bad)\)') { $matches[1] }                                :
 if ($_ -match '\((Text|Word)\)') { $matches[1] -replace ".*",'Document'}  : DocumentDocument
 if ($_ -match '\((Audio)\)') { $matches[1] }                              :
 if ($_ -match '\((Picture)\)') { $matches[1] }                            :
 if ($_ -match '\((2|1)\)') { $matches[1] }                                :

$_                                                                         : Filename (Word) (1)
 if ($_ -match '\((Bad)\)') { $matches[1] }                                :
 if ($_ -match '\((Text|Word)\)') { $matches[1] -replace ".*",'Document'}  : DocumentDocument
 if ($_ -match '\((Audio)\)') { $matches[1] }                              :
 if ($_ -match '\((Picture)\)') { $matches[1] }                            :
 if ($_ -match '\((2|1)\)') { $matches[1] }                                : 1

$_                                                                         : Filename (Word) (1) (Bad)
 if ($_ -match '\((Bad)\)') { $matches[1] }                                : Bad
 if ($_ -match '\((Text|Word)\)') { $matches[1] -replace ".*",'Document'}  : DocumentDocument
 if ($_ -match '\((Audio)\)') { $matches[1] }                              :
 if ($_ -match '\((Picture)\)') { $matches[1] }                            :
 if ($_ -match '\((2|1)\)') { $matches[1] }                                : 1

$_                                                                         : Filename (Text) (2)
 if ($_ -match '\((Bad)\)') { $matches[1] }                                :
 if ($_ -match '\((Text|Word)\)') { $matches[1] -replace ".*",'Document'}  : DocumentDocument
 if ($_ -match '\((Audio)\)') { $matches[1] }                              :
 if ($_ -match '\((Picture)\)') { $matches[1] }                            :
 if ($_ -match '\((2|1)\)') { $matches[1] }                                : 2

$_                                                                         : Filename (Picture)
 if ($_ -match '\((Bad)\)') { $matches[1] }                                :
 if ($_ -match '\((Text|Word)\)') { $matches[1] -replace ".*",'Document'}  :
 if ($_ -match '\((Audio)\)') { $matches[1] }                              :
 if ($_ -match '\((Picture)\)') { $matches[1] }                            : Picture
 if ($_ -match '\((2|1)\)') { $matches[1] }                                :

$_                                                                         : Filename (Misc)
 if ($_ -match '\((Bad)\)') { $matches[1] }                                :
 if ($_ -match '\((Text|Word)\)') { $matches[1] -replace ".*",'Document'}  :
 if ($_ -match '\((Audio)\)') { $matches[1] }                              :
 if ($_ -match '\((Picture)\)') { $matches[1] }                            :
 if ($_ -match '\((2|1)\)') { $matches[1] }                                :

$_                                                                         : Filename (Misc) (1)
 if ($_ -match '\((Bad)\)') { $matches[1] }                                :
 if ($_ -match '\((Text|Word)\)') { $matches[1] -replace ".*",'Document'}  :
 if ($_ -match '\((Audio)\)') { $matches[1] }                              :
 if ($_ -match '\((Picture)\)') { $matches[1] }                            :
 if ($_ -match '\((2|1)\)') { $matches[1] }                                : 1

$_                                                                         : Filename (Picture) (1)
 if ($_ -match '\((Bad)\)') { $matches[1] }                                :
 if ($_ -match '\((Text|Word)\)') { $matches[1] -replace ".*",'Document'}  :
 if ($_ -match '\((Audio)\)') { $matches[1] }                              :
 if ($_ -match '\((Picture)\)') { $matches[1] }                            : Picture
 if ($_ -match '\((2|1)\)') { $matches[1] }                                : 1

$_                                                                         : Filename (Audio)
 if ($_ -match '\((Bad)\)') { $matches[1] }                                :
 if ($_ -match '\((Text|Word)\)') { $matches[1] -replace ".*",'Document'}  :
 if ($_ -match '\((Audio)\)') { $matches[1] }                              : Audio
 if ($_ -match '\((Picture)\)') { $matches[1] }                            :
 if ($_ -match '\((2|1)\)') { $matches[1] }                                :

$_                                                                         : Filename (Audio) (Comment) (1)
 if ($_ -match '\((Bad)\)') { $matches[1] }                                :
 if ($_ -match '\((Text|Word)\)') { $matches[1] -replace ".*",'Document'}  :
 if ($_ -match '\((Audio)\)') { $matches[1] }                              : Audio
 if ($_ -match '\((Picture)\)') { $matches[1] }                            :
 if ($_ -match '\((2|1)\)') { $matches[1] }                                : 1
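
If you want to reproduce a listing like the one above yourself, one way (a sketch, not necessarily how the listing was produced) is to feed the same script blocks to Select-Object as calculated properties. The property names Item, Bad, Document, Sound, Image and Number are my own labels.

```powershell
$badExpression      = { if ($_ -match '\((Bad)\)') { $matches[1] } }
$documentExpression = { if ($_ -match '\((Text|Word)\)') { $matches[1] -replace ".*",'Document' } }
$soundExpression    = { if ($_ -match '\((Audio)\)') { $matches[1] } }
$imageExpression    = { if ($_ -match '\((Picture)\)') { $matches[1] } }
$numberExpression   = { if ($_ -match '\((2|1)\)') { $matches[1] } }

$items = 'Filename (Text) (Comment)', 'Filename (Word) (1)', 'Filename (Word) (1) (Bad)',
         'Filename (Text) (2)', 'Filename (Picture)', 'Filename (Misc)', 'Filename (Misc) (1)',
         'Filename (Picture) (1)', 'Filename (Audio)', 'Filename (Audio) (Comment) (1)'

# One output object per string, with one property per sort key.
$table = $items | Select-Object @{ Name = 'Item';     Expression = { $_ } },
                                @{ Name = 'Bad';      Expression = $badExpression },
                                @{ Name = 'Document'; Expression = $documentExpression },
                                @{ Name = 'Sound';    Expression = $soundExpression },
                                @{ Name = 'Image';    Expression = $imageExpression },
                                @{ Name = 'Number';   Expression = $numberExpression }

$table | Format-List
```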

The $expression scripts are somewhat circuitous, in that it doesn't strictly matter what they return as long as it is binary (exists vs not-exists). You could easily replace most of them with just the { $_ -match '\(...\)' } part, if you wanted to, and sort on the resulting booleans.
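
As a rough sketch of that idea (my own rewrite, not the asker's script), the four class keys become plain -match tests and only the number key still returns the captured text:

```powershell
$items = 'Filename (Text) (Comment)', 'Filename (Word) (1)', 'Filename (Word) (1) (Bad)',
         'Filename (Text) (2)', 'Filename (Picture)', 'Filename (Misc)', 'Filename (Misc) (1)',
         'Filename (Picture) (1)', 'Filename (Audio)', 'Filename (Audio) (Comment) (1)'

$sorted = $items | Sort-Object -Property @(
    @{ Expression = { $_ -match '\(Bad\)' };         Descending = $false }  # $false sorts before $true, so (Bad) sinks
    @{ Expression = { $_ -match '\((Text|Word)\)' }; Descending = $true }   # $true first: Text and Word share one key
    @{ Expression = { $_ -match '\(Audio\)' };       Descending = $true }
    @{ Expression = { $_ -match '\(Picture\)' };     Descending = $true }
    @{ Expression = { if ($_ -match '\((2|1)\)') { $matches[1] } }; Descending = $true }  # 2, then 1, then no number
)

$sorted
```

Because (Text) and (Word) both produce the same $true key, they are grouped together without needing the -replace trick.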

The reason your first approach sorts (Word) ahead of (Text) is that this is exactly what you are asking it to sort on when it gets to that expression. $documentExpression returns the term that was matched, $matches[1], and \((Text|Word)\) means that term could be Text or Word. As such, your descending sort is done across three possible terms (Word, Text or nothing) instead of across two (exists, not-exists).

Your second version of $documentExpression takes whatever term was in $Matches[1], discards it and replaces it with DocumentDocument, which resolves it back to the binary exists/not-exists condition.

(That it replaces it with DocumentDocument instead of just Document looks peculiar, but it comes from the regex rather than from any interaction between -match and -replace: .* first matches the whole term, then matches once more on the empty string left at the end, and -replace substitutes every match. Anchoring the pattern as '^.*', using '.+', or simply returning 'Document' gives a single Document.)
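
A quick way to check where the doubled DocumentDocument comes from is to run the replacement on its own, away from -match and Sort-Object. This is a minimal sketch; the variable names are mine.

```powershell
# '.*' matches the whole string, then matches once more on the empty string at the end,
# and -replace substitutes every match.
$greedy   = 'Text' -replace '.*',  'Document'
# With the anchor, the second (empty) match at the end can no longer start at position 0.
$anchored = 'Text' -replace '^.*', 'Document'
# '.+' needs at least one character, so it cannot match the empty string at the end.
$nonEmpty = 'Text' -replace '.+',  'Document'

# Counting the matches directly shows the extra empty match.
$matchCount = [regex]::Matches('Text', '.*').Count

$greedy, $anchored, $nonEmpty, $matchCount
```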

$numberExpression is the only expression that is legitimately expected to have more than one value to sort over, because you have asked for 1 or 2, so sorting on $Matches[1] there makes some sense.
