I have a text file with a simple structure, which is actually a listing of the contents of an FTP server:

1.0
1.0a
10.0
10.0b
11.0
11.0f
2.0
3.0
4.0
...(and so on)
random string
random string

I'm using Get-Content to read the file, but I then want to retrieve only the lines that contain the highest number and the next-highest number. In this case, for example, I would want it to return:

10.0
10.0b
11.0
11.0f

I tried using Sort-Object, but it didn't work. Is there a way to use Sort-Object so that it knows it is sorting numbers and not strings (so that it doesn't place 10 after 1), sorts according to the digits before the full stop, and ignores the random strings at the end altogether?

Or if you have another method to suggest please do so... Thank you.

2 Answers

You can pass scriptblocks to some cmdlets, in this case Sort-Object and Group-Object. To clarify a bit more:

  1. Load the data

        Get-Content foo.txt |
    
  2. Group by the number (ignoring the suffix, if present):

        Group-Object { $_ -replace '\..*$' } |
    

    This removes the full stop and everything after it, and uses the remainder of the string (the integer part of the number) as the group name.

  3. Sort by that group name, numerically.

        Sort-Object { [int] $_.Name } |
    

    This is done simply by converting the name of the group to a number and sorting by that, similar to how we grouped by something derived from the original line.

  4. Then we can get the last two groups, which hold all lines with the largest and second-largest number, and unwrap them. The -Last parameter is fairly self-explanatory; -ExpandProperty outputs the values of a property instead of constructing a new object with a filtered property list:

        Select-Object -Last 2 -ExpandProperty Group
    

And there we are. You can try this pipeline in various stages just to get a feeling for what the commands do:

PS Home:\> gc foo.txt
1.0
1.0a
10.0
10.0b
11.0
11.0f
2.0
3.0
4.0

PS Home:\> gc foo.txt | group {$_ -replace '\..*$'}

Count Name                      Group
----- ----                      -----
    2 1                         {1.0, 1.0a}
    2 10                        {10.0, 10.0b}
    2 11                        {11.0, 11.0f}
    1 2                         {2.0}
    1 3                         {3.0}
    1 4                         {4.0}

PS Home:\> gc foo.txt | group {$_ -replace '\..*$'} | sort {[int]$_.Name}

Count Name                      Group
----- ----                      -----
    2 1                         {1.0, 1.0a}
    1 2                         {2.0}
    1 3                         {3.0}
    1 4                         {4.0}
    2 10                        {10.0, 10.0b}
    2 11                        {11.0, 11.0f}

PS Home:\> gc foo.txt | group {$_ -replace '\..*$'} | sort {[int]$_.Name} | select -l 2 -exp group
10.0
10.0b
11.0
11.0f
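
The sample file in the question also ends with lines that are not version numbers at all, and those would not convert cleanly to [int]. One way to keep them out of the pipeline is to filter before grouping. The following is only a sketch: the `$lines` array stands in for `Get-Content foo.txt`, and the `'^\d+\.'` filter is an assumption about what a valid line looks like.

```powershell
# Sample lines standing in for Get-Content foo.txt (assumed data);
# the last two mimic the stray lines at the end of the real file.
$lines = '1.0', '1.0a', '10.0', '10.0b', '11.0', '11.0f', '2.0', '3.0', '4.0',
         'random string', 'random string'

$result = $lines |
    Where-Object { $_ -match '^\d+\.' } |     # keep only lines starting with digits and a full stop
    Group-Object { $_ -replace '\..*$' } |    # group key: the text before the first full stop
    Sort-Object { [int] $_.Name } |           # order the groups numerically
    Select-Object -Last 2 -ExpandProperty Group

$result
```

With this filter in place, every group name that reaches Sort-Object consists of digits only.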

If you need the items within the groups (and thus in the final result for the last two groups) sorted by suffix, you can stick another Sort-Object directly after the Get-Content.
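
A minimal sketch of that variation, assuming the input arrives in arbitrary order. The shuffled `$lines` array is made-up sample data standing in for `Get-Content foo.txt`. It relies on Group-Object keeping the members of each group in the order they arrive, so a plain string sort up front is enough to order the suffixes.

```powershell
# Deliberately shuffled sample lines standing in for Get-Content foo.txt (assumed data)
$lines = '11.0f', '2.0', '10.0b', '1.0', '11.0', '10.0', '1.0a'

$result = $lines |
    Sort-Object |                             # plain string sort: fixes the order inside each group
    Group-Object { $_ -replace '\..*$' } |    # group key: the text before the first full stop
    Sort-Object { [int] $_.Name } |           # order the groups numerically
    Select-Object -Last 2 -ExpandProperty Group

$result
```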

2 Comments

Thank you. This helps a lot. I have a problem with the group command, though. See the result: 1 6.0b1 {6.0b1}. It doesn't remove the non-digit characters, so it doesn't group the lines correctly. Is there a way to remove everything after the full stop, not just the non-digits?
Yes, see edit. I'm sorry, I got the impression from your test data that there was a floating-point number optionally followed by a letter and wrote my regexes accordingly.

You can pass an expression to Sort-Object; the sort will then use that expression to order the objects. This is done by passing a hash table with the key Expression (which can be abbreviated to e). To reverse the order, add a second key Descending (or d) with the value $true.

In your case

...input... | Sort @{e={convert $_ as required}}

Multiple property names and hash tables can be supplied: so 11.0f could be split into a number and suffix.
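
As a rough illustration of that idea, the sketch below sorts by the number first and by the suffix second. The variable names `$byNumber` and `$bySuffix`, the sample data, and the two regexes are assumptions made for this example and are not taken from the question.

```powershell
# Sample lines in arbitrary order (assumed data)
$lines = '11.0f', '2.0', '10.0b', '1.0', '11.0', '10.0', '1.0a'

# First key: the digits before the full stop, compared as an integer
$byNumber = @{ Expression = { [int] ($_ -replace '\..*$') } }
# Second key: whatever follows the leading number, compared as a string
$bySuffix = @{ Expression = { $_ -replace '^\d+\.\d*' } }

$sorted = $lines | Sort-Object $byNumber, $bySuffix
$sorted
```

Adding `Descending = $true` to either hash table would reverse the order for that key only.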

If there is a lot of overlap between the sort expressions you could pre-process the input into objects with the sort properties first (and remove after):

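...input... | %{ 
  # Split each line into a numeric part (a) and an optional one-character suffix (b)
  if ($_ -match '^(\d+\.0)(.)?') {
    # Parse with the invariant culture so "." is always read as the decimal separator
    new-object PSObject -prop @{value=$_; a=[double]::Parse($matches[1], [Globalization.CultureInfo]::InvariantCulture); b=$matches[2] }
  } else {
    # Lines that do not match sort before everything else
    new-object PSObject -prop @{value=$_; a=[double]::MinValue; b=$null }
  }
} | sort a,b | select -expand value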

3 Comments

You don't need a hashtable for Sort-Object, a scriptblock suffices. And generally I think the object solution is overkill if you don't need to retain the objects for some reason or another.
@Joey True re. hash table. In this case, because there is a single (maybe failing) regex, I think performing the transform once is simpler (consider two script blocks passed to sort, both doing if ($_ -match ...)).
Yes, in that case it would make sense. By the way, I find switch to be a quite awesome way of writing a quick text file parser which may convert lines or a bunch of lines to objects :-)
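
For illustration, a switch-based parser along the lines of the last comment might look like the sketch below. It assumes PowerShell 3.0 or later for `[pscustomobject]`. The property names `Value`, `Number` and `Suffix`, the regex, and the sample lines are all made up for this example.

```powershell
# Sample lines, including one that is not a version number (assumed data)
$lines = '10.0b', 'random string', '1.0', '11.0f'

# switch -Regex tests each element of the array in turn and fills $Matches on success;
# lines that match no pattern produce no output and are therefore skipped.
$objects = switch -Regex ($lines) {
    '^(\d+)\.\d*(.*)$' {
        [pscustomobject]@{
            Value  = $_                  # the original line
            Number = [int] $Matches[1]   # digits before the full stop
            Suffix = $Matches[2]         # anything after the number
        }
    }
}

$sorted = $objects | Sort-Object Number, Suffix | Select-Object -ExpandProperty Value
$sorted
```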
