Powershell v2, getting specific lines from file, sorting

Question

I have a text file with a simple structure, which is actually the contents of an FTP directory:

1.0
1.0a
10.0
10.0b
11.0
11.0f
2.0
3.0
4.0
...(and so on)
random string
random string

I'm using Get-Content to read the file, but then I want to retrieve only the lines that contain the highest number and the second-highest number. In this case, for example, I would want it to return:

10.0
10.0b
11.0
11.0f

I tried using Sort-Object, but it didn't work. Is there a way to use Sort-Object so that it knows it is sorting numbers and not strings (so that it doesn't place 10 right after 1), sorts according to the digits before the full stop, and ignores the random strings at the end altogether?

Or if you have another method to suggest please do so... Thank you.

Joey · Accepted Answer · 2012-04-29 12:04:58Z

2

You can pass scriptblocks to some cmdlets, in this case Sort-Object and Group-Object. To clarify a bit more:

Load the data
```
Get-Content foo.txt |
```
Group by the number (ignoring the suffix, if present):
```
    Group-Object { $_ -replace '\..*$' } |
```
This removes everything from the first full stop onward and uses the remainder of the string (now just the integer part of the number) as the group name.
Sort by that group name, numerically.
```
    Sort-Object { [int] $_.Name } |
```
This is done simply by converting the name of the group to a number and sorting by that, similar to how we grouped by something derived from the original line.
Then we can get the last two groups, representing all lines with the highest and second-highest numbers, and unwrap the groups. The -Last parameter is fairly self-explanatory; -ExpandProperty selects the values of a property instead of constructing a new object with a filtered property list:
```
    Select-Object -Last 2 -ExpandProperty Group
```

And there we are. You can try this pipeline in various stages just to get a feeling for what the commands do:

PS Home:\> gc foo.txt
1.0
1.0a
10.0
10.0b
11.0
11.0f
2.0
3.0
4.0

PS Home:\> gc foo.txt | group {$_ -replace '\..*$'}

Count Name                      Group
----- ----                      -----
    2 1                         {1.0, 1.0a}
    2 10                        {10.0, 10.0b}
    2 11                        {11.0, 11.0f}
    1 2                         {2.0}
    1 3                         {3.0}
    1 4                         {4.0}

PS Home:\> gc foo.txt | group {$_ -replace '\..*$'} | sort {[int]$_.Name}

Count Name                      Group
----- ----                      -----
    2 1                         {1.0, 1.0a}
    1 2                         {2.0}
    1 3                         {3.0}
    1 4                         {4.0}
    2 10                        {10.0, 10.0b}
    2 11                        {11.0, 11.0f}

PS Home:\> gc foo.txt | group {$_ -replace '\..*$'} | sort {[int]$_.Name} | select -l 2 -exp group
10.0
10.0b
11.0
11.0f

If you need the items within the groups (and thus in the final result for the last two groups) sorted by suffix, you can stick another Sort-Object directly after the Get-Content.
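
A minimal sketch of that variant, under a few assumptions that are not part of the original pipeline: the `$lines` array is hypothetical sample data standing in for `Get-Content foo.txt`, and the extra `Where-Object` filter is added only to drop the random strings at the end of the file before anything is cast to `[int]`.

```powershell
# Hypothetical sample data standing in for the file's lines, deliberately
# unordered within each group and with a junk line at the end.
$lines = '1.0a', '1.0', '10.0b', '10.0', '11.0f', '11.0', '2.0', '3.0', '4.0', 'random string'

# 1. Keep only lines that start with "<digits>." (drops the random strings).
# 2. Sort as strings so that "10.0" comes before "10.0b" inside each group.
# 3. Group by the digits before the full stop.
# 4. Sort the groups numerically and unwrap the last two.
$result = $lines |
    Where-Object { $_ -match '^\d+\.' } |
    Sort-Object |
    Group-Object { $_ -replace '\..*$' } |
    Sort-Object { [int] $_.Name } |
    Select-Object -Last 2 -ExpandProperty Group

$result
```

The early string sort only fixes the order inside each group; the order of the groups themselves still comes from the numeric Sort-Object further down the pipeline.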

edited Apr 29, 2012 at 12:04

answered Apr 29, 2012 at 10:39

Joey


2 Comments

kokotas Over a year ago

Thank you. This helps a lot. I have a problem using the group command, though. See the result: 1 6.0b1 {6.0b1}. It won't remove the non-digit characters and thus doesn't group the lines correctly. Is there a way to remove everything after the full stop, not just non-digits?

Joey Over a year ago

Yes, see edit. I'm sorry, I got the impression from your test data that there was a floating-point number optionally followed by a letter and wrote my regexes accordingly.

Richard · Answer · 2012-04-29 10:08:13Z

0

You can pass an expression to Sort-Object; the sort will then use that expression to order the objects. This is done by passing a hash table with the key expression (which can be abbreviated to e). To reverse the order, add a second key descending (or d) with the value $true.

In your case

...input... | Sort @{e={convert $_ as required}}

Multiple property names and hash tables can be supplied: so 11.0f could be split into a number and suffix.
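
As a rough illustration of splitting a line into a number and a suffix, here is a sketch with two hash tables. The sample `$lines` array, the variable names `$byNumber` and `$bySuffix`, and both regular expressions are assumptions made for this example, not part of the answer above.

```powershell
# Hypothetical sample input; every line is assumed to look like "<digits>.<digits><suffix>".
$lines = '2.0', '11.0f', '1.0a', '10.0', '11.0', '1.0'

# First key: the digits before the full stop, compared as integers, highest first.
$byNumber = @{ Expression = { [int]($_ -replace '\..*$') }; Descending = $true }

# Second key: whatever follows "<digits>.<digits>", compared as a string, ascending.
$bySuffix = @{ Expression = { $_ -replace '^\d+\.\d*' } }

$sorted = $lines | Sort-Object $byNumber, $bySuffix
$sorted
```

Each hash table carries its own sort direction, so the numbers run from highest to lowest while the suffixes within one number stay in ascending order.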

If there is a lot of overlap between the sort expressions you could pre-process the input into objects with the sort properties first (and remove after):

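...input... | %{
  # Parse each line once. A [double] cast uses the invariant culture,
  # whereas [double]::Parse() follows the current culture's decimal separator.
  if ($_ -match '^(\d+\.0)(.)?') {
    new-object PSObject -prop @{value=$_; a=[double]$matches[1]; b=$matches[2] }
  } else {
    # Lines that do not match (the random strings) sort first.
    new-object PSObject -prop @{value=$_; a=[double]::MinValue; b=$null }
  }
} | sort a,b | select -expand value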

answered Apr 29, 2012 at 10:08

Richard


3 Comments

Joey Over a year ago

You don't need a hashtable for Sort-Object, a scriptblock suffices. And generally I think the object solution is overkill if you don't need to retain the objects for some reason or another.

Richard Over a year ago

@Joey True re. hash table. In this case, because there is a single (maybe failing) regex, I think performing the transform once is simpler (consider two script blocks passed to sort, both doing if ($_ -match ...)).

Joey Over a year ago

Yes, in that case it would make sense. By the way, I find switch to be a quite awesome way of writing a quick text file parser which may convert lines or a bunch of lines to objects :-)
