How can I read a file and count the consecutive occurrences of each string in it? My file contains the following lines:

1
1
1
0
0
0
0
1
1
1
0
1
1
0
0
0
1
0
1
1
1
0
0

The thing is, I know next to nothing about PowerShell, but I do know bash, so if somebody understands both, this is my desired effect:

[me@myplace aaa8]$ cat fule1|uniq -c
      3 1
      4 0
      3 1
      1 0
      2 1
      3 0
      1 1
      1 0
      3 1
      2 0

And if possible, also add the PowerShell equivalent of sort -hr :D

[me@myplace aaa8]$ cat fule1|uniq -c|sort -hr
      4 0
      3 1
      3 1
      3 1
      3 0
      2 1
      2 0
      1 1
      1 0
      1 0

So basically this tells me that the longest streak in my file is 4 zeroes, and so on.

Is there a way to do this with PowerShell?

  • Could be something like that: [regex]::Matches('aaaaaaaaaaaaabbbbbbbbccc', '(.)\1+').Groups | Where-Object { $_.Length -gt 1 } | Sort-Object -Unique -Property Value combined with [RegexOptions]::Multiline option for your task. Measure-Object command might be useful too. I'm not sure about your input data size and how fast regular expressions will work. Commented Mar 23, 2019 at 23:50
  • @Rabash: uniq -c doesn't exclude single instances, so your solution won't work. In general, future readers benefit most from full-fledged answers, not (half-)solutions in comments. Commented Mar 24, 2019 at 3:32

1 Answer


PowerShell's equivalent to the uniq utility, the Get-Unique cmdlet, unfortunately has no equivalent to the former's -c option for prepending the number of consecutive duplicate lines (as of PowerShell v6.2).

Note: Enhancing Get-Unique to support a -c-like feature and other features offered by the uniq POSIX utility is the subject of this feature request on GitHub.

Therefore, you must roll your own solution:

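function Get-UniqueWithCount {

  begin {
    # Length of the current run and the line that the run consists of.
    $instanceCount = 1; $prevLine = $null
  }

  process {
    if ($_ -eq $prevLine) {
      # Same line as the previous one: extend the current run.
      ++$instanceCount
    } elseif ($null -ne $prevLine) {
      # A different line ends the run: output it and start a new one.
      [pscustomobject] @{ InstanceCount = $instanceCount; Line = $prevLine }
      $instanceCount = 1
    }
    $prevLine = $_
  }

  end {
    # Output the final run, unless there was no input at all.
    if ($null -ne $prevLine) {
      [pscustomobject] @{ InstanceCount = $instanceCount; Line = $prevLine }
    }
  }

}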

The above function accepts input from the pipeline (object by object as $_ in the process { ... } block). It compares each object (line) to the previous one and, if they're equal, increments the instance count; once a different line is found, the previous line is output, along with its instance count, as an object with properties InstanceCount and Line. The end { ... } block outputs the final output object for the last block of identical consecutive lines. See about_Functions_Advanced.
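One detail worth checking when comparing against uniq: PowerShell's -eq operator compares strings case-insensitively, whereas uniq is case-sensitive unless -i is given. The snippet below is only a small illustration of the two operators, not part of the function above; if case-sensitive runs are wanted, swapping -eq for -ceq in the process block should be enough.

```powershell
# -eq ignores case when comparing strings; -ceq is the case-sensitive variant.
$insensitive = 'abc' -eq 'ABC'    # $true
$sensitive   = 'abc' -ceq 'ABC'   # $false

# Show both results side by side.
'{0} {1}' -f $insensitive, $sensitive   # prints: True False
```

For the sample file, which contains only 0 and 1, both operators give the same result.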

Then invoke it as follows:

Get-Content fule | Get-UniqueWithCount

which yields:

InstanceCount Line
------------- ----
            3 1
            4 0
            3 1
            1 0
            2 1
            3 0
            1 1
            1 0
            3 1
            2 0
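If plain text in the layout of uniq -c is preferred over a table, the objects can be formatted with the -f operator. This is a minimal sketch: the $runs array is hypothetical sample data shaped like the function's output (the first three runs from the question's file), and the 7-character count field is an assumption chosen to match the output shown in the question.

```powershell
# Hypothetical sample objects, shaped like Get-UniqueWithCount's output.
$runs = @(
  [pscustomobject] @{ InstanceCount = 3; Line = '1' }
  [pscustomobject] @{ InstanceCount = 4; Line = '0' }
  [pscustomobject] @{ InstanceCount = 3; Line = '1' }
)

# Right-align the count in a 7-character field, then add a space and the line.
$formatted = $runs | ForEach-Object { '{0,7} {1}' -f $_.InstanceCount, $_.Line }
$formatted
```

With real input, the same ForEach-Object stage could be appended to the Get-Content pipeline in place of the sample array.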

Since Get-UniqueWithCount conveniently outputs objects whose typed properties we can act on, the equivalent of sort -hr is easy (-h compares human-readable numbers such as 2K or 1G, which for plain integers like these amounts to a numeric sort; -r reverses the order, making it descending):

Get-Content fule | Get-UniqueWithCount | Sort-Object -Descending InstanceCount

which yields:

InstanceCount Line
------------- ----
            4 0
            3 1
            3 1
            3 0
            3 1
            2 1
            2 0
            1 0
            1 1
            1 0
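
Since the stated goal is to find the longest streak, the sorted output can be narrowed further. The following is a sketch under assumptions: $runs is hypothetical sample data reproducing the ten runs shown above, standing in for the output of Get-UniqueWithCount, and the variable names are illustrative only.

```powershell
# Hypothetical sample data: the ten runs from the question's file, as
# (count, line) pairs turned into objects like Get-UniqueWithCount's output.
$pairs = (3,'1'), (4,'0'), (3,'1'), (1,'0'), (2,'1'),
         (3,'0'), (1,'1'), (1,'0'), (3,'1'), (2,'0')
$runs = foreach ($p in $pairs) {
  [pscustomobject] @{ InstanceCount = $p[0]; Line = $p[1] }
}

# Longest streak overall: sort descending and keep only the first object.
$longest = $runs | Sort-Object -Descending InstanceCount | Select-Object -First 1

# Longest streak per distinct line value: group by Line, then take the
# largest InstanceCount within each group.
$perValue = $runs | Group-Object Line | ForEach-Object {
  $_.Group | Sort-Object -Descending InstanceCount | Select-Object -First 1
}

$longest
$perValue
```

For this data, $longest is the run of 4 zeroes, and $perValue holds one object per value: 4 for 0 and 3 for 1.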