2

I am trying to parse a file and convert a string date to a object datetime in en-US culture, but I get the error ArgumentOutOfRangeException.

I am stuck on two errors:

  • a) Need the month but looks like I can't use 'a' UFormat. (I can discard the day on string format 'Wed','Sun','Sat' to make the script easier)
  • b) Need to replace "now" to " "
BARCODE     LOCATION    LIBRARY            STORAGEPOLICY                     RETAIN UNTILL DATE       
-------     --------    -------            -------------                     ------------------       
L40065L8    IEPort1     DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_WEEK    Wed Mar 31 10:13:07 2021 
L40063L8    slot 1      DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_MONTH   Sun Mar  6 22:34:39 2022 
L40072L8    slot 5      DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_ANNUAL  now                      
L40071L8    slot 6      DRP_TAPE_DRPLTO                                      now                      
L40070L8    slot 7      DRP_TAPE_DRPLTO                                      now                      
L40064L8    slot 8      DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_MONTH   Sat Mar 19 11:10:37 2022

$lines = [System.IO.File]::ReadAllLines("c:\temp\qmedia.txt")

$lines = $lines | Select-Object -Skip 2
$objects = $lines | % {
    return [PSCustomObject]@{
        BARCODE  = $_.Substring(0,8).Trim()
        LOCATION = $_.Substring(12,8).Trim()
        LIBRARY = $_.Substring(24,15).Trim()
        STORAGEPOLICY = $_.Substring(44,33).Trim()
        RETAINUNTIL = [datetime]::ParseExact($_.Substring(78,25).Trim()), "a dd hh:mm:ss yyyy", [Globalization.CultureInfo]::CreateSpecificCulture('en-US'))
    }
}

$objects

Can anyone help me?

3
  • Is your source file literally how you have it displayed here? Looks like you are parsing a fixed width file which gets dangerous unless you can guarantee uniformity. That looks like it could have easily been generated by powershell as well. Do you have control over that source file? I know its not what you are asking but it would help make the problem easier to deal with. Commented Mar 24, 2021 at 11:09
  • 1
    @Matt - case in point: "L40065L8 IEPort1 DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_WEEK Wed Mar 31 10:13:07 2021 ".Substring(78) => ed Mar 31 10:13:07 2021, not Wed Mar 31 10:13:07 2021 :-) Commented Mar 24, 2021 at 11:09
  • @Matt Its a command output to a text file, I can 100% guarantee uniformity but almost. Probably its better to split using rn o similar, since I have little knowledge of powershell and code very little, I have opted for substring option. :( Commented Mar 24, 2021 at 11:26

2 Answers 2

3

As @Matt mentioned in the comments, the first part of your problem is the data format - you're relying on the exact column widths being correct when you're using Substring(78, 25), which in the case of your data looks to be incorrect...

PS> $line = "L40065L8    IEPort1     DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_WEEK    Wed Mar 31 10:13:07 2021 "
PS> $line.Substring(78)
ed Mar 31 10:13:07 2021

gives ed Mar 31 10:13:07 2021 instead of what you're probably expecting which is Wed Mar 31 10:13:07 2021 .

If you can, it would be better to change your data format to e.g. csv or json so you can extract the fields more easily, but if you can't do that you could try to dynamically calculate the column widths - e.g.:

$columns = [regex]::Matches($lines[1], "-+").Index;
# 0
# 12
# 24
# 43
# 77

This basically finds the start position of each of the "------" heading underlines, and then you can do something like:

$objects = $lines | % {
    return [PSCustomObject] @{
        BARCODE  = $_.Substring($columns[0], $columns[1] - $columns[0]).Trim()
        LOCATION = $_.Substring($columns[1], $columns[2] - $columns[1]).Trim()
        LIBRARY = $_.Substring($columns[2], $columns[3] - $columns[2]).Trim()
        STORAGEPOLICY = $_.Substring($columns[3], $columns[4] - $columns[3]).Trim()
        RETAINUNTIL = [datetime]::ParseExact(
            $_.Substring($columns[4]).Trim(),
            "a dd hh:mm:ss yyyy",
            [Globalization.CultureInfo]::CreateSpecificCulture("en-US")
        )
    }
}

Except now, we're getting this error:

Exception calling "ParseExact" with "3" argument(s): "String 'Wed Mar 31 10:13:07 2021' was not recognized as a valid DateTime."

which we can fix with:

[datetime]::ParseExact(
   "Wed Mar 31 10:13:07 2021",
   "ddd MMM dd HH:mm:ss yyyy",
   [Globalization.CultureInfo]::CreateSpecificCulture("en-US")
)
# 31 March 2021 10:13:07

but you've also got this date format:

Sun Mar 6 22:34:39 2022

(two spaces when the day part is a single digit)

so we need to use this overload of ParseExact instead to allow both formats:

[datetime]::ParseExact(
   "Sun Mar  6 22:34:39 2022",
   [string[]] @( "ddd MMM dd HH:mm:ss yyyy", "ddd MMM  d HH:mm:ss yyyy"),
   [Globalization.CultureInfo]::CreateSpecificCulture("en-US"),
   "None"  
)

and then we need to allow for the literal string now, so your final code becomes:

$lines = [System.IO.File]::ReadAllLines("c:\temp\qmedia.txt")

$columns = [regex]::Matches($lines[1], "-+").Index;

$lines = $lines | Select-Object -Skip 2
$objects = $lines | % {
    return [PSCustomObject] @{
        BARCODE  = $_.Substring($columns[0], $columns[1] - $columns[0]).Trim()
        LOCATION = $_.Substring($columns[1], $columns[2] - $columns[1]).Trim()
        LIBRARY = $_.Substring($columns[2], $columns[3] - $columns[2]).Trim()
        STORAGEPOLICY = $_.Substring($columns[3], $columns[4] - $columns[3]).Trim()
        RETAINUNTIL = if( $_.Substring($columns[4]).Trim() -eq "now" ) {
            " " } else {
            [datetime]::ParseExact(
                $_.Substring($columns[4]).Trim(),
                [string[]] @( "ddd MMM dd HH:mm:ss yyyy", "ddd MMM  d HH:mm:ss yyyy"),
                [Globalization.CultureInfo]::CreateSpecificCulture("en-US"),
                "None"
            )
        }
    }
}

$objects | ft

#BARCODE  LOCATION LIBRARY         STORAGEPOLICY                    RETAINUNTIL
#-------  -------- -------         -------------                    -----------
#L40065L8 IEPort1  DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_WEEK   31/03/2021 10:13:07
#L40063L8 slot 1   DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_MONTH  06/03/2022 22:34:39
#L40072L8 slot 5   DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_ANNUAL
#L40071L8 slot 6   DRP_TAPE_DRPLTO
#L40070L8 slot 7   DRP_TAPE_DRPLTO
#L40064L8 slot 8   DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_MONTH  19/03/2022 11:10:37

Update

Inspired by mklement0's answer, it might be useful to have a generalised parser for your file format - this returns a set of pscustomobjects with properties that match the file headers:

function ConvertFrom-MyFormat
{

    param
    (
        [Parameter(Mandatory=$true)]
        [string[]] $Lines
    )

    # find the positions of the underscores so we can access each one's index and length
    $matches = [regex]::Matches($Lines[1], "-+");

    # extract the header names from the first line using the 
    # positions of the underscores in the second line as a cutting guide
    $headers = $matches | foreach-object {
        $Lines[0].Substring($_.Index, $_.Length);
    }

    # process the data lines and return a custom objects for each one.
    # (the property names will match the headers)
    $Lines | select-object -Skip 2 | foreach-object {
        $line = $_;
        $values = [ordered] @{};
        0..($matches.Count-2) | foreach-object {
            $values.Add($headers[$_], $line.Substring($matches[$_].Index, $matches[$_+1].Index - $matches[$_].Index));
        }
        $values.Add($headers[-1], $line.Substring($matches[-1].Index));
        new-object PSCustomObject -Property $values;
    }

}

and your main code then just becomes a case of cleaning up and restructuring the result of this function:

$lines = [System.IO.File]::ReadAllLines("c:\temp\qmedia.txt")

$objects = ConvertFrom-MyFormat -Lines $lines | foreach-object {
    return new-object PSCustomObject -Property ([ordered] @{
        BARCODE = $_.BARCODE.Trim()
        LOCATION = $_.LOCATION.Trim()
        LIBRARY = $_.LIBRARY.Trim()
        STORAGEPOLICY = $_.STORAGEPOLICY.Trim()
        RETAINUNTIL = if( $_."RETAIN UNTILL DATE".Trim() -eq "now" ) {
            " " } else {
            [datetime]::ParseExact(
                $_."RETAIN UNTILL DATE".Trim(),
                [string[]] @( "ddd MMM dd HH:mm:ss yyyy", "ddd MMM  d HH:mm:ss yyyy"),
                [Globalization.CultureInfo]::CreateSpecificCulture("en-US"),
                "None"
            )
        }
    })
}

$objects | ft;
Sign up to request clarification or add additional context in comments.

5 Comments

Thanks a lot @mclayton! It's better? RETAILUNTIL = if( $_.Substring($columns[4], $columns[4] - $columns[4]).Trim() -eq "now" ) { " " } else {[datetime]::ParseExact( $_.Substring($columns[4], $columns[5] - $columns[4]).Trim(), [string[]] @( "ddd MMM dd HH:mm:ss yyyy", "ddd MMM d HH:mm:ss yyyy"), [Globalization.CultureInfo]::CreateSpecificCulture("en-US"), "None" ) }
Ok, just $columns[4], because $[columns[5]) dosent's exists
Yeah exactly - $_.Substring( $x , $y ) means "take $y characters, starting at $x", whereas $_.Substring( $x ) means "take from character $x to the end of the string"
Nicely done. Some ideas for simplification: [Globalization.CultureInfo]::CreateSpecificCulture("en-US") -> [cultureinfo] 'en-US', but you can also pass [cultureinfo]::InvariantCulture, as it is based on English day and month names too. Instead of constructing two format strings, you can preprocess the input string by simply replacing runs of multiple spaces with one space each: [datetime]::ParseExact(($_.Substring($columns[4]).Trim() -replace ' +', ' '), 'ddd MMM d HH:mm:ss yyyy', [cultureinfo]::InvariantCulture)
Thanks again, love this part "$columns = [regex]::Matches($lines[1], "-+").Index;" to match columns
3

mclayton's helpful answer provides good explanations and an effective solution.

Let me complement it with an approach that:

  • generically parses fixed-column-width input files
  • assuming that the column widths can reliably be inferred from the separator line (the 2nd line), such that each substring between adjacent column separators such as ------- indicates a column.

Note: The code requires PowerShell (Core) 7, but could be adapted to work in Windows PowerShell too.

$sepChar = '-' # The char. used on the separator line to indicate column spans.
Get-Content c:\temp\qmedia.txt | ForEach-Object {
  $line = $_
  switch ($_.ReadCount) {
    1 { 
      # Header line: save for later analysis
      $headerLine = $line
      break
    } 
    2 { 
      # Separator line: it is the only reliable indicator of column width.
      # Construct a regex that captures the column values.
      # With the sample input's separator line, the resulting regex is:
      #     (.{12})(.{12})(.{19})(.{34})(.*)
      # Note: Syntax requires PowerShell 7
      $reCaptureColumns = 
        $line -replace ('{0}+[^{0}]*' -f [regex]::Escape($sepChar)), 
                       { 
                         if ($_.Index + $_.Value.Length -lt $line.Length) { "(.{$($_.Value.Length)})" }
                         else { '(.*)'}
                       }
      # Break the header line into column names.
      if ($headerLine -notmatch $reCaptureColumns) { Throw "Unexpected header line format: $headerLine" }
      # Save the array of column names.
      $columnNames = $Matches[1..($Matches.Count - 1)].Trim()
      break
    }
    default {
      # Data line:
      if ($line -notmatch $reCaptureColumns) { Throw "Unexpected line format: $line" }
      # Construct an ordered hashtable from the column values.
      $oht = [ordered] @{ }
      foreach ($ndx in 1..$columnNames.Count) {
        $oht[$columnNames[$ndx-1]] = $Matches[$ndx].Trim()
      }
      [pscustomobject] $oht # Convert to [pscustomobject] and output.
    }
  }
}

The above outputs a stream of [pscustomobject] instances, which allow for robust, convenient further processing, such as the date parsing you require, as shown in mclayton's answer (of course, you could integrate this processing directly into the code above, but I wanted to show the fixed-width parsing solution alone).

3 Comments

Inspired by this, I've shamelessly added a generalised ConvertFrom-MyFormat function at the end of my answer :-)
@mclayton: Nice, though for more versatility I'd make the separator char. configurable, and I wouldn't assume that each column name has the exact same length as its corresponding run of separator characters. Otherwise, it's good that you're not performing regex matching for each line the way my solution does, for improved performance - though for the sake of performance I'd avoid use of the pipeline inside the function ($lines[2..($lines.Count-1)].ForEach({...})). The post-processing code to handle date parsing doesn't need to recreate the objects, it could just update the property in place
@mklement0 - I was just aiming a bit lower than you for a generalised ConvertFrom-QMediaFormat (based on OP's filename) where yours looks like a more flexible ConvertFrom-FixedWidth :-), but I'll have a play with your pipeline suggestion. Re creating new objects, the OP's date property has a different name to the one in file header, so I built new objects to match their original output, but point taken about updating objects being more performant. M.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.