I have an input file that contains some start dates. If a date is before a specific minimum date, 1995-01-01 (YYYY-MM-DD format), I want to replace it with that minimum value, e.g.

<StartDate>1970-12-23</StartDate> 

would be changed to

<StartDate>1995-01-01</StartDate>

<StartDate>1996-05-12</StartDate> is ok and would remain unchanged.

I was hoping to use a regex replace, but my check for the date range isn't working as expected. I tried something like this for the range check:

\b(?:1900-01-(?:3[01]|2[1-31])|1995/01/01)\b

2 Answers

You can use a simple regex like '<StartDate>(\d{4}-\d{2}-\d{2})</StartDate>' to match <StartDate>, 4 digits, -, 2 digits, -, 2 digits, and </StartDate>. Then use a callback to parse the date captured in group 1 and compare it with the minimum date, as in Martin's code. If the captured date is before the minimum, the callback returns the minimum date; otherwise it returns the captured date unchanged.

$callback = {
  param($match)
  $current = [DateTime]$match.Groups[1].Value
  $minimum = [DateTime]'1995-01-01'

  if ($minimum -gt $current)
  {
    '<StartDate>1995-01-01</StartDate>'
  }
  else {
    '<StartDate>' + $match.Groups[1].Value + '</StartDate>'
  }
}

$text = '<StartDate>1970-12-23</StartDate>'
$rex = [regex]'<StartDate>(\d{4}-\d{2}-\d{2})</StartDate>'
$rex.Replace($text, $callback)

To use it with Get-Content and ForEach-Object, define $callback as above and use:

$rex = [regex]'<StartDate>(\d{4}-\d{2}-\d{2})</StartDate>'
(Get-Content $path\$xml_in) | ForEach-Object {$rex.Replace($_, $callback)} | Set-Content $path\$outfile

4 Comments

You still need to use the Regex object as -replace does not support a callback as the replacement. Try (Get-Content $infile) | ForEach-Object {$rex.Replace($_, $callback)} | Set-Content $outfile
I'm pulling this in from a file with multiple lines, so can I do it using Get-Content and ForEach-Object (assuming that $_ is the variable for the current line)? (Get-Content $path\$xml_out) | Foreach-Object {$rex = [regex]'<StartDate>(\d{4}-\d{2}-\d{\}<\StartDate>', $rex.Replace($_, $callback)} | set-content $path\$xml_out
You should not assign $rex inside ForEach-Object. Assign $rex before it, only once, and use it as I showed above (I tested it in PowerShell on Windows 7, and it worked).
Thanks to all that helped, especially Wiktor.

$xml_in="sample2.xml"
$outfile="output.xml"

$callback = {
  param($match)
  $current = [DateTime]$match.Groups[1].Value
  $minimum = [DateTime]'1995-01-01'

  if ($minimum -gt $current)
  {
    '<StartDate>1995-01-01</StartDate>'
  }
  else {
    '<StartDate>' + $match.Groups[1].Value + '</StartDate>'
  }
}

$rex = [regex]'<StartDate>(\d{4}-\d{2}-\d{2})</StartDate>'
#$rex = [regex]'<StartDate>(\d{4}-\d{2}-\d{\})'
(Get-Content $path\$xml_in) | ForEach-Object {$rex.Replace($_, $callback)} | Set-Content $path\$outfile
You don't have to use regex here. Just cast the dates to DateTime and compare them:

$currentDate = [DateTime]'1970-12-23'
$minDate = [DateTime]'1995-01-01'

if ($minDate -gt $currentDate)
{
    $currentDate = $minDate
}
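
The same comparison can be wrapped in a small helper that takes the date as text and returns text, which may be easier to reuse. This is only a sketch: the function name Get-ClampedDate and its parameters are hypothetical, and it assumes every input is strictly in yyyy-MM-dd form.

```powershell
# Hypothetical helper (not from the original answer): returns the later of
# the given date and the minimum date, formatted as yyyy-MM-dd.
function Get-ClampedDate {
    param(
        [string]$Date,
        [DateTime]$Minimum = [DateTime]'1995-01-01'
    )
    $invariant = [Globalization.CultureInfo]::InvariantCulture

    # ParseExact throws if the text is not exactly yyyy-MM-dd.
    $parsed = [DateTime]::ParseExact($Date, 'yyyy-MM-dd', $invariant)

    if ($parsed -lt $Minimum) {
        $parsed = $Minimum
    }
    $parsed.ToString('yyyy-MM-dd', $invariant)
}

Get-ClampedDate '1970-12-23'   # 1995-01-01
Get-ClampedDate '1996-05-12'   # 1996-05-12
```

Parsing and formatting with the invariant culture keeps the result independent of the machine's regional settings.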

1 Comment

It's not only about the date. The OP should not use regex for the XML parsing, either. PowerShell has built-in XML capabilities, and it isn't especially hard to do this properly.
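
A minimal sketch of what that XML-based approach might look like, using the [xml] type accelerator and an XPath query. The sample document and its Items/Item element names are assumptions made for illustration; only StartDate comes from the question.

```powershell
# Hypothetical sample input; only the StartDate element is from the question.
[xml]$doc = @'
<Items>
  <Item><StartDate>1970-12-23</StartDate></Item>
  <Item><StartDate>1996-05-12</StartDate></Item>
</Items>
'@

$minimum   = [DateTime]'1995-01-01'
$invariant = [Globalization.CultureInfo]::InvariantCulture

# Visit every StartDate element, wherever it sits in the document.
foreach ($node in $doc.SelectNodes('//StartDate')) {
    if ([DateTime]$node.InnerText -lt $minimum) {
        # Overwrite dates earlier than the minimum with the minimum itself.
        $node.InnerText = $minimum.ToString('yyyy-MM-dd', $invariant)
    }
}

# Print the resulting values.
$doc.SelectNodes('//StartDate') | ForEach-Object { $_.InnerText }
# 1995-01-01
# 1996-05-12
```

For a real file, the document could be loaded with [xml](Get-Content $infile) and written back with $doc.Save() given a full output path. If the file declares a default XML namespace, the XPath query would also need a namespace manager.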
