0

I have some CSV files where I need to delete all lines that contain a date that is greater than a specified date. How would I do this in PowerShell?

BTW: here is the date format: 09/29/2011

Example: I would want to delete all lines that contain the date greater than 09/29/2011.

5
  • Do you have an example file or excerpt? Is the date always in the same column for all files or can it be anywhere in the line? Commented Mar 6, 2012 at 6:59
  • The date appears to always be in the second column. Here is an excerpt of one of the lines: 000329|09/30/2011|BLNDCOM|Items||||||||||||||||||||||1||1||||||||3|1||2|||||||||||1||||1||2|1||2||1|1|2|3|1|1|1|4|1|1|1||1|3|||||2|||1||||||||2||||||||||| Commented Mar 6, 2012 at 7:28
  • Actually, upon closer examination of the CSV file, it looks like the entire pipe-delimited line is shoved into the first column. Commented Mar 6, 2012 at 7:31
  • Edit your question please. Comments don't support multiple lines and the info is important to answer correctly anyway. Commented Mar 6, 2012 at 7:31
  • Joey, can you please elaborate here? Edit my question how? I'm still an SO noob so I'm learning as I go... ;-) Thx! Commented Mar 6, 2012 at 17:00

5 Answers

2
 foreach ($file in gci *.csv) {
   (gc $file) |
     ? { [datetime]$_.split('|')[1] -lt '09/29/2011' } |
     set-content $file
 }

Assuming that's a pipe-delimited file.


5 Comments

Thanks, mjolinor! I like your nice-n-tidy clean code. Less is better. ;-) BTW: It worked perfectly!
+1 for using Set-Content and for reminding me that it replaces the content of a file!
@mjolinor - how would your code be modified to also split the CSV file up based on dates greater than or equal to the 'specified date' and into 1-week increments? Your code helps me create a CSV file with lines containing ALL dates prior to 09/29/2011, but now I need to create additional CSV files containing lines with dates that are >= specified date, and each CSV file needs to contain dates with a 1-week range. Thanks in advance!
How would you want those .csv files named?
Something like this: originalfilename_startdate_enddate.csv
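
The weekly split asked about in the comments above could be sketched roughly as follows. This is only an illustration, not code from the answer: the temp folder, the `orders.csv` file name, the sample lines, and the `MMddyyyy` date format in the output names are all assumptions (slashes cannot appear in file names, so the dates are written without separators). Lines dated on or after the cutoff are grouped into 7-day buckets counted from the cutoff date.

```powershell
# Hypothetical cutoff and sample input; names and paths are illustrative only.
$cutoff = [datetime]'09/29/2011'
$dir    = Join-Path ([System.IO.Path]::GetTempPath()) 'weekly_split_demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
$source = Join-Path $dir 'orders.csv'
Set-Content -Path $source -Value @(
    '000328|09/28/2011|BLNDCOM|Items',
    '000329|09/30/2011|BLNDCOM|Items',
    '000330|10/07/2011|BLNDCOM|Items'
)

$file = Get-Item $source
Get-Content $file.FullName |
    # keep only lines dated on or after the cutoff
    Where-Object { [datetime]$_.Split('|')[1] -ge $cutoff } |
    # bucket index = whole weeks elapsed since the cutoff
    Group-Object { [math]::Floor(([datetime]$_.Split('|')[1] - $cutoff).Days / 7.0) } |
    ForEach-Object {
        $start = $cutoff.AddDays(7 * [int]$_.Name)
        $end   = $start.AddDays(6)
        # originalfilename_startdate_enddate.csv
        $name  = '{0}_{1:MMddyyyy}_{2:MMddyyyy}.csv' -f $file.BaseName, $start, $end
        $_.Group | Set-Content -Path (Join-Path $file.DirectoryName $name)
    }
```

Weeks that contain no lines produce no file in this sketch, since Group-Object only emits groups that actually occur.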
1

I favored clarity over conciseness:

param (
    [parameter(Mandatory = $true)] [string] $csvFileName,
    [parameter(Mandatory = $true)] [datetime] $date
)

try
{
    $Error.Clear()

    if (!(Test-Path $csvFileName))
        { throw "Could not find file $csvFileName" }

    $newContent = Get-Content $csvFileName |    ?{
        ([regex]::matches($_, "[0-9]{2}/[0-9]{2}/[0-9]{4}") | %{[DateTime] $_.value -lt $date})
    } 

    $newContent | Set-Content $csvFileName
}

catch
{
    Write-Host "$($MyInvocation.InvocationName): $_"
}

1 Comment

Be careful with that axe, Eugene. This will match ALL dates in a line, anywhere in your excel file. The regexp must be tweaked if you just want to filter out one column.
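
One way to tweak the pattern so it only looks at the second column might be to anchor it at the start of the line, as in this minimal sketch. The sample lines and the `$cutoff` variable are assumptions for illustration, and lines without a date in the second field are kept here (for example a header row), which may or may not be what you want.

```powershell
# Hypothetical cutoff; lines dated before it are kept.
$cutoff = [datetime]'09/29/2011'

# Sample lines in the format quoted in the question (second field is the date).
$lines = @(
    '000329|09/28/2011|BLNDCOM|Items',
    '000330|09/30/2011|BLNDCOM|Items'
)

$kept = $lines | Where-Object {
    # Skip the first field, then capture the date in the second field only.
    if ($_ -match '^[^|]*\|(\d{2}/\d{2}/\d{4})\|') {
        $d = [datetime]::ParseExact($Matches[1], 'MM/dd/yyyy',
                                    [cultureinfo]::InvariantCulture)
        $d -lt $cutoff
    } else {
        $true   # assumption: lines without a date in column two are kept
    }
}
```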
1

Ok, it seems like there is only one thing that looks like a date in that line, anyway, so we can just filter for that:

Get-ChildItem *.csv | # adapt if necessary
  ForEach-Object {
    (Get-Content $_) | # the parentheses are important so the entire file is read at once
      Where-Object { # now we process the file line by line
        # find the date                       ↓ suppress the boolean output
        $_ -match '\|(\d{2}/\d{2}/\d{4})\|' | Out-Null

        # this only works if every line contains a date. Hopefully it does.
        $date = [DateTime]($Matches[1])

        # Finally the comparison we wanted in the first place
        # This is the condition for all lines that are *retained* (hence less than)
        $date -lt '09/29/2011'
      } | Out-File $_ # use -Encoding ASCII/UTF8/Unicode depending on your needs.
                      # Maybe ASCII is enough
  }

or shorter:

gci *.csv | % {
  (gc $_) |
    ? {
      $null = $_ -match '\|(\d{2}/\d{2}/\d{4})\|'
      [DateTime]$Matches[1] -lt '09/29/2011'
    } |
    Out-File $_
}

6 Comments

Thanks, Joey. I tried using the code above but it gives me these errors: Bad argument to operator '-match': parsing "\|(\d{2}/\d{2}/\d{4}\)|" - Not enough )'s.. + $null = $_ -match <<<< '\|(\d{2}/\d{2}/\d{4}\)|' + CategoryInfo : InvalidOperation: (:) [], RuntimeException + FullyQualifiedErrorId : BadOperatorArgument Cannot convert null to type "System.DateTime". + [DateTime]$Matches[ <<<< 1] -lt '09/29/2011' + CategoryInfo : NotSpecified: (:) [], RuntimeException + FullyQualifiedErrorId : RuntimeException
Thanks Joey! Your modified code seems to have worked. When I ran it on my 2 MB CSV file it stripped out the 2749 lines as expected. However, what's weird is the CSV file is now larger (3.17MB). Not sure why that is. NOTE: I also ran mjolinor's code and got the same results (2749 lines were stripped out), but the CSV file is 1.85 MB, which is what I expect (smaller than before). I compared the CSV output from your code & mjolinor's code and the files are identical (except for size). Strange. :-)
UTF-16 takes two bytes per character. That's why I included the comment about the encoding. Perhaps Set-Content will preserve the file's encoding.
Set-content and add-content don't do any formatting. They don't have an -Encoding parameter, and can't change the original encoding.
mjolinor: The -Encoding parameter is dynamic and definitely present for the FileSystem provider. It even works as it should. But whereas Out-File defaults to UTF-16, Set-Content will in fact preserve the original file's encoding (which isn't documented, though).
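
To avoid depending on any cmdlet's default encoding, the encoding can be stated explicitly when writing the file back. The following is a small sketch that assumes ASCII is sufficient for the data; the temp file name and the sample lines are made up for the demonstration.

```powershell
# Hypothetical demo file in the temp folder.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'encoding_demo.csv'
Set-Content -Path $path -Encoding ASCII -Value @(
    '000329|09/28/2011|BLNDCOM|Items',
    '000330|09/30/2011|BLNDCOM|Items'
)

$cutoff = [datetime]'09/29/2011'

# Read the whole file first (parentheses), filter, then write back
# with an explicit encoding instead of relying on the default.
(Get-Content $path) |
    Where-Object { [datetime]$_.Split('|')[1] -lt $cutoff } |
    Set-Content -Path $path -Encoding ASCII
```

The same `-Encoding` argument can be passed to Out-File if that cmdlet is preferred.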
0

You need to create a new, cleaned CSV file.

Supposing this is your CSV:

col1|date|col3
aaaaa|05/05/2010|rwer
bdfdfg|06/29/2011|reewr
dsfsdf|08/05/2012|dsfsd

do this:

Import-Csv .\myoriginal.csv -Delimiter '|' | ? { [datetime]$_.date -le [datetime]"09/29/2011" } | Export-Csv -NoTypeInformation -Path .\mycleaned.csv -Delimiter '|'

Then you can delete the original CSV with

remove-item .\myoriginal.csv

5 Comments

This will delete all kinds of lines but likely not only the ones they want. Keep in mind that if the CSV has no type information all you get are strings. And you're comparing strings there. If they used a sane date format that'd be no problem but this way ... just stick a line with 10/02/2009 in there and watch ;-)
Hmm..running the above script removes everything from my CSV file, or at least the Export-Csv file contains no data. Here is an example string (line) in the CSV file: 000329|10/01/2011|BLNDCOM|Items
@Joey. Thanks for this. I fixed it by casting both the piped value and the date to [datetime].
Thanks for the updated code, but I'm getting the following error: Import-Csv : Cannot process argument because the value of argument "name" is invalid. Change the value of the "name" argument and run the operation again.
It suffices to cast the left value, due to PowerShell's type coercion rules, though ;)
-1

I wrote a script for you this morning that deletes every line matching the pattern you specify. Run the script like this:

myscript.sh YOURDATE YOURCSVFILE

myscript.sh:

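#!/bin/bash
    # Usage: myscript.sh PATTERN CSVFILE
    # Deletes every line of CSVFILE that matches PATTERN.
    # Collect the matching line numbers in descending order, so that
    # deleting one line does not shift the numbers of the lines
    # that are still waiting to be deleted.
    num=$(grep -En -- "$1" "$2" | cut -d: -f1 | sort -rn)
    for i in $num; do
        sed -i "${i}d" "$2"
    done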
