
Hello, I'm looking for an effective way to delete the second and last rows from multiple CSV files. I have around 5000 files in a directory. The code below deletes the first line. If I use the parameter -Skip 2, it skips the first two rows, but I need to keep the first row and delete only the second row and the last row. I'm also not sure whether Get-Content / Set-Content is the right approach for such a large number of files.

foreach ($file in gci *.csv ){
(gc $file) | Select-Object -Skip 1 | set-content $file
 }

3 Answers


Just a word on performance. I used @TheMadTechnician's method of getting the content and compared three different methods of writing the output, using 100 1 MB input files for each test. Below are the results:

Using Out-File to overwrite the contents took 1 minute 32 seconds.

dir *.txt | %{
    $content = gc $_.FullName
    $content | select -First 1 | Out-File $_.FullName -Force
    $content[2..($content.count -2)]|Out-File $_.FullName -Append
}

Using Set-Content to overwrite the contents took 37 seconds.

dir *.txt | %{
    $content = gc $_.FullName
    $output = @($content | select -First 1 )
    $output += $content[2..($content.count -2)]
    $output | Set-Content $_.FullName -Force
}

Using a StreamWriter to overwrite the contents took 31 seconds.

dir *.txt | %{     
    $content = gc $_.FullName
    $output = @($content | select -First 1 )
    $output += $content[2..($content.count -2)]
    $sw = New-Object System.IO.StreamWriter($_.FullName,$false)
    $output | %{$sw.WriteLine($_)}
    $sw.close()
}

You might want to test these different approaches against your particular situation, but I have always found that Out-File is far slower than Set-Content or a StreamWriter.
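If you want to reproduce the comparison on your own data, a minimal sketch using Measure-Command (the file pattern and the printed label are just examples, not part of the original tests):

```powershell
# Time the Set-Content approach; swap the script block body to time the others.
$elapsed = Measure-Command {
    dir *.txt | %{
        $content = gc $_.FullName
        $output = @($content | select -First 1)          # keep the header row
        $output += $content[2..($content.Count - 2)]     # skip row 2 and the last row
        $output | Set-Content $_.FullName -Force
    }
}
"Set-Content: $($elapsed.TotalSeconds) seconds"
```

Run it once per approach on identical copies of the input files so the comparison is fair.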


4 Comments

Thank you for your help, Stephen. The last code is really, really fast. Looks like you are really good at PS. Could you please help with one more problem? Can similar code be used to change the date format (stored in the first column of those files) from m/d/yyyy to mm/dd/yyyy, or do I need to post another question?
I would post another question, as it's a different issue. Try looking at Get-Date, Import-Csv, Export-Csv, and the custom datetime format providers and see if you can figure it out first.
Good to know about Out-File being so slow. I had no idea, but I will be sure to take this into consideration in the future! Excellent answer.
Hello Stephen. I just posted the question I mentioned above. It would be great if you could review my code and suggest how to improve it. Thank you. John stackoverflow.com/questions/28399452/…

You were close. I'd just pipe through a Where clause in your code to skip item 1 and item -1 in each file, like this:

(gci *.csv )|ForEach{
    $file = $_
    $contents = gc $file
    $contents | Where{$_ -ne $contents[1] -and $_ -ne $contents[-1]} | out-file $file.fullname -force
}

I did the ForEach inline to make sure that the gci finishes and isn't holding anything open when you try to do the Out-File.

Edit: I just realized that my code has a potential flaw: if any other line is identical to line 2 or to the last line, those lines would be eliminated as well. I wrote this assuming you had something like the following that you wanted to clean up:

Col1,Col2,Col3,Col4
---- ---- ---- ----
Data,data,data,data
data,data,data,data
Log Created: 02/04/2015

Where you wanted to remove the ---- line and the "Log Created" note at the end.

Edit2: A better solution would probably be to get the content of the file, output the first line, and then output lines 3 through the second-to-last, appending them to the same file. Something like:

(gci *.csv )|ForEach{
    $Path = $_.FullName
    $content = gc $Path
    $content|select -first 1|Out-File $Path -force
    $content[2..($content.count-2)]|Out-File $Path -Append
}

3 Comments

A similar error occurs here: "Cannot bind argument to parameter 'Path' because it is null", on the line $contents = gc $file.
Sorry, I forgot to re-assign $file after I changed the ForEach method. I've updated it now, and it should work.
Thank you, it works. The only problem is that it is quite slow. For 5000 files it took more than 7 minutes to run. On Google I've found something about IO.StreamReader for reading a large number of files, but I'm not sure if it can be used in my case.
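A minimal sketch of the streaming idea mentioned in this comment, assuming each file still fits in memory (it reads the lines in one .NET call and writes through a StreamWriter, which was the fastest writer in the benchmarks above):

```powershell
# Sketch only: use .NET file I/O instead of Get-Content/Out-File.
dir *.csv | %{
    $path = $_.FullName
    $lines = [System.IO.File]::ReadAllLines($path)
    $sw = New-Object System.IO.StreamWriter($path, $false)
    $sw.WriteLine($lines[0])                        # keep the header row
    for ($i = 2; $i -lt $lines.Count - 1; $i++) {   # skip line 2 and the last line
        $sw.WriteLine($lines[$i])
    }
    $sw.Close()
}
```

A true StreamReader loop would avoid loading the whole file, but since the last line must be dropped you'd need to buffer one line ahead, which complicates the code for files this small.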

Maybe not the best approach, but you could use -Index and calculate the rows you want.

foreach ($file in gci *.csv ){
    $data = gc $file
    $data | Select-Object -Index (,0 + (2..($data.Count - 2))) | set-content $file
}

Array indexing starts at 0, so we take index 0 (the first row) and skip index 1 (the second row). After that we take the rest minus the last. -Index takes an integer array, so we create a single-element array containing 0 and append the indexes from 2 to the end of the file minus the last (which is where the -2 comes from, since .Count starts at 1).
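To make the index arithmetic concrete, here is what that expression produces for a hypothetical 10-line file ($count stands in for $data.Count):

```powershell
# For a 10-line file: keep line 1 (index 0), skip line 2 (index 1),
# and drop the last line (index 9).
$count = 10
$indexes = ,0 + (2..($count - 2))
$indexes -join ','   # 0,2,3,4,5,6,7,8
```

The leading comma is PowerShell's unary array operator; without it, 0 + 2..8 would be treated as arithmetic rather than array concatenation.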

4 Comments

He wants the first line, but not the second line or last line; he wants to eliminate records 1 and -1.
I'm getting an error here: 'Cannot convert value "Select" to type "System.Int32". Input string was not in a correct format.'
Yeah... I made another error on that one by mixing my test code with yours. If you want to have a look again, maybe it performs better.
Thank you very much. Your code is about 45% faster than TheMadTechnician's. That really saves a lot of time when handling a large number of files.
