29

Can I modify a CSV file inline using Python's CSV library, or similar technique?

Current I am processing a file and updating the first column (a name field) to change the formatting. A simplified version of my code looks like this:

with open('tmpEmployeeDatabase-out.csv', 'w') as csvOutput:
    writer = csv.writer(csvOutput, delimiter=',', quotechar='"')

    with open('tmpEmployeeDatabase.csv', 'r') as csvFile:
        reader = csv.reader(csvFile, delimiter=',', quotechar='"')

        for row in reader:
            row[0] = row[0].title()
            writer.writerow(row)

The philosophy works, but I am curious if I can do an inline edit so that I'm not duplicating the file.

I've tried the follow, but this appends the new records to the end of the file instead of replacing them.

with open('tmpEmployeeDatabase.csv', 'r+') as csvFile:
    reader = csv.reader(csvFile, delimiter=',', quotechar='"')
    writer = csv.writer(csvFile, delimiter=',', quotechar='"')

    for row in reader:
        row[1] = row[1].title()
        writer.writerow(row)
1
  • In general, no, you can't do that. You could read all of the data from the 'r' file and wrap it in a StringIO object. Then you can pass that to the csv reader, closing the file and re-opening for writing... Commented Apr 15, 2013 at 17:10

3 Answers 3

72

No, you should not attempt to write to the file you are currently reading from. You can do it if you keep seeking back after reading a row but it is not advisable, especially if you are writing back more data than you read.

The canonical method is to write to a new, temporary file and move that into place over the old file you read from.

from tempfile import NamedTemporaryFile
import shutil
import csv

filename = 'tmpEmployeeDatabase.csv'
tempfile = NamedTemporaryFile('w+t', newline='', delete=False)

with open(filename, 'r', newline='') as csvFile, tempfile:
    reader = csv.reader(csvFile, delimiter=',', quotechar='"')
    writer = csv.writer(tempfile, delimiter=',', quotechar='"')

    for row in reader:
        row[1] = row[1].title()
        writer.writerow(row)

shutil.move(tempfile.name, filename)

I've made use of the tempfile and shutil libraries here to make the task easier.

Sign up to request clarification or add additional context in comments.

4 Comments

just curious, isnt shutil a lib to operate at a high level and move seems to be a recursive function, wouldnt that be an overkill for a simple file replace
@PirateApp no, this isn’t overkill. We are moving a file from the temp directory to replace the original. The temp directory could be on a separate file system so a simple rename could fail; shutil.move() will in that case fall back to a copy-and-delete operation.
rb gives me _csv.Error: iterator should return strings, not bytes...
@AlexFriedmann: this answer was written in a time when Python 2 was the major version. In Python 3, use open(filename, 'r', newline=''), and you need to adjust the arguments too, to NamedTemporaryFile('w+t', newline='', delete=False). I've updated the answer.
8

There is no underlying system call for inserting data into a file. You can overwrite, you can append, and you can replace. But inserting data into the middle means reading and rewriting the entire file from the point you made your edit down to the end.

As such, the two ways to do this are either (a) slurp the entire file into memory, make your edits there, and then dump the result back to disk, or (b) open up a temporary output file where you write your results while you read the input file, and then replace the old file with the new one once you get to the end. One method uses more ram, the other uses more disk space.

1 Comment

Is it possible to somehow stream read and selective write since the files are large? Also out of plain curiosity, how large a file is too large to read fully in memory? 10mb 100mb? if you take say 4GB RAM into account on a droplet from say, Digital ocean
2

If you just want to modify a csv file inline by using Python, you may just employ pandas:

import pandas as pd
df = pd.read_csv('yourfilename.csv')

# modify the "name" in row 1 as "Lebron James"   
df.loc[1, 'name'] = "Lebron James"

# save the file using the same name
df.to_csv("yourfilename.csv")  

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.