Inline CSV File Editing with Python

Question

Can I modify a CSV file inline using Python's CSV library, or similar technique?

Current I am processing a file and updating the first column (a name field) to change the formatting. A simplified version of my code looks like this:

with open('tmpEmployeeDatabase-out.csv', 'w') as csvOutput:
    writer = csv.writer(csvOutput, delimiter=',', quotechar='"')

    with open('tmpEmployeeDatabase.csv', 'r') as csvFile:
        reader = csv.reader(csvFile, delimiter=',', quotechar='"')

        for row in reader:
            row[0] = row[0].title()
            writer.writerow(row)

The philosophy works, but I am curious if I can do an inline edit so that I'm not duplicating the file.

I've tried the follow, but this appends the new records to the end of the file instead of replacing them.

with open('tmpEmployeeDatabase.csv', 'r+') as csvFile:
    reader = csv.reader(csvFile, delimiter=',', quotechar='"')
    writer = csv.writer(csvFile, delimiter=',', quotechar='"')

    for row in reader:
        row[1] = row[1].title()
        writer.writerow(row)

In general, no, you can't do that. You could read all of the data from the 'r' file and wrap it in a StringIO object. Then you can pass that to the csv reader, closing the file and re-opening for writing... — mgilson
– mgilson, Commented Apr 15, 2013 at 17:10

Martijn Pieters · Accepted Answer · 2020-06-23 12:28:27Z

72

No, you should not attempt to write to the file you are currently reading from. You can do it if you keep seeking back after reading a row but it is not advisable, especially if you are writing back more data than you read.

The canonical method is to write to a new, temporary file and move that into place over the old file you read from.

from tempfile import NamedTemporaryFile
import shutil
import csv

filename = 'tmpEmployeeDatabase.csv'
tempfile = NamedTemporaryFile('w+t', newline='', delete=False)

with open(filename, 'r', newline='') as csvFile, tempfile:
    reader = csv.reader(csvFile, delimiter=',', quotechar='"')
    writer = csv.writer(tempfile, delimiter=',', quotechar='"')

    for row in reader:
        row[1] = row[1].title()
        writer.writerow(row)

shutil.move(tempfile.name, filename)

I've made use of the tempfile and shutil libraries here to make the task easier.

edited Jun 23, 2020 at 12:28

answered Apr 15, 2013 at 17:12

Martijn Pieters

1.1m326 gold badges4.2k silver badges3.4k bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

PirateApp Over a year ago

just curious, isnt shutil a lib to operate at a high level and move seems to be a recursive function, wouldnt that be an overkill for a simple file replace

Martijn Pieters Over a year ago

@PirateApp no, this isn’t overkill. We are moving a file from the temp directory to replace the original. The temp directory could be on a separate file system so a simple rename could fail; shutil.move() will in that case fall back to a copy-and-delete operation.

Alex Friedmann Over a year ago

rb gives me _csv.Error: iterator should return strings, not bytes...

Martijn Pieters Over a year ago

@AlexFriedmann: this answer was written in a time when Python 2 was the major version. In Python 3, use open(filename, 'r', newline=''), and you need to adjust the arguments too, to NamedTemporaryFile('w+t', newline='', delete=False). I've updated the answer.

tylerl · Accepted Answer · 2013-04-15 17:24:18Z

8

There is no underlying system call for inserting data into a file. You can overwrite, you can append, and you can replace. But inserting data into the middle means reading and rewriting the entire file from the point you made your edit down to the end.

As such, the two ways to do this are either (a) slurp the entire file into memory, make your edits there, and then dump the result back to disk, or (b) open up a temporary output file where you write your results while you read the input file, and then replace the old file with the new one once you get to the end. One method uses more ram, the other uses more disk space.

answered Apr 15, 2013 at 17:24

tylerl

31k14 gold badges85 silver badges115 bronze badges

1 Comment

PirateApp Over a year ago

Is it possible to somehow stream read and selective write since the files are large? Also out of plain curiosity, how large a file is too large to read fully in memory? 10mb 100mb? if you take say 4GB RAM into account on a droplet from say, Digital ocean

RealThingMayNotBeElegant · Accepted Answer · 2022-04-22 20:25:36Z

2

If you just want to modify a csv file inline by using Python, you may just employ pandas:

import pandas as pd
df = pd.read_csv('yourfilename.csv')

# modify the "name" in row 1 as "Lebron James"   
df.loc[1, 'name'] = "Lebron James"

# save the file using the same name
df.to_csv("yourfilename.csv")

answered Apr 22, 2022 at 20:25

RealThingMayNotBeElegant

1013 bronze badges

Collectives™ on Stack Overflow

Inline CSV File Editing with Python

3 Answers 3

4 Comments

1 Comment

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

4 Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related