Replace data in csv file using python

Question

This is the new input file format. I need to automate replacing the content of one column in a .csv file using Python. I can also open the .csv file in Notepad and replace the column's content there, but the file is huge and that takes a long time.

Name                          ID                                                class  Num
"kanika",""University ISD_po.log";" University     /projects/asd/new/high/sde"","MBA","12"
"Ambika",""University ISD_po.log";" University     /projects/asd/new/high/sde"","MS","13"

In the above, I need to replace the content of the ID column. The ID column is very inconsistent: it contains long runs of spaces and symbols such as ; , and /. The new content of the ID column should be "input".

The ID column is enclosed in two double quotes and contains extra spaces, whereas the other columns are enclosed in a single pair of double quotes.

Is there any way to do it in Python?

timc · Accepted Answer · 2012-01-18 04:35:57Z

14

You could use the csv module in Python to achieve this.

csv.reader will return each row as a list of strings. You can then pass each row to csv.writer, replacing the ID column as you go. Note that this creates a new file rather than modifying the original.

So:

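import csv

# Python 3: open in text mode with newline='' (Python 2 used 'rb'/'wb')
with open('file.csv', newline='') as infile, \
     open('outfile.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        # keep columns 0, 2 and 3; overwrite the ID column
        writer.writerow([row[0], "input", row[2], row[3]])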
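
By default csv.writer quotes only the fields that need it, which is why the quotes around each value disappear in the output. A minimal sketch, assuming every output field should be quoted as in the question's sample, passes quoting=csv.QUOTE_ALL to the writer. The helper name replace_id_column and its parameters are illustrative and not part of the csv module; the sample row is the first data row from the question.

```python
import csv
import io

def replace_id_column(src, dst, new_value="input", id_index=1):
    """Copy CSV rows from src to dst, overwriting one column and quoting every field."""
    reader = csv.reader(src)
    # QUOTE_ALL wraps every field in double quotes, matching the input style
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        if len(row) > id_index:  # leave short or blank rows untouched
            row[id_index] = new_value
        writer.writerow(row)

# First data row from the question; the ID field is malformed CSV, but the
# default (non-strict) reader still splits this row into four fields.
sample = '"kanika",""University ISD_po.log";" University     /projects/asd/new/high/sde"","MBA","12"\n'

out = io.StringIO()
replace_id_column(io.StringIO(sample), out)
print(out.getvalue(), end="")
# "kanika","input","MBA","12"
```

For real files, pass file objects opened with newline='' in place of the io.StringIO objects.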

4 Comments

kanika Over a year ago

I just copied your code and changed the input file name. I got syntax error at 'wb'.

kanika Over a year ago

'_csv.reader' object is not subscriptable---this is the error

timc Over a year ago

Make sure your final statement is writer.writerow([row[0], "input", row[2], row[3]]). Note that it indexes 'row', not 'reader'.

kanika Over a year ago

Yes, thanks. It works, but it has removed the commas and "" from all the data. If the file contains empty columns enclosed in "", you will probably lose all of them.

mathematical.coffee · Accepted Answer · 2012-01-18 04:37:50Z

4

Read the .csv line by line, split each line on commas, and replace the second column with "input". Write it out (to a different file) as you go:

# Python 3: text mode with newline='' keeps the line endings intact
# (the Python 2 original used 'rb'/'wb')
with open('mycsv.csv', newline='') as f, open('out.csv', 'w', newline='') as fo:
    # go through each line of the file
    for line in f:
        bits = line.split(',')
        # change second column
        bits[1] = '"input"'
        # join it back together and write it out
        fo.write(','.join(bits))

Then you can rename it to replace the original file if you like.
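
The rename step can be scripted as well. The following is a sketch under stated assumptions rather than a definitive implementation: it writes to a temporary file in the same directory and swaps it in with os.replace only after the rewrite has finished, and it uses the csv module instead of splitting on commas. The function name replace_column_in_place and its parameters are hypothetical.

```python
import csv
import os
import tempfile

def replace_column_in_place(path, new_value="input", column=1):
    """Rewrite the CSV at `path` with one column overwritten, via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as dst, open(path, newline="") as src:
            writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
            for row in csv.reader(src):
                if len(row) > column:  # leave short or blank rows untouched
                    row[column] = new_value
                writer.writerow(row)
        # Both files are closed here; swap the new file in over the original.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        os.remove(tmp_path)
        raise
```

Calling replace_column_in_place('mycsv.csv') would then overwrite the second column of that file; the original is left untouched if an error occurs before the swap.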

5 Comments

John La Rooy Over a year ago

Safer to use the csv module. If there are ever commas between the "" you should not split on them

mathematical.coffee Over a year ago

cheers, I didn't know about the csv module. Learn something new every day!

mathematical.coffee Over a year ago

I'd still recommend @timc's csv answer. It feels safer to use a purpose-built CSV parser than homebrew code, especially when it's a built-in module that doesn't bloat your code at all!

kanika Over a year ago

Very true. But in my case, your code is working fine. The other code changes the format of the file: the output is no longer a comma-separated .csv.

rassa45 Over a year ago

Don't use the wb mode!! It will overwrite every row until the last. Isn't ab better?
