
This is the new input file format. I need to automate replacing the content of one column in a .csv file using Python. I can open the .csv file in Notepad and replace the column's content by hand, but the file is very large and that takes a long time.

Name                          ID                                                class  Num
"kanika",""University ISD_po.log";" University     /projects/asd/new/high/sde"","MBA","12"
"Ambika",""University ISD_po.log";" University     /projects/asd/new/high/sde"","MS","13"

In the data above, I need to replace the content of the ID column. The ID column is very inconsistent: it contains long runs of spaces and symbols such as (; , /). The new content of the ID column should be "input".

The ID column is enclosed in two double quotes on each side and also contains some extra spaces, whereas the other columns are enclosed in a single pair of double quotes.

Is there a way to do this in Python?

2 Answers


You could use the csv module in Python to achieve this.

csv.reader returns each row as a list of strings. You can then modify the ID column and stream each row out with csv.writer; note that this creates a new file rather than editing the original in place.

So:

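import csv

# Python 3: open CSV files in text mode with newline='' instead of 'rb'/'wb'
with open('file.csv', newline='') as fin, open('outfile.csv', 'w', newline='') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for row in reader:
        # keep columns 0, 2 and 3; replace the ID column (index 1) with "input"
        writer.writerow([row[0], "input", row[2], row[3]])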
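A slightly fuller Python 3 sketch, built on two assumptions the question does not confirm: the first line of the file is a header that should be copied unchanged (otherwise its ID cell would also become "input"), and every output field should stay wrapped in double quotes like the input rows. csv.QUOTE_ALL takes care of the quoting. The function name replace_id_column and the comma-separated header in the demo are hypothetical. With Python's default dialect, each sample row is still read as four fields, so index 1 is the ID column.

```python
import csv
import io


def replace_id_column(src, dst, new_value="input", id_index=1):
    """Copy CSV text from src to dst, replacing the ID column in every data row.

    Hypothetical helper; assumes the first line of src is a header line.
    """
    # Copy the header line verbatim so its quoting and spacing are unchanged.
    dst.write(src.readline())
    reader = csv.reader(src)
    # QUOTE_ALL wraps every field in double quotes, like the input rows.
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        if len(row) > id_index:  # leave blank or short rows alone
            row[id_index] = new_value
        writer.writerow(row)


# Usage on real files (file names taken from the answer):
# with open("file.csv", newline="") as fin, open("outfile.csv", "w", newline="") as fout:
#     replace_id_column(fin, fout)

if __name__ == "__main__":
    demo = io.StringIO(
        "Name,ID,class,Num\n"
        '"kanika",""University ISD_po.log";" University     /projects/asd/new/high/sde"","MBA","12"\n'
    )
    out = io.StringIO()
    replace_id_column(demo, out)
    print(out.getvalue(), end="")  # header line, then "kanika","input","MBA","12"
```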

4 Comments

I just copied your code and changed the input file name. I got a syntax error at 'wb'.
This is the error: '_csv.reader' object is not subscriptable
Make sure your final statement is writer.writerow([row[0], "input", row[2], row[3]]), indexing 'row' rather than 'reader'.
Yes, thanks. It works, but it has removed the commas and double quotes ("") from all the data. If the file contains empty columns enclosed in "", that quoting is probably lost as well.

Read the .csv line by line, split each line on ',', and replace the second column with "input". Write each line out (to a different file) as you go:

# Python 3: text mode, so split(',') works on str rather than bytes
with open('mycsv.csv') as f, open('out.csv', 'w') as fo:
    # go through each line of the file
    for line in f:
        bits = line.split(',')
        # change second column
        bits[1] = '"input"'
        # join it back together and write it out (the newline stays on the last field)
        fo.write(','.join(bits))

Then you can rename it to replace the original file if you like.
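One way to do that rename step from Python, sketched here rather than taken from the answer, is to write to a temporary file next to the original and then swap it in with os.replace, which overwrites the destination. The helper name replace_second_column_in_place and the ".tmp" suffix are hypothetical, and the splitting is the same naive line.split(',') as above, so it shares the same limitation with commas inside fields.

```python
import os
import tempfile


def replace_second_column_in_place(path, new_value='"input"'):
    """Set the second comma-separated field of every line in path to new_value.

    Hypothetical helper using the same naive split(',') as the answer, so it
    assumes no field contains a comma.
    """
    tmp_path = path + ".tmp"  # hypothetical temporary name
    # newline="" keeps the original line endings untouched on read and write
    with open(path, newline="") as f, open(tmp_path, "w", newline="") as fo:
        for line in f:
            bits = line.split(",")
            if len(bits) > 1:  # leave lines without a second column alone
                bits[1] = new_value
            fo.write(",".join(bits))
    # os.replace overwrites path with the rewritten file
    os.replace(tmp_path, path)


if __name__ == "__main__":
    # Demo on a throwaway file in a temporary directory.
    with tempfile.TemporaryDirectory() as d:
        demo = os.path.join(d, "mycsv.csv")
        with open(demo, "w", newline="") as fh:
            fh.write('"kanika","old id","MBA","12"\n')
        replace_second_column_in_place(demo)
        with open(demo, newline="") as fh:
            print(fh.read(), end="")  # "kanika","input","MBA","12"
```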

5 Comments

It is safer to use the csv module. If a quoted ("...") field ever contains commas, you should not split on them.
Cheers, I didn't know about the csv module. Learn something new every day!
I'd still recommend @timc's csv answer; it feels safer to use a purpose-built CSV parser than homebrew code, especially when it's a built-in module that doesn't bloat your code at all!
Very true. But in my case your code works fine, whereas that code changes the format of the file: the output is no longer a comma-separated .csv.
Don't use the wb mode!! It will overwrite every row until the last. Isn't ab better?
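The first comment's point can be checked on a small made-up row where a properly quoted field contains a comma: str.split cuts it in two, while csv.reader keeps it whole. The same snippet also shows a caveat for this particular file, based on Python's default (non-strict) csv dialect: because the ID values open with two double quotes, csv.reader treats the rest of that field as unquoted, so a comma inside the ID column would still split it. Both example rows are hypothetical.

```python
import csv

# Hypothetical row whose first field legitimately contains a comma inside quotes.
line = '"Smith, John","MBA","12"'

# Naive split cuts the quoted field in two: ['"Smith', ' John"', '"MBA"', '"12"']
print(line.split(","))

# csv.reader respects the quotes: ['Smith, John', 'MBA', '12']
print(next(csv.reader([line])))

# Caveat: a field that opens with two double quotes, like this question's ID
# column, is read as unquoted after the second quote, so an inner comma still
# splits it: ['kanika', 'Univ', ' ISD""', 'MBA', '12']
odd = '"kanika",""Univ, ISD"","MBA","12"'
print(next(csv.reader([odd])))
```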
