
I have written the following code to take a large CSV file and split it into multiple CSV files based on the value in a particular column. The original CSV file has some fields that are strings, and they have quotes around them.

For example:

Field1,Field2,Field3,Field4
1,2,"red",3
1,4,"red",4
3,4,"blue",4

etc.

My code splits the file into separate CSV files based on Field4.

My output looks like this:

3.csv
Field1,Field2,Field3,Field4
1,2,red,3

4.csv
Field1,Field2,Field3,Field4
1,4,red,4
3,4,blue,4

I want my output to maintain quotes around the strings in field 3. The files are fed into a piece of software that only works if strings have quotes around them, which is quite annoying.

My current code looks like this:

import csv

#Creates an empty set - this will be used to store the Field4 values that have already been used
newfilelist = set()

#Opens the large csv file in "read" mode
with open('File.csv', 'r') as csvfile:

    #Read the first row of the large file and store the whole row as a string (headerstring)
    read_rows = csv.reader(csvfile)
    headerrow = next(read_rows)
    headerstring = ','.join(headerrow)
    for row in read_rows:

        #Store the whole row as a string (rowstring)
        rowstring = ','.join(row)

        #Takes Field 4
        newfilename = row[3]

        #If the newfilename is not in the newfilelist set, add it to the set and
        #create a new csv file with the header row, then write the current row to it.
        if newfilename not in newfilelist:
            newfilelist.add(newfilename)
            with open('//output/' + str(newfilename) + '.csv', 'a') as f:
                f.write(headerstring)
                f.write("\n")
                f.write(rowstring)
                f.write("\n")
        #If the newfilename is in the newfilelist set, append the current row to the existing csv file.
        else:
            with open('//output/' + str(newfilename) + '.csv', 'a') as f:
                f.write(rowstring)
                f.write("\n")
 

Can anybody advise me how to get the quotes around the strings? Unfortunately the software that uses my files requires them to be in this format!


2 Answers


Pass quoting=csv.QUOTE_NONNUMERIC to csv.writer().
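In the asker's script that would mean building each output line with a `csv.writer` instead of `','.join(row)`. A minimal sketch of the idea (the numeric column positions 0, 1 and 3 are assumed from the sample data; since `csv.reader` returns every field as a string, they must be converted back to numbers first, otherwise `QUOTE_NONNUMERIC` would quote every field):

```python
import csv
import io

# Sample input matching the question's data, held in memory for the demo
source = io.StringIO('Field1,Field2,Field3,Field4\n1,2,"red",3\n1,4,"red",4\n')
output = io.StringIO()

reader = csv.reader(source)
writer = csv.writer(output, quoting=csv.QUOTE_NONNUMERIC)

header = next(reader)
output.write(','.join(header) + '\r\n')   # header row stays unquoted
for row in reader:
    # Convert the numeric fields so QUOTE_NONNUMERIC leaves them bare
    row[0], row[1], row[3] = int(row[0]), int(row[1]), int(row[3])
    writer.writerow(row)   # string fields come out quoted: 1,2,"red",3

print(output.getvalue())
```

Each data row is written back as `1,2,"red",3`, with quotes restored around the string field only.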



csv.writer might be overkill for what you are trying to do. If you want the entire line unchanged, just write the entire line out as-is.

#Creates an empty dictionary - this will map each Field4 value to its open output file
newfilelist = {}

#Opens the large csv file in "read" mode
with open('File.csv', 'r') as csvfile:

    #Read the first row of the large file and store the whole row as a string (headerstring)
    headerstring = csvfile.readline()
    for row in csvfile.readlines():

        #Takes Field 4, stripping the trailing newline and any surrounding quotes
        newfilename = row.split(',')[3].strip().strip('"')

        #If the newfilename is not in the dictionary, open a new csv file,
        #store the file reference in the dictionary, and write the header row.
        if newfilename not in newfilelist:
            newfilelist[newfilename] = open('//output/' + str(newfilename) + '.csv', 'w')
            newfilelist[newfilename].write(headerstring)

        newfilelist[newfilename].write(row)  # Write out the row, quotes intact

#Close all open files
for f in newfilelist.values():
    f.close()

3 Comments

This code seems to break if file sizes are huge, but the initial code works. I get MemoryError at the "for row in csvfile.readlines():" line.
Not surprising if your output logic generates a lot of files. If you know a point where you will no longer be writing to a file, you can close it earlier instead of waiting to the end.
I guess to do this I'd have to sort the data? There is not necessarily any order and there could be tens of thousands of files. I think sorting the data would be tricky with the initial file sizes being multiple gigabytes!
