
I have a lot of CSV files, each with 3 columns, like this:

Example file names: file_1, file_4, file_5, file_7, etc.
(All files share the same name and differ only in the number at the end. The numbers are not consecutive, as the example shows.)


The contents look like this:

['357', '29384', '0.0031545741324921135']
['357', '29389', '0.0031545741324921135']
['357', '29526', '0.0368574903844921735']
['357', '35516', '0.0036775741324564665']
['357', '35551', '0.0023554341325646453']
['357', '35639', '0.0064467781324766535']
['357', '36238', '0.0067543874132467543']
['357', '37162', '0.0031545746577921135']

Let's name the 3 columns [a, b, c]. I'd like to sort the rows by c, the last column. I have to read all the files and sort their combined content into one huge file. I could use a pickle, for example.

My first idea was:

import csv
from operator import itemgetter
fn = 1
# N as the max number in the really last file
while fn < N:
   newfile = open("file_{fn}.csv","r")
   reader = csv.reader(newfile)

   file = open("BigSortedFile.csv","w")

   for line in sorted(reader, key=itemgetter(2)):
   file.write(line)

   fn = fn +1
file.close()

#after the loop I think I have to sort again the BigSortedFile.

But it's not working because file.write() needs a string, and each line coming from the reader is a list. How can I do the whole process?

1 Answer

To sort all lines, you need to read them all into one data structure, sort it, and then write them out again.

The csv module needs you to open files with newline="" to work properly. When you use a csv.reader to read, you can also use a csv.writer to write your data:

import csv
import os

N = 42  # largest number that appears in the file names

data = []
for fn in range(1, N + 1):  # file numbers start at 1 and include N
    path = f"file_{fn}.csv"  # f-string, so {fn} is actually substituted
    if not os.path.exists(path):  # the numbers are not consecutive
        continue
    with open(path, "r", newline="") as newfile:
        data.extend(csv.reader(newfile))

# csv yields strings, so convert column c to float for a numeric sort
data.sort(key=lambda row: float(row[2]))

with open("BigSortedFile.csv", "w", newline="") as bf:
    writer = csv.writer(bf)
    writer.writerows(data)
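
Because the file numbers are not consecutive, it may be simpler to discover the files that exist instead of counting up to N. The following is a minimal sketch, assuming the names follow the file_<number>.csv pattern from the question; the helper name numbered_csv_files and its directory parameter are hypothetical, introduced only for illustration.

```python
import glob
import os
import re


def numbered_csv_files(directory="."):
    """Return paths named file_<number>.csv, ordered by that number."""
    # Hypothetical helper: the naming pattern is assumed from the question.
    name_pattern = re.compile(r"^file_(\d+)\.csv$")
    found = []
    for path in glob.glob(os.path.join(directory, "file_*.csv")):
        match = name_pattern.match(os.path.basename(path))
        if match:  # skip names such as file_old.csv
            found.append((int(match.group(1)), path))
    # Sort numerically so that file_10.csv comes after file_4.csv.
    return [path for _, path in sorted(found)]
```

The returned list can replace the numeric loop: iterate over it and open each path with newline="" as before.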

3 Comments

OK, thanks. I'm now checking whether this works, even though it's taking a while. I also have several GB of data, and I really don't know if this will work for that much.
@Hugo You should have mentioned that. I highly doubt it will work: several GB sounds as if it won't fit into memory. You would probably need to sort the data in parts, and you should definitely look into pandas or something similar to wrangle that much data.
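
One way to do the partial sorting mentioned above is an external merge sort. This uses only the standard library, not pandas: sort fixed-size chunks in memory, write each chunk to a temporary file, then merge the sorted chunks with heapq.merge. The sketch below is a minimal illustration, not a tuned implementation. The function names and the chunk_rows parameter are hypothetical, and it assumes that column c always parses as a float.

```python
import csv
import heapq
import os
import tempfile


def sort_key(row):
    # Column c is the third field; csv yields strings, so compare as float.
    return float(row[2])


def read_rows(path):
    """Yield the non-empty rows of one CSV file lazily."""
    with open(path, "r", newline="") as f:
        for row in csv.reader(f):
            if row:
                yield row


def write_sorted_chunk(rows, tmp_dir):
    """Sort one in-memory chunk and write it to a temporary CSV file."""
    rows.sort(key=sort_key)
    fd, path = tempfile.mkstemp(suffix=".csv", dir=tmp_dir)
    with os.fdopen(fd, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def external_sort(input_paths, output_path, chunk_rows=1_000_000):
    """Sort rows from many CSV files by column c with bounded memory."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        chunk_paths, chunk = [], []
        for path in input_paths:
            for row in read_rows(path):
                chunk.append(row)
                if len(chunk) >= chunk_rows:
                    chunk_paths.append(write_sorted_chunk(chunk, tmp_dir))
                    chunk = []
        if chunk:
            chunk_paths.append(write_sorted_chunk(chunk, tmp_dir))

        # heapq.merge reads the sorted chunks lazily, one row at a time each.
        merged = heapq.merge(*(read_rows(p) for p in chunk_paths), key=sort_key)
        with open(output_path, "w", newline="") as out:
            csv.writer(out).writerows(merged)
```

A call might look like external_sort(["file_1.csv", "file_4.csv"], "BigSortedFile.csv"). Every chunk file stays open during the merge, so chunk_rows should be large enough that the number of chunks stays below the operating system's limit on open files.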
