I'm trying to analyze some data, and to do so I am creating a new CSV file whose rows are composed from two other CSV files. I've already extracted the data from one of them (oldfile1) into a list, and I have specific indices (oldfile1_indices) that select the rows to append to the new file. The other file (oldfile2) is the base of the new file, so I can add its rows directly, as they need no filtering. The formula for a new line should be: row from oldfile2 + row from oldfile1. The first = next(reader) line is intended to skip the comment line.

However, this code currently creates a hilariously large output file (200MB). I suspect that it is looping through multiple times per row, duplicating the written rows, but I cannot immediately think of another way to loop through the rows from oldfile2 without duplicating the written rows. I also cannot give much more detail on the output file, as it crashes whenever I try to open it. Any help appreciated.

with open('newfile.csv','w+') as f:
    reader = csv.reader(open('oldfile2.csv'), delimiter=',')
    writer = csv.writer(f, delimiter=',')
    first = next(reader)
    for oldrow2 in reader:
        outline = [oldrow2 + oldfile1[i] for i in oldfile1_indices]
        writer.writerow(outline)


2 Comments
  • This seems like a job for zip(). Something similar to: for oldrow1, oldrow2 in zip(oldfile1, oldfile2): write_new_line() Commented Apr 18, 2019 at 2:10
  • Your mistake is [oldrow2 + oldfile1[i] for i in oldfile1_indices]. It joins oldrow2 with every selected line of oldfile1, so each written row contains many combined rows instead of one. Commented Apr 18, 2019 at 2:13
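To make the point in the comment above concrete, here is a small sketch that runs the question's list comprehension against in-memory data. The sample values and the io.StringIO buffer are made up for illustration; only the comprehension itself comes from the question.

```python
import csv
import io

# Hypothetical stand-ins for the real data (not from the original post).
oldfile1 = [["a0", "b0"], ["a1", "b1"], ["a2", "b2"]]
oldfile1_indices = [0, 2]
oldrow2 = ["x", "y"]

# The comprehension from the question: it builds one combined row
# per index, so the result is a list of lists, not a flat row.
outline = [oldrow2 + oldfile1[i] for i in oldfile1_indices]

# writerow() treats each inner list as a single cell and writes its str().
buf = io.StringIO()
csv.writer(buf).writerow(outline)
written = buf.getvalue()

# Read the line back to count the cells that were actually written.
cells = next(csv.reader(io.StringIO(written)))
print(len(cells), "cells in one output row")
```

With thousands of indices, every row of oldfile2 would carry thousands of such cells, which would be consistent with the very large output file described in the question.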

1 Answer

I can't test it, but I think you need zip() to create pairs (oldrow2, i), then build one new row from each pair and write it.

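import csv

with open('oldfile1.csv', newline='') as f1:
    oldfile1 = list(csv.reader(f1, delimiter=','))
oldfile1_indices = [...]

with open('newfile.csv', 'w', newline='') as f, open('oldfile2.csv', newline='') as f2:
    writer = csv.writer(f, delimiter=',')

    reader2 = csv.reader(f2, delimiter=',')
    next(reader2)  # skip the comment line

    # pair each row of oldfile2 with exactly one index into oldfile1
    for oldrow2, i in zip(reader2, oldfile1_indices):
        outline = oldrow2 + oldfile1[i]  # one flat row, not a list of rows
        writer.writerow(outline)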
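Since the snippet above was not tested, here is a self-contained sketch of the same zip() pairing that can be run as is. The merge_rows helper, the temporary directory, and the sample file contents are all hypothetical and only there to make the example runnable. It also assumes the k-th row of oldfile2 (after the comment line) should be paired with the k-th index.

```python
import csv
import tempfile
from pathlib import Path


def merge_rows(oldfile1_path, oldfile2_path, newfile_path, indices):
    """Write (k-th data row of oldfile2) + oldfile1[indices[k]] for each k."""
    with open(oldfile1_path, newline="") as f1:
        oldfile1 = list(csv.reader(f1))
    with open(oldfile2_path, newline="") as f2, \
            open(newfile_path, "w", newline="") as out:
        reader2 = csv.reader(f2)
        next(reader2)  # skip the comment line
        writer = csv.writer(out)
        # zip() yields one (row, index) pair per step, so each input row
        # produces exactly one flat output row.
        for oldrow2, i in zip(reader2, indices):
            writer.writerow(oldrow2 + oldfile1[i])


# Made-up sample files in a temporary directory.
tmp = Path(tempfile.mkdtemp())
(tmp / "oldfile1.csv").write_text("a0,b0\na1,b1\na2,b2\n")
(tmp / "oldfile2.csv").write_text("# comment\nx0,y0\nx1,y1\n")

merge_rows(tmp / "oldfile1.csv", tmp / "oldfile2.csv", tmp / "newfile.csv", [2, 0])

with open(tmp / "newfile.csv", newline="") as f:
    result = list(csv.reader(f))
print(result)
```

Note that zip() stops at the shorter of its inputs, so if oldfile2 has more data rows than there are indices (or the reverse), the extra items are silently dropped. It may be worth checking that the two lengths match before relying on the output.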
1 Comment

Ah yes, I used zip in an earlier step. I think I assumed Python could implicitly account for the index because of the for loop, but I suppose not. Perfect solution, thanks.
