Python: take columns from two CSV files and combine them into a new CSV file

Question

I am trying to extract columns from two CSV files and combine the selected columns into a new CSV file. Here are my original CSV files:

file 1:

Date    Time    FromPool:1:Delta    ToPool:1:Delta  FromPool:2:Kentucky ToPool:2:Kentucky   FromPool:3:MISO ToPool:3:MISO   FromPool:4:MRO  ToPool:4:MRO    FromPool:5:NC-SC    ToPool:5:NC-SC  FromPool:6:NY   ToPool:6:NY FromPool:7:PJM  ToPool:7:PJM    FromPool:8:TVA  ToPool:8:TVA
20181231    1   0   0   0   0   0   0   0   0   0   0   0   1470.82 1470.82 0   0   0
20181231    2   0   0   0   0   0   0   0   0   0   0   0   1475.41 1475.41 0   0   0
20181231    3   0   0   0   0   0   0   0   0   0   0   0   1480    1480    0   0   0
20181231    4   0   27.968  0   0   27.968  0   0   0   0   0   0   1480    1480    0   0   0
20181231    5   0   96.0939 0   0   117.8839    0   0   21.79   0   0   0   1331.068    1331.068    0   0   0
20181231    6   0   134.389 0   0   358.959 0   0   224.57  0   176.872 0   1464.9179   1464.9179   0   176.872 0
20181231    7   0   291.438 30.664  0   680.182 0   0   388.744 0   1404.892    0   1437.115    1437.115    30.664  1404.892    0
20181231    8   0   89.73   0   188.531 2404.063    0   0   388.742 0   1651.703    0   1410.229    1410.229    1737.06 1651.703    0
20181231    9   0   69.205  0   5.173   1419.352    0   0   388.743 0   1229.549    0   1398.427    1398.427    956.231 1229.549    0
20181231    10  0   0   112.367 0   1146.827    0   0   388.744 0   499.606 0   1393.049    1393.049    870.45  499.606 0
20181231    11  0   0   175.866 0   658.502 0   0   388.743 0   595.023 0   1391.607    1391.607    445.625 595.023 0
20181231    12  0   0   253.185 0   388.743 0   0   388.743 0   0   0   1393.049    1393.049    253.185 0   0
20181231    13  33.122  0   331.169 0   388.743 33.122  0   388.743 0   0   0   1396.984    1396.984    331.169 0   0
20181231    14  138.976 0   428.169 0   388.743 138.976 0   388.743 0   0   0   1398.426    1398.426    428.169 0   0
20181231    15  138.513 0   519.169 0   602.173 138.513 0   388.744 0   0   0   1401.049    1401.049    732.598 0   0
20181231    16  236.296 0   601.169 0   388.743 236.296 0   388.743 0   0   0   1399.738    1399.738    601.169 0   0
20181231    17  232.315 0   608.169 0   351.52  232.315 0   351.52  0   0   0   1386.229    1386.229    608.169 0   0
20181231    18  151.122 0   520.651 0   0   257.159 0   0   0   22.9259 0   1361.311    1467.348    520.651 22.9259 0
20181231    19  455.448 0   404.21  0   0   455.448 0   0   0   709.279 0   943.671 943.671 404.21  709.279 0
20181231    20  365.492 0   381.21  0   0   503.266 0   0   0   1334.21 0   1355.392    1493.166    381.21  1334.21 0
20181231    21  257.002 0   298.71  0   225.526 257.002 0   225.526 0   1350.388    0   1376.656    1376.656    298.71  1350.388    0
20181231    22  332.8759    0   341.169 0   388.743 332.8759    0   388.743 0   779.539 0   1393.049    1393.049    341.169 779.539 0
20181231    23  0   12.976  0   0   97.5    0   0   84.524  0   0   0   1419.278    1419.278    0   0   0
20181231    24  0   0   0   0   0   0   0   0   0   0   0   1445.6389   1445.6389   0   0   0
20190101    1   0   0   0   0   0   0   0   0   0   0   0   1338.195    1338.195    0   0   0
20190101    2   0   0   0   0   0   0   0   0   0   0   0   1213.715    1213.715    0   0   0

file 2:

Date    Time    PJM_G($/MWH)    PJM_H($/MWH)
20181231    1   28.549  28.923
20181231    2   27.262  29.067
20181231    3   27.839  29.524
20181231    4   28.136  30.132
20181231    5   30.339  33.152
20181231    6   32.511  35.47
20181231    7   38.585  40.438
20181231    8   39.514  41.878
20181231    9   38.843  41.401
20181231    10  38.447  40.631
20181231    11  38.3    40.393
20181231    12  37.496  39.631
20181231    13  37.529  39.598
20181231    14  38.072  40.001
20181231    15  38.202  40.135
20181231    16  37.641  39.577
20181231    17  38.37   40.276
20181231    18  45.857  48.009
20181231    19  55.744  58.435
20181231    20  47.055  49.369
20181231    21  39.962  42.045
20181231    22  37.961  40.164
20181231    23  32.169  34.892
20181231    24  26.309  27.747
20190101    1   27.407  28.779
20190101    2   27.672  28.959

Here is my code. I don't understand why I can't get the result I want.

import csv
processyear = 2019
f_r1 = open("Pool_to_Pool_Tariffs.csv")
f_r2 = open("PJM_LMP.csv")
f_w = open("Economic_interchange_process.csv","w")
f1 = csv.reader(f_r1)
f2 = csv.reader(f_r2)
next(f1)
next(f2)

for line1 in f1:
    for line2 in f2:
        if (line1[0].strip() == line2[0].strip()):
            if (line1[1].strip() == line2[1].strip()):
                if int(line2[0][:4]) == processyear:
                    f_w.write(line2[0]+','+line2[1]+','+line1[14]+','+line1[15]+','+line2[2]+','+line2[3]+'\n')
f_r1.close()
f_r2.close()
f_w.close()

Hope you can help me.

What's wrong with your resulting file? By the way, you can also use the csv module to write the file.
– Alex Lisovoy, Commented Sep 18, 2014 at 13:50

FrobberOfBits · Accepted Answer · 2014-09-18 13:55:42Z

You have several problems going on here:

The first is that your files aren't CSV, they're tab delimited. So these lines need to change:

f1 = csv.reader(f_r1, delimiter='\t')
f2 = csv.reader(f_r2, delimiter='\t')
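
To see why the delimiter matters, here is a small sketch that parses the same tab-separated text both ways. The sample string is made up for illustration (a few columns in the style of the second file); it is not read from the real files.

```python
import csv
import io

# Hypothetical sample: a header and one data row, separated by tabs
sample = "Date\tTime\tPJM_G($/MWH)\n20181231\t1\t28.549\n"

# The default delimiter is a comma, so each line comes back as a single field
comma_rows = list(csv.reader(io.StringIO(sample)))

# With delimiter='\t' each line is split into its columns
tab_rows = list(csv.reader(io.StringIO(sample), delimiter='\t'))

print(len(comma_rows[1]))  # 1
print(len(tab_rows[1]))    # 3
```

With the default delimiter every row has only one field, so an index such as line1[1] would raise an IndexError.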

Second, you can only read through those CSV files once. The reader object consumes the file as it goes and doesn't store the rows in memory. Consider this loop:

for line1 in f1:
    for line2 in f2:

It reads the first line of f1, then consumes ALL of f2. On the second pass through the outer loop, f2 is already exhausted, so the inner loop does nothing and you don't get the results you expect.

Here's a demonstration of how it all gets consumed:

>>> f1 = csv.reader(open("Pool_to_Pool_Tariffs.csv"), delimiter='\t')
>>> x = 0
>>> for line in f1:  x = x + 1
... 
>>> print(x)
27
>>> x = 0
>>> for line in f1: x = x + 1
... 
>>> x
0

Notice that I run through the for loop once, and my file has 27 lines. I reset x, run through the file again, and I have 0 lines. Why is that? It's because f1 is at the end of the file, and there is nothing else to read.
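
If the rows are needed more than once, one simple option is to copy them into a list, which can be looped over repeatedly. This is only a sketch: the in-memory `data` object stands in for a real file and its contents are invented.

```python
import csv
import io

# Hypothetical in-memory stand-in for a tab-delimited file
data = io.StringIO("Date\tTime\n20181231\t1\n20181231\t2\n")

reader = csv.reader(data, delimiter='\t')
rows = list(reader)  # reads everything once and keeps it in memory

leftover = sum(1 for _ in reader)   # the reader itself is now exhausted
first_pass = sum(1 for _ in rows)   # the list can be walked...
second_pass = sum(1 for _ in rows)  # ...as many times as needed

print(leftover, first_pass, second_pass)  # 0 3 3
```

This trades memory for convenience, which is reasonable for files of the size shown in the question.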

You should rewrite this using a different approach. One option is to read both files into a data structure such as a dictionary and then match rows by key. Another is to load both into a database and use SQL to join them. Essentially, what you're trying to do is a relational join in Python code. That's fine for very small, simple files, but if you intend to reuse this or run it repeatedly, you're better off using a database.
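
As a rough illustration of the database route, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table names, column names, and sample rows are all hypothetical; in practice the rows would be loaded from the two files.

```python
import sqlite3

# Hypothetical rows: (day, hour, from_pjm, to_pjm) and (day, hour, pjm_g, pjm_h)
tariffs = [("20190101", 1, 1338.195, 0.0), ("20190101", 2, 1213.715, 0.0)]
prices = [("20190101", 1, 27.407, 28.779), ("20190101", 2, 27.672, 28.959)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tariffs (day TEXT, hour INTEGER, from_pjm REAL, to_pjm REAL)")
conn.execute("CREATE TABLE prices (day TEXT, hour INTEGER, pjm_g REAL, pjm_h REAL)")
conn.executemany("INSERT INTO tariffs VALUES (?, ?, ?, ?)", tariffs)
conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", prices)

# Join on day and hour, keeping only rows from the target year
joined = conn.execute(
    """
    SELECT p.day, p.hour, t.from_pjm, t.to_pjm, p.pjm_g, p.pjm_h
    FROM prices AS p
    JOIN tariffs AS t ON t.day = p.day AND t.hour = p.hour
    WHERE substr(p.day, 1, 4) = '2019'
    ORDER BY p.day, p.hour
    """
).fetchall()
conn.close()

for row in joined:
    print(row)
```

The join condition plays the role of the two nested `if` checks on date and time in the question's code.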

4 Comments

student Over a year ago

Thanks for your tips, I have fixed it!

FrobberOfBits Over a year ago

If this solves your problem, please mark the question answered.

student Over a year ago

Thanks again, but do you know how to mark it as answered? I am really new to this...

FrobberOfBits Over a year ago

There should be a green check mark you can click, right near the voting buttons (top left of the answer)

Burhan Khalid · Accepted Answer · 2014-09-18 14:24:38Z

Your code keeps only the lines where the process year is 2019 and writes just those lines to the output file.

If that's the case, you can simplify things by reading only the lines you are interested in from the first file.

You have to read the entire first file before going through the second. You cannot step through both of them in nested loops (well, you can, but you'll have to rewind the second file each time).
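
For completeness, the rewind option might look like the sketch below, which calls `seek(0)` on the second file before every inner loop. The two in-memory files and their contents are invented for illustration.

```python
import csv
import io

# Hypothetical in-memory stand-ins for the two tab-delimited files
file_1 = io.StringIO("Date\tTime\tA\n20190101\t1\t10\n20190101\t2\t20\n")
file_2 = io.StringIO("Date\tTime\tB\n20190101\t1\t0.5\n20190101\t2\t0.7\n")

matches = []
reader_1 = csv.reader(file_1, delimiter='\t')
next(reader_1)  # skip the header of the first file
for row_1 in reader_1:
    file_2.seek(0)  # rewind so the inner loop starts from the top again
    reader_2 = csv.reader(file_2, delimiter='\t')
    next(reader_2)  # skip the header of the second file
    for row_2 in reader_2:
        if row_1[:2] == row_2[:2]:  # same date and time
            matches.append(row_2[:2] + [row_1[2], row_2[2]])

print(matches)
```

This works, but it rereads the whole second file once per row of the first file, so the cost grows with the product of the two file lengths.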

The sequence is the following:

1. Read the first file entirely, collecting only the rows you are interested in (the ones with the target year).
2. Read the second file and, for each row:
  1. Check whether its date and time match a row collected from the first file.
  2. If they do, construct a new line with the following fields:
    1. 1st column from the second file's matched row
    2. 2nd column from the second file's matched row
    3. 15th column from the first file's matched row
    4. 16th column from the first file's matched row
    5. 3rd column from the second file's matched row
    6. 4th column from the second file's matched row
  3. Write the new line to a separate file.

Here is some code to implement that logic:

import csv

YEAR = '2019'

file_a = 'Pool_to_Pool_Tariffs.csv'
file_b = 'PJM_LMP.csv'
result = 'Economic_interchange_process.csv'

# Rows from the first file, keyed by (date, time) so each hour can be looked up directly
source_lines = {}

with open(file_a, 'r', newline='') as f:
    reader = csv.reader(f, delimiter='\t')
    next(reader)  # skip the header
    for row in reader:
        if row[0][:4] == YEAR:
            source_lines[(row[0].strip(), row[1].strip())] = row

with open(file_b, 'r', newline='') as f, open(result, 'w', newline='') as o:
    reader = csv.reader(f, delimiter='\t')
    writer = csv.writer(o, delimiter=',')
    next(reader)  # skip the header
    for row in reader:
        matched_row = source_lines.get((row[0].strip(), row[1].strip()))
        if matched_row:
            # The same date and time exist in the first file
            result_row = (row[0], row[1], matched_row[14], matched_row[15], row[2], row[3])
            writer.writerow(result_row)
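
A variation on the same idea is to select columns by header name with `csv.DictReader` instead of by position. The sketch below assumes that positions 14 and 15 correspond to the headers `FromPool:7:PJM` and `ToPool:7:PJM`, which is inferred from the header row shown in the question. The in-memory excerpts are invented and reduced to the columns used here.

```python
import csv
import io

# Hypothetical excerpts of the two files, reduced to the columns used here
tariffs_text = "Date\tTime\tFromPool:7:PJM\tToPool:7:PJM\n20190101\t1\t1338.195\t0\n"
prices_text = "Date\tTime\tPJM_G($/MWH)\tPJM_H($/MWH)\n20190101\t1\t27.407\t28.779\n"

# Index the first file by (Date, Time)
tariffs = {
    (row["Date"], row["Time"]): row
    for row in csv.DictReader(io.StringIO(tariffs_text), delimiter='\t')
}

out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
for row in csv.DictReader(io.StringIO(prices_text), delimiter='\t'):
    match = tariffs.get((row["Date"], row["Time"]))
    if match and row["Date"].startswith("2019"):
        writer.writerow([
            row["Date"], row["Time"],
            match["FromPool:7:PJM"], match["ToPool:7:PJM"],
            row["PJM_G($/MWH)"], row["PJM_H($/MWH)"],
        ])

print(out.getvalue())
```

Naming the columns makes the script easier to check against the data and keeps it working if the column order ever changes.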
