I have a CSV file like the one below. It is a huge file with thousands of records.

input.csv

No;Val;Rec;CSR
0;10;1;1200
0;100;2;1300
0;100;3;1300
0;100;4;1400
0;10;5;1200
0;11;6;1200

I want to create an output.csv file by adding a new column "PSR" after the 1st column "No". The value of this column depends on the "CSR" value. For the 1st row, "PSR" shall be zero. From the next record onwards, it depends on the "CSR" value in the previous row: if the present and previous records have the same CSR value, "PSR" shall be zero; if not, "PSR" shall hold the previous CSR value. For example, the CSR value in the 2nd row is 1300, which differs from the value in the 1st record (1200), so the PSR value for the 2nd row shall be 1200. In the 2nd and 3rd rows the CSR value is the same, so the PSR value for the 3rd row shall be zero. So the new PSR value depends on the CSR value in the present and previous records.

Output.csv

No;PCR;Val;Rec;CSR
0;0;10;1;1200
0;1200;100;2;1300
0;0;100;3;1300
0;1300;100;4;1400
0;1400;10;5;1200
0;0;11;6;1200
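
To make the rule concrete, here is a minimal sketch of it as a plain function over the CSR values only, with no file handling. The name psr_values is hypothetical and used purely for illustration, and the values are kept as strings because that is how they would come out of a CSV file.

```python
def psr_values(csr_values):
    """Return the PSR value for each CSR value, in order."""
    result = []
    previous = None  # there is no previous record before the first row
    for current in csr_values:
        if previous is None or previous == current:
            # First row, or CSR unchanged from the previous row
            result.append('0')
        else:
            # CSR changed: carry the previous CSR value
            result.append(previous)
        previous = current
    return result


sample = ['1200', '1300', '1300', '1400', '1200', '1200']
print(psr_values(sample))  # ['0', '1200', '0', '1300', '1400', '0']
```

Applied to the CSR column of the sample input, this gives the same values as the second column of the expected output above.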

My Approach:

  1. Use csv.reader and iterate over the rows in a list. Copy the 5th column to the 2nd column in the list and shift it one row down.
  2. Then compare the values in the 2nd and 5th columns (PCR and CSR); if both values are the same, replace the PCR value with zero.

I am having trouble coding the 1st step: I am able to duplicate the column but not able to shift it. The 2nd step is quite straightforward.

Also, I am not sure whether this approach is correct. Any pointers or recommendations would be really helpful.

Note: I am not able to install Pandas on CentOS, so help without this module would be better.

My Code:

with open('input.csv', 'r') as input, open('output.csv', 'w') as output:
        reader = csv.reader(input, delimiter = ';')
        writer = csv.writer(output, delimiter = ';')
        mylist = []                                        
        header = next(reader)                           
        mylist.append(header)
        for rec in reader:
                mylist.append(rec)                      
                rec.insert(1, rec[3])
                mylist.append(rec)
        writer.writerows(mylist)

4 Answers

1

If you're open to non-Python solutions, then awk could be a good option:

awk 'NR==1{$2="PSR;"$2}NR>1{$2=($4==a?0";"$2:+a";"$2);a=$4}1' FS=';' OFS=';' file
No;PSR;Val;Rec;CSR
0;0;10;1;1200
0;1200;100;2;1300
0;0;100;3;1300
0;1300;100;4;1400
0;1400;10;5;1200
0;0;11;6;1200

Awk is distributed with pretty much all Linux distributions and was designed exactly for this kind of task. It will blaze through your file. Add a redirection (> output.csv) to the end of the command to save the output in a file.

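A simple Python approach using the same logic:

#!/usr/bin/env python

last = "0"

with open('input.csv') as infile:   # not named csv, to avoid shadowing the module name
    # Add the new column name after the first header field
    print(next(infile).strip().replace(';', ';PSR;', 1))
    for line in infile:
        field = line.strip().split(';')
        # CSR is field[3] before the insert and field[4] after it
        if field[3] == last: field.insert(1, "0")
        else: field.insert(1, last)
        last = field[4]
        print(';'.join(field))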

Produces the same output:

$ python parse.py
No;PSR;Val;Rec;CSR
0;0;10;1;1200
0;1200;100;2;1300
0;0;100;3;1300
0;1300;100;4;1400
0;1400;10;5;1200
0;0;11;6;1200

Again just redirect the output to save it:

$ python parse.py > output.csv 

0

Just code it as you explained it. Store the previous CSR and refer to it on the next loop through; just be sure to update it.

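import csv
with open('input.csv', 'r') as input, open('output.csv', 'w') as output:
        reader = csv.reader(input, delimiter = ';')
        writer = csv.writer(output, delimiter = ';')
        mylist = []
        header = next(reader)
        header.insert(1, 'PCR')       # the new column name goes into the header row itself
        mylist.append(header)
        prev_csr = None               # no previous record before the first row
        for rec in reader:
                csr = rec[3]
                # zero for the first row or an unchanged CSR, else the previous CSR
                rec.insert(1, '0' if prev_csr in (None, csr) else prev_csr)
                mylist.append(rec)
                prev_csr = csr
        writer.writerows(mylist)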
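
The two-step idea from the question (shift the CSR column down one row, then zero out the matches) can also be coded quite literally. The following is only a sketch under a few assumptions: the data rows are already loaded into a list of lists of strings with the header removed, CSR is the 4th input column, and add_pcr_by_shifting is a made-up name for illustration.

```python
def add_pcr_by_shifting(rows, csr_index=3):
    """Return new rows with a PCR column inserted after the first column.

    rows is a list of data rows (lists of strings), header excluded.
    """
    csr = [row[csr_index] for row in rows]
    # Step 1: shift the CSR column down by one row; the first row gets '0'.
    shifted = ['0'] + csr[:-1]
    # Step 2: where the shifted value equals the current CSR, use '0' instead.
    pcr = ['0' if prev == cur else prev for prev, cur in zip(shifted, csr)]
    # Insert the PCR value after the first column of each row.
    return [row[:1] + [p] + row[1:] for row, p in zip(rows, pcr)]


rows = [
    ['0', '10', '1', '1200'],
    ['0', '100', '2', '1300'],
    ['0', '100', '3', '1300'],
]
for row in add_pcr_by_shifting(rows):
    print(';'.join(row))
```

This version keeps the whole file in memory, so for a very large input the row-by-row approach of remembering only the previous CSR is probably the better fit.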

0

import csv

with open('input.csv', 'r') as input, open('output.csv', 'w') as output:
    reader = csv.reader(input, delimiter = ';')
    writer = csv.writer(output, delimiter = ';')

    header = next(reader)
    header.insert(1, 'PCR')
    writer.writerow(header)

    prevRow = next(reader)
    prevRow.insert(1, '0')
    writer.writerow(prevRow)
    for row in reader:
        if prevRow[-1] == row[-1]:
            val = '0'
        else:
            val = prevRow[-1]
        row.insert(1,val)
        prevRow = row
        writer.writerow(row)

1 Comment

Thanks. This solution also works, but I am not able to mark multiple posts as answers.
0

Or, even easier using the DictReader and DictWriter capabilities of csv:

from csv import DictReader, DictWriter

input_header  = ['No','Val','Rec','CSR']
output_header = ['No','PCR','Val','Rec','CSR']

with open('input.csv', 'r') as in_file, open('output.csv', 'w') as out_file:
    in_reader  = DictReader(in_file, input_header, delimiter=';')
    out_writer = DictWriter(out_file, output_header, delimiter=';')
    next(in_reader)          # skip the header
    out_writer.writeheader() # place the output header
    last_csr = None
    for row in in_reader:
        current_csr = row['CSR']
        # 0 for the first row or an unchanged CSR, otherwise the previous CSR
        row['PCR']  = 0 if last_csr in (None, current_csr) else last_csr
        last_csr    = current_csr
        out_writer.writerow(row)

5 Comments

Thanks. I think DictReader is a good choice; I will try to learn how to use it. But I get the following error with your script: with open('input.csv', 'rb') as in_file, open('output.csv', 'wb') as out_file: ^ SyntaxError: invalid syntax
What version of Python do you have? This syntax was added in 2.7/3.1.
Python 2.6.6. It fails also with python-3.4.1
What exactly is the arrow pointing at? SO obviously doesn't maintain the spacing in comments.
No problem. I got it working after correcting the syntax to match my PERL version.
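
For readers hitting the same SyntaxError: assuming it comes from the combined form "with A as a, B as b:" (which, as noted above, was added in 2.7/3.1), one possible workaround on older interpreters is to nest two with statements. This is a sketch only; the thread does not say what correction was actually made, and add_pcr is a hypothetical function name.

```python
import csv


def add_pcr(in_path, out_path):
    """Copy a ';'-separated file, inserting a PCR column after column 1."""
    # Nested "with" blocks instead of "with A as a, B as b:".
    with open(in_path, 'r') as in_file:
        with open(out_path, 'w') as out_file:
            reader = csv.reader(in_file, delimiter=';')
            writer = csv.writer(out_file, delimiter=';')

            header = next(reader)
            header.insert(1, 'PCR')
            writer.writerow(header)

            prev_csr = None  # no previous record before the first row
            for row in reader:
                csr = row[-1]  # CSR is the last column of the input
                # '0' for the first row or an unchanged CSR, else previous CSR
                row.insert(1, '0' if prev_csr in (None, csr) else prev_csr)
                writer.writerow(row)
                prev_csr = csr
```

Calling add_pcr('input.csv', 'output.csv') would process the file one row at a time, so only the previous CSR value is kept in memory.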
