How do I create a new column in a CSV file using Python by shifting one row?

Question

I have a CSV file like the one below. It is a huge file with thousands of records.

input.csv

No;Val;Rec;CSR
0;10;1;1200
0;100;2;1300
0;100;3;1300
0;100;4;1400
0;10;5;1200
0;11;6;1200

I want to create an output.csv file by adding a new column "PSR" after the 1st column "No". The value of this column depends on the "CSR" column. For the 1st row, "PSR" shall be zero. From the next record onwards, it depends on the "CSR" value in the previous row: if the present and previous records have the same CSR value, "PSR" shall be zero; if not, PSR shall hold the previous CSR value. For example, the CSR value in the 2nd row is 1300, which differs from the value in the 1st record (1200), so the PSR value for the 2nd row shall be 1200. In the 2nd and 3rd rows the CSR value is the same, so the PSR value for the 3rd row shall be zero. In short, the new PSR value depends on the CSR values in the present and previous records.

Output.csv

No;PCR;Val;Rec;CSR
0;0;10;1;1200
0;1200;100;2;1300
0;0;100;3;1300
0;1300;100;4;1400
0;1400;10;5;1200
0;0;11;6;1200

My Approach:

Use csv.reader and iterate over the records, collecting them in a list. Copy the 5th column to the 2nd column in the list and shift it one row down.
Then compare the values in the 2nd and 5th columns (PCR and CSR); if both values are the same, replace the PCR value with zero.

I have a problem getting the 1st step coded: I am able to duplicate the column but not able to shift it. The 2nd step is quite straightforward.

Also, I am not sure whether this approach is correct. Any pointers or recommendations would be really helpful.

Note: I am not able to install Pandas on CentOS. So help without this module would be better.

My Code:

with open('input.csv', 'r') as input, open('output.csv', 'w') as output:
        reader = csv.reader(input, delimiter = ';')
        writer = csv.writer(output, delimiter = ';')
        mylist = []                                        
        header = next(reader)                           
        mylist.append(header)
        for rec in reader:
                mylist.append(rec)                      
                rec.insert(1, rec[3])
                mylist.append(rec)
        writer.writerows(mylist)

Chris Seymour · Accepted Answer · 2014-07-20 00:31:49Z

If you're open to non-Python solutions, then awk could be a good option:

awk 'NR==1{$2="PSR;"$2}NR>1{$2=($4==a?0";"$2:+a";"$2);a=$4}1' FS=';' OFS=';' file
No;PSR;Val;Rec;CSR
0;0;10;1;1200
0;1200;100;2;1300
0;0;100;3;1300
0;1300;100;4;1400
0;1400;10;5;1200
0;0;11;6;1200

Awk is distributed with pretty much all Linux distributions and was designed exactly for this kind of task. It will blaze through your file. Add a redirection to the end > output.csv to save the output in a file.

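A simple Python approach using the same logic:

#!/usr/bin/env python

last = "0"  # CSR of the previous row; "0" makes the first PSR zero

with open('input.csv') as infile:
    # Header: insert the new column name after the first field
    print(next(infile).strip().replace(';', ';PSR;', 1))
    for line in infile:
        field = line.strip().split(';')
        # CSR is field[3] before the insert and field[4] after it
        if field[3] == last:
            field.insert(1, "0")
        else:
            field.insert(1, last)
        last = field[4]
        print(';'.join(field))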
Produces the same output:

$ python parse.py
No;PSR;Val;Rec;CSR
0;0;10;1;1200
0;1200;100;2;1300
0;0;100;3;1300
0;1300;100;4;1400
0;1400;10;5;1200
0;0;11;6;1200

Again, just redirect the output to save it:

$ python parse.py > output.csv

pgreen2 · Accepted Answer · 2014-07-19 22:05:46Z

Just code it as you explained it. Store the previous CSR and refer to it on the next loop through; just be sure to update it.

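import csv
with open('input.csv', 'r') as input, open('output.csv', 'w') as output:
        reader = csv.reader(input, delimiter = ';')
        writer = csv.writer(output, delimiter = ';')
        mylist = []
        header = next(reader)
        header.insert(1, 'PCR')     # add the new column name to the header row
        mylist.append(header)
        prev_csr = '0'              # a string, so it compares cleanly with CSV fields
        for rec in reader:
                # rec[3] is the current CSR before the insert
                rec.insert(1, '0' if rec[3] == prev_csr else prev_csr)
                mylist.append(rec)
                prev_csr = rec[4]   # CSR sits at index 4 after the insert
        writer.writerows(mylist)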
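
The "store the previous CSR and update it" idea can also be isolated into a small generator, which makes it easy to check against the sample data without touching any files. This is only a sketch: the name `with_previous_csr` and the assumption that CSR is the last column of each input row are illustrative choices, not something stated in the question.

```python
def with_previous_csr(rows, csr_index=-1):
    """Yield each row with a new second column: the previous row's CSR,
    or "0" for the first row or when the CSR is unchanged."""
    prev_csr = None  # no previous row yet
    for row in rows:
        csr = row[csr_index]
        new_value = "0" if prev_csr is None or csr == prev_csr else prev_csr
        # Build a new list instead of mutating the caller's row
        yield row[:1] + [new_value] + row[1:]
        prev_csr = csr


# First three data rows from the question
rows = [
    ["0", "10", "1", "1200"],
    ["0", "100", "2", "1300"],
    ["0", "100", "3", "1300"],
]
result = list(with_previous_csr(rows))
for row in result:
    print(";".join(row))
```

Because it is a generator, it could be fed a csv.reader directly and its output passed to writer.writerows, so the whole file never has to be held in a list.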

answered Jul 19, 2014 at 22:05


inspectorG4dget · Accepted Answer · 2014-07-19 22:10:03Z

import csv

with open('input.csv', 'r') as input, open('output.csv', 'w') as output:
    reader = csv.reader(input, delimiter = ';')
    writer = csv.writer(output, delimiter = ';')

    # Header row: add the new column name after the first column
    header = next(reader)
    header.insert(1, 'PCR')
    writer.writerow(header)

    # First data row always gets '0'
    prevRow = next(reader)
    prevRow.insert(1, '0')
    writer.writerow(prevRow)
    for row in reader:
        # CSR is the last field, both before and after the insert
        if prevRow[-1] == row[-1]:
            val = '0'
        else:
            val = prevRow[-1]
        row.insert(1,val)
        prevRow = row
        writer.writerow(row)
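
For comparison, the asker's original plan (copy the CSR column, shift it one row down, then zero out the matches) can be written quite literally by pairing the CSR column with a shifted copy of itself. The following is a rough sketch under a few assumptions: the function name `add_shifted_csr` is made up for illustration, CSR is taken to be the last column, and all rows are read into memory first, which should be acceptable for thousands of records.

```python
import csv


def add_shifted_csr(lines, delimiter=";"):
    """Return header + rows with a 'PCR' column inserted after column 1."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    rows = list(reader)

    csr = [row[-1] for row in rows]
    shifted = ["0"] + csr[:-1]  # the CSR column moved one row down

    table = [header[:1] + ["PCR"] + header[1:]]
    for row, prev, cur in zip(rows, shifted, csr):
        # Same CSR as the previous row -> "0", otherwise the previous CSR
        table.append(row[:1] + ["0" if prev == cur else prev] + row[1:])
    return table


sample = """No;Val;Rec;CSR
0;10;1;1200
0;100;2;1300
0;100;3;1300
0;100;4;1400
0;10;5;1200
0;11;6;1200""".splitlines()

table = add_shifted_csr(sample)
for row in table:
    print(";".join(row))
```

The row-by-row versions in the answers do the same work with constant memory, so this form is mainly useful for seeing how the "shift" maps onto list slicing.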

edited Jul 19, 2014 at 22:10

answered Jul 19, 2014 at 22:04


1 Comment

user3762807 Over a year ago

Thanks. This solution also works, but I am not able to mark multiple posts as answers.

aruisdante · Accepted Answer · 2014-07-20 03:36:15Z

Or, even easier, using the DictReader and DictWriter capabilities of csv:

from csv import DictReader, DictWriter

input_header  = ['No','Val','Rec','CSR']
output_header = ['No','PCR','Val','Rec','CSR']

# 'rb'/'wb' are the Python 2 modes for csv; on Python 3 use 'r'/'w' with newline=''
with open('input.csv', 'rb') as in_file, open('output.csv', 'wb') as out_file:
    in_reader  = DictReader(in_file, input_header, delimiter=';')
    out_writer = DictWriter(out_file, output_header, delimiter=';')
    next(in_reader)          # skip the header
    out_writer.writeheader() # place the output header
    last_csr = None
    for row in in_reader:
        current_csr = row['CSR']
        # 0 for the first row or an unchanged CSR, otherwise the previous CSR
        row['PCR']  = last_csr if last_csr is not None and current_csr != last_csr else 0
        last_csr    = current_csr
        out_writer.writerow(row)
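
On Python 3 the same DictReader/DictWriter idea might look like the sketch below. It takes the column names from the file's own header instead of hard-coding them, and it works on any text file objects, so it can be tried with in-memory streams. The function name `add_pcr` and the `lineterminator="\n"` choice are assumptions made for the example.

```python
import csv
import io


def add_pcr(in_file, out_file, delimiter=";"):
    """Copy in_file to out_file, inserting a 'PCR' column after column 1."""
    reader = csv.DictReader(in_file, delimiter=delimiter)
    fields = list(reader.fieldnames)  # read from the header row
    fields.insert(1, "PCR")
    writer = csv.DictWriter(out_file, fieldnames=fields,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()

    last_csr = None
    for row in reader:
        current = row["CSR"]
        # "0" for the first row or an unchanged CSR, else the previous CSR
        row["PCR"] = "0" if last_csr in (None, current) else last_csr
        writer.writerow(row)
        last_csr = current


source = io.StringIO("No;Val;Rec;CSR\n0;10;1;1200\n0;100;2;1300\n0;100;3;1300\n")
target = io.StringIO()
add_pcr(source, target)
output_text = target.getvalue()
print(output_text)
```

With real files, the two streams would come from open('input.csv', newline='') and open('output.csv', 'w', newline='').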

edited Jul 20, 2014 at 3:36

answered Jul 19, 2014 at 22:16


5 Comments

user3762807 Over a year ago

Thanks. I think DictReader is a good choice; I will try to learn how to use it. But I get the following error with your script.

with open('input.csv', 'rb') as in_file, open('output.csv', 'wb') as out_file:                                            ^ SyntaxError: invalid syntax

aruisdante Over a year ago

What version of Python do you have? This syntax was added in 2.7/3.1.

user3762807 Over a year ago

Python 2.6.6. It fails also with python-3.4.1

aruisdante Over a year ago

What exactly is the arrow pointing at? SO obviously doesn't maintain the spacing in comments.

user3762807 Over a year ago

No problem. I got it working after correcting the syntax to match my PERL version.
