5

I have this .csv file ...

    id,first_name,last_name,email,date,opt-in,unique_code
    1,Jimmy,Reyes,[email protected],12/29/2016,FALSE,ER45DH
    2,Doris,Wood,[email protected],04/22/2016,,MU34T3
    3,Steven,Miller,[email protected],07/31/2016,FALSE,G34FGH
    4,Earl,Parker,[email protected],01-08-17,FALSE,ASY67J
    5,Barbara,Cruz,[email protected],12/30/2016,FALSE,NHG67P

If the opt-in value is empty, it should print "0". The last value in each CSV row should be printed first, followed by all the name/value pairs in a specific format, as shown in the expected output below.

My expected output

ER45DH<tab>"id"="1","first_name"="Jimmy","last_name"="Reyes","email"="[email protected]","date"="12/29/2016","opt-in"="FALSE"
MU34T3<tab>"id"="2","first_name"="Doris","last_name"="Wood","email"="[email protected]","date"="04/22/2016","opt-in"="0"
.......

My code so far ..

import csv

with open('newfilename.csv', 'w') as f2:
    with open('mycsvfile.csv', mode='r') as infile:
        reader = csv.reader(infile)
        for i,rows in enumerate(reader):
            if i == 0:
               header = rows 
            else:
                if rows[5] == '':
                   rows[5] = 0;
                pat = rows[0]+'\t'+'''"%s"="%%s",'''*(len(header)-2)+'''"%s"="%%s"\n'''
                print pat
                f2.write(pat % tuple(header[1:]) % tuple(rows[1:]))
    f2.close()

This code produces this output

1   "first_name"="Jimmy","last_name"="Reyes","email"="[email protected]","date"="12/29/2016","opt-in"="FALSE","unique_code"="ASD34R"
2   "first_name"="Doris","last_name"="Wood","email"="[email protected]","date"="04/22/2016","opt-in"="0","unique_code"="SDS56N"

As you can see, the "id" column is missing, and I want unique_code in first place.

I will really appreciate any help/ideas/pointers.

Thanks

11
  • Have you considered using pandas? It can create a dataframe from a csv file, and from there you can rearrange and fill in columns as you wish. Commented Mar 30, 2017 at 11:20
  • Nopes, but just googled it and found this en.wikipedia.org/wiki/PANDAS Commented Mar 30, 2017 at 11:25
  • haha, that might be useful too (who knows) but I think this: pandas.pydata.org/pandas-docs/stable/generated/… might serve you better Commented Mar 30, 2017 at 11:26
  • it's missing what output you get? Commented Mar 30, 2017 at 11:28
  • Yeah, found it, thanks. Is it easy to learn? I have this one-off task, so I'm not sure if it's worth spending a couple of hours to learn a new library. Commented Mar 30, 2017 at 11:28

2 Answers

5

You could just change how you build each line before writing it to the file, like this:

# -*- encoding: utf-8 -*-
import csv

with open('newfilename.csv', 'w') as f2:
    with open('mycsvfile.csv', mode='r') as infile:
        reader = list(csv.reader(infile))  # load the whole file as a list
        header = reader[0]  # the first line is your header
        for row in reader[1:]:  # content is all the other lines
            if row[5] == '':
                row[5] = '0'  # keep it a string so it can be concatenated below
            line = row[-1]+'\t'  # adding the unique code
            for j, e in enumerate(row[:-1]):  # every field except the unique code
                line += '"'+header[j]+'"="'+e+'",'  # adding elements in order
            f2.write(line[:-1]+'\n')  # writing line without last comma

I slightly changed how the header is read, so that there is no need to test for it on every line.

If your file is really big and/or you don't want to load it entirely into memory, you could change it to:

...
reader = csv.reader(infile)  # no conversion to list
header = next(reader)  # get first line
for row in reader:  # continue to read one line per loop
    ...
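For reference, here is a minimal sketch of how the streaming variant above might look once the elided parts are filled in for Python 3. The function name `convert`, the `optin_index` parameter, and the sample e-mail addresses are assumptions made for illustration; they are not from the original post. It works on file-like objects, so it can be tried in memory before being pointed at real files.

```python
import csv
import io


def convert(infile, outfile, optin_index=5):
    """Stream rows from infile to outfile, one line at a time."""
    reader = csv.reader(infile)  # no conversion to list
    header = next(reader)  # first line holds the column names
    for row in reader:
        if row[optin_index] == '':
            row[optin_index] = '0'  # empty opt-in becomes the string "0"
        # Build "name"="value" pairs for every column except the last one
        pairs = ','.join('"%s"="%s"' % (name, value)
                         for name, value in zip(header[:-1], row[:-1]))
        # The last column (the unique code) goes first, then a tab
        outfile.write(row[-1] + '\t' + pairs + '\n')


# In-memory demonstration with made-up data (hypothetical e-mail addresses)
sample = io.StringIO(
    "id,first_name,last_name,email,date,opt-in,unique_code\n"
    "1,Jimmy,Reyes,jimmy@example.com,12/29/2016,FALSE,ER45DH\n"
    "2,Doris,Wood,doris@example.com,04/22/2016,,MU34T3\n"
)
result = io.StringIO()
convert(sample, result)
print(result.getvalue(), end='')
```

With real files, the same function could be called as `convert(infile, f2)` inside the two `with open(...)` blocks.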

7 Comments

Thanks @BusyAnt, but your solution produced this error TypeError: '_csv.reader' object is not subscriptable
Works fine on my computer... What Python version are you using? Make sure you convert the reader into a list, otherwise you can't index (and reverse) it
My Python Version is 3.6
Ok... given the print pat line, I thought you were using 2.x therefore proposed a 2.7 solution
Working now... But your code is placing "id" at the end of each row. It is also placing the last row at the top. I DO NOT want to reverse the order. Please see the expected output above. Thanks
3

You should process the header line separately, and then process each remaining line. Your code could become:

import csv

with open('newfilename.csv', 'w') as f2:
    with open('mycsvfile.csv', mode='r') as infile:
        reader = csv.reader(infile)
        header = next(reader)  # store the headers and advance reader pointer
        for rows in reader:
            if rows[5]=="": rows[5] = "0"  # special processing for 6th field
            # uses last field here
            pat = rows[-1]+'\t'+'''"%s"="%%s",'''*(len(header)-2)+'''"%s"="%%s"\n'''
            # process everything except last field
            f2.write((pat % tuple(header[:-1])) % tuple(rows[:-1]))

No need to load the whole file in memory...
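As a possible variation (not what the code above does), the columns can be looked up by name rather than by position using `csv.DictReader`, which also reads one row at a time. The sketch below assumes Python 3.6 or later, where each record keeps the header's column order; the helper name `format_record`, its parameters, and the sample e-mail address are hypothetical.

```python
import csv
import io


def format_record(record, key_field='unique_code', optin_field='opt-in'):
    """Return one output line: key, a tab, then "name"="value" pairs."""
    pairs = []
    for name, value in record.items():
        if name == key_field:
            continue  # the key goes in front, not in the pair list
        if name == optin_field and value == '':
            value = '0'  # empty opt-in is written as "0"
        pairs.append('"%s"="%s"' % (name, value))
    return record[key_field] + '\t' + ','.join(pairs)


# In-memory demonstration with made-up data (hypothetical e-mail address)
sample = io.StringIO(
    "id,first_name,last_name,email,date,opt-in,unique_code\n"
    "1,Jimmy,Reyes,jimmy@example.com,12/29/2016,FALSE,ER45DH\n"
)
lines = [format_record(record) for record in csv.DictReader(sample)]
print(lines[0])
```

Looking fields up by name means the code keeps working if the columns are reordered, at the cost of building a dictionary per row.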
