Python script to turn input csv columns into output csv row values

Question

I have an input CSV that looks like

email,trait1,trait2,trait3
foo@gmail,biz,baz,buzz
bar@gmail,bizzy,bazzy,buzzy
foobars@gmail,bizziest,bazziest,buzziest

and I need the output format to look like

Indv,AttrName,AttrValue,Start,End
foo@gmail,"trait1",biz,,,
foo@gmail,"trait2",baz,baz,,
foo@gmail,"trait3",buzz,,,

For each row in my input file I need to write a row for the N-1 columns in the input csv. The Start and End fields in the output file can be empty in some cases.

I'm trying to read in the data using a DictReader. So far I've been able to read in the data with

import unicodecsv
import os
import codecs

with open('test.csv') as csvfile:
    reader = unicodecsv.csv.DictReader(csvfile)
    outfile = codecs.open("test-write", "w", "utf-8")
    outfile.write("Indv", "ATTR", "Value", "Start","End\n")
    for row in reader:
        outfile.write([row['email'],"trait1",row['trait1'],'',''])
        outfile.write([row['email'],"trait2",row['trait2'],row['trait2'],''])
        outfile.write([row['email'],"trait3",row['trait3'],'','')

Which doesn't work (I think I need to cast the list to a string), and it is also very brittle because I'm hardcoding the column names for each row. The bigger issue is that the data within the for loop isn't written to "test-write". Only the line outfile.write("Indv", "ATTR", "Value", "Start","End\n") actually writes to the file. Is DictReader the appropriate class to use in my case?

I'm not great with unicode, but wouldn't codecs.open('test-write', 'w', 'utf-8') be identical to open('test-write', 'uw')? Similarly, can't you open test.csv as ur and use the normal csv module? Maybe I'm oversimplifying though.
– Adam Smith, Commented Jun 2, 2015 at 22:19
Not sure if I understand what the Start and End columns refer to, but pandas may have a solution for you. Without understanding more about what you want, I get this far: import pandas as pd; pd1 = pd.read_csv('input_csv.csv').stack(). This gives a similar-looking form which, once the Start and End values are filled in, can be written to CSV using pd1.to_csv().
– vk1011, Commented Jun 2, 2015 at 22:35

martineau · Accepted Answer · 2015-06-03 23:31:28Z

This uses a unicodecsv.DictWriter and the zip() function to do what you want, and the code is fairly readable in my opinion.

import unicodecsv
import os
import codecs

with open('test.csv') as infile, \
     codecs.open('test-write.csv', 'w', 'utf-8') as outfile:

    reader = unicodecsv.DictReader(infile)
    fieldnames = 'Indv,AttrName,AttrValue,Start,End'.split(',')
    writer = unicodecsv.DictWriter(outfile, fieldnames)
    writer.writeheader()
    for row in reader:
        email = row['email']
        trait1, trait2, trait3 = row['trait1'], row['trait2'], row['trait3']
        writer.writerows([  # writes three rows of output from each row of input
            dict(zip(fieldnames, [email, 'trait1', trait1])),
            dict(zip(fieldnames, [email, 'trait2', trait2, trait2])),
            dict(zip(fieldnames, [email, 'trait3', trait3]))])

Here are the contents of the test-write.csv file it produced from your example input CSV file:

Indv,AttrName,AttrValue,Start,End
foo@gmail,trait1,biz,,
foo@gmail,trait2,baz,baz,
foo@gmail,trait3,buzz,,
bar@gmail,trait1,bizzy,,
bar@gmail,trait2,bazzy,bazzy,
bar@gmail,trait3,buzzy,,
foobars@gmail,trait1,bizziest,,
foobars@gmail,trait2,bazziest,bazziest,
foobars@gmail,trait3,buzziest,,
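
The question also calls hardcoding the column names brittle. As a minimal sketch of one way around that, and not part of the answer above, the trait columns can be taken from the reader's header row instead. This assumes Python 3 and the standard csv module rather than unicodecsv. The function name unpivot and its id_column parameter are made up for illustration, and because the question does not say how Start and End are derived, both are left empty here.

```python
import csv
import io

OUT_FIELDS = ['Indv', 'AttrName', 'AttrValue', 'Start', 'End']

def unpivot(infile, outfile, id_column='email'):
    """Write one output row per non-id column of every input row.

    Hypothetical helper: the name and signature are illustrative only.
    """
    reader = csv.DictReader(infile)
    # restval='' leaves Start and End blank when a row dict omits them.
    writer = csv.DictWriter(outfile, fieldnames=OUT_FIELDS, restval='',
                            lineterminator='\n')
    writer.writeheader()
    # Every header except the id column is treated as an attribute name.
    trait_columns = [name for name in (reader.fieldnames or [])
                     if name != id_column]
    for row in reader:
        for name in trait_columns:
            writer.writerow({'Indv': row[id_column],
                             'AttrName': name,
                             'AttrValue': row[name]})

# In-memory stand-ins for test.csv and test-write.csv.
sample = io.StringIO(
    "email,trait1,trait2,trait3\n"
    "foo@gmail,biz,baz,buzz\n"
    "bar@gmail,bizzy,bazzy,buzzy\n"
)
result = io.StringIO()
unpivot(sample, result)
print(result.getvalue())
```

With real files, the same function could be called on handles opened with newline='' and encoding='utf-8', which is how the csv module expects text files to be opened in Python 3.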

Adam Smith · Accepted Answer · 2015-06-02 22:36:55Z

I may be completely off since I don't do a lot of work with unicode, but it seems to me that the following should work:

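import csv

# Python 3: open both files in text mode with an explicit encoding;
# newline='' lets the csv module handle line endings itself.
with open('test.csv', newline='', encoding='utf-8') as csvin, \
     open('test-write', 'w', newline='', encoding='utf-8') as csvout:
    reader = csv.DictReader(csvin)
    writer = csv.DictWriter(csvout, fieldnames=['Indv', 'AttrName',
                                                'AttrValue', 'Start', 'End'])
    writer.writeheader()  # emit the Indv,AttrName,... header line
    for row in reader:
        for traitnum in range(1, 4):
            key = "trait{}".format(traitnum)
            # Start and End are omitted, so DictWriter leaves them empty
            writer.writerow({'Indv': row['email'], 'AttrName': key,
                             'AttrValue': row[key]})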

vk1011 · Accepted Answer · 2015-06-02 23:39:53Z

import pandas as pd
pd1 = pd.read_csv('input_csv.csv')
# Unpivot the trait columns; sort_values() replaces the removed DataFrame.sort().
pd2 = (pd.melt(pd1, id_vars=['email'], value_vars=['trait1', 'trait2', 'trait3'],
               var_name='AttrName', value_name='AttrValue')
       .rename(columns={'email': 'Indv'})
       .sort_values(by=['Indv', 'AttrName']).reset_index(drop=True))
pd2.to_csv('output_csv.csv', index=False)

Unclear on what the Start and End fields represent, but this gets you everything else.
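
If the Start and End columns only need to exist in the output, a rough sketch of one way to add them is to reindex the melted frame onto the full list of output columns. This assumes a reasonably recent pandas and that both columns stay empty, since the rule for filling them is not given in the question. The in-memory sample stands in for input_csv.csv.

```python
import io
import pandas as pd

# In-memory stand-in for input_csv.csv.
source = io.StringIO(
    "email,trait1,trait2,trait3\n"
    "foo@gmail,biz,baz,buzz\n"
    "bar@gmail,bizzy,bazzy,buzzy\n"
)
wide = pd.read_csv(source)

# With value_vars omitted, melt unpivots every column except id_vars.
long_form = (
    wide.melt(id_vars=['email'], var_name='AttrName', value_name='AttrValue')
        .rename(columns={'email': 'Indv'})
        .sort_values(by=['Indv', 'AttrName'])
        .reset_index(drop=True)
        # reindex adds the missing Start and End columns, filled with NaN.
        .reindex(columns=['Indv', 'AttrName', 'AttrValue', 'Start', 'End'])
)

# NaN is written as an empty field by default (na_rep='').
csv_text = long_form.to_csv(index=False)
print(csv_text)
```

Note that sorting by Indv reorders the rows alphabetically by email, so the output order differs from the input order.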
