0

I have an input CSV that looks like this:

email,trait1,trait2,trait3
foo@gmail,biz,baz,buzz
bar@gmail,bizzy,bazzy,buzzy
foobars@gmail,bizziest,bazziest,buzziest

and I need the output format to look like

Indv,AttrName,AttrValue,Start,End
foo@gmail,"trait1",biz,,
foo@gmail,"trait2",baz,baz,
foo@gmail,"trait3",buzz,,

For each row in my input file I need to write one output row for each of the N-1 non-email columns in the input CSV. The Start and End fields in the output file can be empty in some cases.

I'm trying to read in the data using a DictReader. So far I've been able to read in the data with

import unicodecsv
import os
import codecs

with open('test.csv') as csvfile:
    reader = unicodecsv.csv.DictReader(csvfile)
    outfile = codecs.open("test-write", "w", "utf-8")
    outfile.write("Indv", "ATTR", "Value", "Start","End\n")
    for row in reader:
        outfile.write([row['email'],"trait1",row['trait1'],'',''])
        outfile.write([row['email'],"trait2",row['trait2'],row['trait2'],''])
        outfile.write([row['email'],"trait3",row['trait3'],'','')

This doesn't work (I think I need to convert each list to a string), and it is also very brittle because I'm hardcoding the column names for each row. The bigger issue is that the data within the for loop isn't written to "test-write". Only the line outfile.write("Indv", "ATTR", "Value", "Start","End\n") actually writes to the file. Is DictReader the appropriate class to use in my case?

2
  • I'm not great with unicode, but wouldn't codecs.open('test-write', 'w', 'utf-8') be identical to open('test-write', 'uw')? Similarly, can't you open test.csv as ur and use the normal csv module? Maybe I'm oversimplifying though. (Commented Jun 2, 2015 at 22:19)
  • Not sure if I understand what the Start and End columns refer to, but Pandas may have a solution for you. This is as far as I get without understanding more of what you want: import pandas as pd; pd1 = pd.read_csv('input_csv.csv').stack(). This gets a similar-looking form which, once you fill in what Start and End mean, can be written to CSV using pd1.to_csv(). (Commented Jun 2, 2015 at 22:35)

3 Answers

3

This uses a unicodecsv.DictWriter and the zip() function to do what you want, and the code is fairly readable in my opinion.

import unicodecsv

# unicodecsv reads and writes encoded bytes (UTF-8 by default),
# so both files are opened in binary mode.
with open('test.csv', 'rb') as infile, \
     open('test-write.csv', 'wb') as outfile:

    reader = unicodecsv.DictReader(infile)
    fieldnames = 'Indv,AttrName,AttrValue,Start,End'.split(',')
    writer = unicodecsv.DictWriter(outfile, fieldnames)
    writer.writeheader()
    for row in reader:
        email = row['email']
        trait1, trait2, trait3 = row['trait1'], row['trait2'], row['trait3']
        # zip() stops at the shorter list, so Start and End are left unset
        # and DictWriter fills them with empty strings.
        writer.writerows([  # writes three rows of output from each row of input
            dict(zip(fieldnames, [email, 'trait1', trait1])),
            dict(zip(fieldnames, [email, 'trait2', trait2, trait2])),
            dict(zip(fieldnames, [email, 'trait3', trait3]))])

Here are the contents of the test-write.csv file it produced from your example input CSV file:

Indv,AttrName,AttrValue,Start,End
foo@gmail,trait1,biz,,
foo@gmail,trait2,baz,baz,
foo@gmail,trait3,buzz,,
bar@gmail,trait1,bizzy,,
bar@gmail,trait2,bazzy,bazzy,
bar@gmail,trait3,buzzy,,
foobars@gmail,trait1,bizziest,,
foobars@gmail,trait2,bazziest,bazziest,
foobars@gmail,trait3,buzziest,,
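
To address the brittleness mentioned in the question, the trait columns can be taken from the reader's fieldnames instead of being hardcoded. What follows is only a minimal sketch: it assumes Python 3 and the standard csv module rather than unicodecsv, the function name unpivot and its id_column parameter are made up for illustration, and Start and End are simply left empty.

```python
import csv
import io


def unpivot(infile, outfile, id_column='email'):
    """Write one output row per non-id column of every input row."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(
        outfile, fieldnames=['Indv', 'AttrName', 'AttrValue', 'Start', 'End'])
    writer.writeheader()
    # Every column except the id column is treated as a trait.
    traits = [name for name in reader.fieldnames if name != id_column]
    for row in reader:
        for name in traits:
            # Start and End are not supplied, so DictWriter leaves them empty.
            writer.writerow({'Indv': row[id_column],
                             'AttrName': name,
                             'AttrValue': row[name]})


# In-memory demonstration; with real files, open them with newline=''.
source = io.StringIO('email,trait1,trait2\nfoo@gmail,biz,baz\n')
target = io.StringIO()
unpivot(source, target)
print(target.getvalue())
```

Because the trait names come from the header row, adding a trait4 column to the input needs no code change.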

2

I may be completely off since I don't do a lot of work with unicode, but it seems to me that the following should work:

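import csv

# Python 3: the csv module handles Unicode when the files are opened in
# text mode with an explicit encoding; the csv docs recommend newline=''.
with open('test.csv', newline='', encoding='utf-8') as csvin, \
     open('test-write', 'w', newline='', encoding='utf-8') as csvout:
    reader = csv.DictReader(csvin)
    writer = csv.DictWriter(csvout, fieldnames=['Indv', 'AttrName',
                                                'AttrValue', 'Start', 'End'])
    writer.writeheader()  # emit the Indv,AttrName,... header row
    for row in reader:
        for traitnum in range(1, 4):
            key = "trait{}".format(traitnum)
            # Start and End are omitted, so they are written as empty fields.
            writer.writerow({'Indv': row['email'], 'AttrName': key,
                             'AttrValue': row[key]})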
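
The dictionaries above can leave out the Start and End keys because csv.DictWriter fills any missing field with its restval argument, which defaults to an empty string. Here is a small sketch of that behaviour, assuming Python 3 and an in-memory buffer in place of a real file; the second row also sets Start, mirroring the trait2 line in the question's sample output.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=['Indv', 'AttrName', 'AttrValue', 'Start', 'End'])

# Start and End are missing here, so they are written as empty fields.
writer.writerow({'Indv': 'foo@gmail', 'AttrName': 'trait1', 'AttrValue': 'biz'})
# Supplying Start explicitly fills only that column; End stays empty.
writer.writerow({'Indv': 'foo@gmail', 'AttrName': 'trait2',
                 'AttrValue': 'baz', 'Start': 'baz'})

print(buffer.getvalue())
```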


1
import pandas as pd
pd1 = pd.read_csv('input_csv.csv')
# Unpivot the trait columns; sort_values() replaces the removed DataFrame.sort().
pd2 = (pd.melt(pd1, id_vars=['email'], value_vars=['trait1', 'trait2', 'trait3'],
               var_name='AttrName', value_name='AttrValue')
       .rename(columns={'email': 'Indv'}).sort_values(by=['Indv', 'AttrName'])
       .reset_index(drop=True))
pd2.to_csv('output_csv.csv', index=False)

Unclear on what the Start and End fields represent, but this gets you everything else.
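
If the Start and End columns only need to exist and may stay empty, one option is to add them with reindex before writing. The sketch below works under that assumption and uses a small in-memory sample instead of input_csv.csv. The variable names are illustrative, and it relies on DataFrame.melt and sort_values, which are only available in newer pandas releases.

```python
import io

import pandas as pd

# Hypothetical two-trait sample standing in for the real input file.
csv_text = "email,trait1,trait2\nfoo@gmail,biz,baz\n"
df = pd.read_csv(io.StringIO(csv_text))

long_df = (
    df.melt(id_vars=['email'], var_name='AttrName', value_name='AttrValue')
      .rename(columns={'email': 'Indv'})
      .sort_values(by=['Indv', 'AttrName'])
      .reset_index(drop=True)
      # reindex adds the missing Start and End columns, filled with NaN.
      .reindex(columns=['Indv', 'AttrName', 'AttrValue', 'Start', 'End'])
)

# to_csv writes NaN as an empty field by default.
print(long_df.to_csv(index=False))
```

Leaving out value_vars makes melt unpivot every column other than email, so the trait names do not have to be listed.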
