
I have a csv file that looks like this:

Date     Name    Wage
5/1/19   Joe     $100
5/1/19   Sam     $120
5/1/19   Kate    $30
5/2/19   Joe     $120
5/2/19   Sam     $134
5/2/19   Kate    $56
5/3/19   Joe     $89
5/3/19   Sam     $90
5/3/19   Kate    $231

I would like to restructure it to look like this:

Date      Joe    Sam    Kate
5/1/19    $100   $120   $30
5/2/19    $120   $134   $56
5/3/19    $89    $90    $231

I am not sure how to approach it. Here is what I started writing:

import csv

with open ('myfile.csv', 'rb') as filein, open ('restructured.csv', 'wb') as fileout:
  rows = list(csv.DictReader(filein, skipinitialspace=True))
  names = NOT SURE HOW TO GET THIS
  fieldnames = ['Date'] + ['{}'.format(i) for i in names]
  csvout = csv.DictWriter(fileout, fieldnames=fieldnames, extrasaction='ignore', restval='NA')
  csvout.writeheader()
  for row in rows:
    row['{}'.format(row['Name'].strip())] = row['Wage']
    csvout.writerow(row)
  • The csv module is just a parser that yields the CSV rows as lists or dicts. By itself, it does not transform the rows into something else. Commented Jun 5, 2019 at 15:31
  • It would be easier to use pandas in this case. Commented Jun 5, 2019 at 15:32
  • Thank you. Would you mind pointing me to a pandas example that does something similar? Commented Jun 5, 2019 at 15:33
  • @manticora This video could help you: youtube.com/watch?v=dcqPhpY7tWk Commented Jun 5, 2019 at 15:42
  • What is the separator? Does list(csv.DictReader(filein, skipinitialspace=True)) return what you expect? Commented Jun 5, 2019 at 15:45

4 Answers

It can be done with the csv module. Here is the way for Python 3:

import csv
import collections

with open('myfile.csv', 'r') as filein, open('restructured.csv', 'w', newline='') as fileout:
    data = collections.defaultdict(dict)
    names = set()
    for row in csv.DictReader(filein, skipinitialspace=True):
        data[row['Date']][row['Name']] = row['Wage']
        names.add(row['Name'])
    csvout = csv.DictWriter(fileout, fieldnames=['Date'] + list(names))
    csvout.writeheader()
    for dat in sorted(data.keys()):
        row = data[dat]
        row['Date'] = dat
        csvout.writerow(row)

The generated CSV should look like this (the order of the name columns may vary, because names is a set):

Date,Kate,Joe,Sam
5/1/19,$30,$100,$120
5/2/19,$56,$120,$134
5/3/19,$231,$89,$90

It is the same for Python 2, except for the with line, which should be:

with open ('myfile.csv', 'rb') as filein, open ('restructured.csv', 'wb') as fileout:
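
If the columns should come out in the order the names first appear in the input (Joe, Sam, Kate in the question), a plain dict can stand in for the set. The following is only a sketch under assumptions: pivot_long_to_wide is a hypothetical helper name, the input is assumed to be comma-separated, and in-memory strings replace the real files so the snippet runs on its own.

```python
import csv
import io

def pivot_long_to_wide(rows):
    """Pivot (Date, Name, Wage) dict rows into one dict per date.

    Hypothetical helper: names keep the order in which they first appear.
    """
    names = {}    # dict used as an insertion-ordered set (Python 3.7+)
    by_date = {}  # date -> {"Date": date, name: wage, ...}
    for row in rows:
        names.setdefault(row["Name"], None)
        wide = by_date.setdefault(row["Date"], {"Date": row["Date"]})
        wide[row["Name"]] = row["Wage"]
    return ["Date"] + list(names), list(by_date.values())

# Assumed comma-separated sample, shortened from the question's data
sample = """Date,Name,Wage
5/1/19,Joe,$100
5/1/19,Sam,$120
5/1/19,Kate,$30
5/2/19,Joe,$120
5/2/19,Sam,$134
5/2/19,Kate,$56
"""

fieldnames, wide_rows = pivot_long_to_wide(csv.DictReader(io.StringIO(sample)))

out = io.StringIO()  # stand-in for open('restructured.csv', 'w', newline='')
writer = csv.DictWriter(out, fieldnames=fieldnames, restval="NA", lineterminator="\n")
writer.writeheader()
writer.writerows(wide_rows)
result = out.getvalue()
print(result)
```

restval="NA" fills in any name that has no wage on a given date, as in the question's own attempt.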

3 Comments

It did work for me - thank you very much! But the data is not sorted by date :( My first column looks like this: Date, 5/1/19, 5/2/19, 5/19/19, 5/29/19, 5/24/19, 5/27/19, 5/21/19, 5/9/19. I tried sorting it with Python afterwards, but got the following error: ValueError: time data '5' does not match format '%m-%d-%y'
It can easily be sorted by date. See my edit at for dat in sorted(data.keys()):
I think it doesn't recognize it as a date, because this time it sorted it this way: 5/1/19, 5/10/19, 5/11/19 and so on
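
The ordering described in these comments is what plain string sorting gives: the dates are compared character by character, not as calendar dates. A possible fix, sketched here under the assumption that every date is month/day/two-digit-year with slashes (the format in the quoted error uses dashes, which does not match this data), is to pass a key function to sorted. date_key is a hypothetical name.

```python
from datetime import datetime

def date_key(text):
    # Assumes month/day/two-digit-year strings such as "5/19/19"
    return datetime.strptime(text, "%m/%d/%y")

dates = ["5/1/19", "5/2/19", "5/19/19", "5/29/19",
         "5/24/19", "5/27/19", "5/21/19", "5/9/19"]

string_sorted = sorted(dates)               # text order: "5/19/19" comes before "5/2/19"
sorted_dates = sorted(dates, key=date_key)  # chronological order

print(string_sorted)
print(sorted_dates)
```

In the answer's loop this would amount to writing for dat in sorted(data.keys(), key=date_key):.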

Simply with the pandas library:

import pandas as pd

df = pd.read_csv("test.csv", sep=r"\s+")
p_table = pd.pivot_table(df, values='Wage', columns=['Name'], index='Date', 
                         aggfunc=lambda x:x)
p_table = p_table.reset_index()
p_table.columns.name = None

print(p_table)

The output:

     Date   Joe  Kate   Sam
0  5/1/19  $100   $30  $120
1  5/2/19  $120   $56  $134
2  5/3/19   $89  $231   $90

Reference links:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html

1 Comment

I like your aggregating function here, I hadn't seen or thought of that before.

What you want to do is also known as converting from long to wide format. With pandas you can do this using pivot:

import pandas as pd

df = pd.read_csv("myfile.csv", sep=',')

# Restructure the dataframe from long to wide format
tdf = df.pivot(index='Date', columns='Name', values='Wage')

tdf.to_csv("restructured.csv", sep=',')

print(tdf)

The output:

Name     Joe  Kate   Sam
Date                    
5/1/19  $100   $30  $120
5/2/19  $120   $56  $134
5/3/19   $89  $231   $90


This should get you on the right track

data.csv

5/1/19,Joe,$100
5/1/19,Sam,$120
5/1/19,Kate,$30
5/2/19,Joe,$120
5/2/19,Sam,$134
5/2/19,Kate,$56
5/3/19,Joe,$89
5/3/19,Sam,$90
5/3/19,Kate,$231

data = {}
people = set()
with open('data.csv', 'r') as f:
    for line in f.read().splitlines():
        values = line.split(',')

        if values[0] not in data:
            data[values[0]] = {}

        data[values[0]][values[1]] = values[2]
        people.add(values[1])

print('Date,' + ','.join(people))
for date in data:
    print(f"{date},{','.join([data[date][per] for per in people])}")

output:

Date,Sam,Kate,Joe
5/1/19,$120,$30,$100
5/2/19,$134,$56,$120
5/3/19,$90,$231,$89

1 Comment

I think OP wants to save as a CSV file, not print the outputs.
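
To address this comment, the same data and people structures could be written out with csv.writer instead of print. This is a sketch under assumptions, not part of the answer: write_wide_csv is a hypothetical helper, "NA" is an assumed placeholder for a missing wage, and an in-memory buffer stands in for the output file so the snippet runs on its own.

```python
import csv
import io

def write_wide_csv(data, people, fileobj):
    """Write {date: {person: wage}} as one CSV row per date (hypothetical helper)."""
    columns = sorted(people)  # fix the column order; a set has no stable order
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["Date"] + columns)
    for date, wages in data.items():
        # "NA" is an assumed placeholder for a person with no wage on that date
        writer.writerow([date] + [wages.get(person, "NA") for person in columns])

# Small sample in the same shape the answer builds
data = {
    "5/1/19": {"Joe": "$100", "Sam": "$120", "Kate": "$30"},
    "5/2/19": {"Joe": "$120", "Sam": "$134"},
}
people = {"Joe", "Sam", "Kate"}

buffer = io.StringIO()  # stand-in for open('restructured.csv', 'w', newline='')
write_wide_csv(data, people, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

For a real file, the buffer would be replaced by with open('restructured.csv', 'w', newline='') as f: and f passed as fileobj.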
