Convert a CSV file with Python

Question

I'm new to Python. Does anyone have an idea what a good approach would be? I could write the script myself, but it's probably faster to use a package.

I have this .csv file (gigabytes in size):

name,   value,  time
A,   1, 10
B,   2, 10
C,   3, 10
C,   3, 10 (duplicates should be ignored, as should incomplete (A,B,C) entries)
A,   4, 12 (output should be sorted by time, so this entry belongs at the end, after time==11)
B,   5, 12
C,   6, 12
B,   7, 11 (the order of A,B,C might differ)
C,   8, 11
A,   9, 11

and I want to convert it to a new .csv file containing:

time,   A,  B,  C
10, 1,  2,  3
11, 9,  7,  8
12, 4,  5,  6
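One detail worth noticing: the header and rows have spaces after each comma. A minimal sketch of reading the sample with pandas, assuming the parenthesized notes above are annotations rather than part of the real file. The SAMPLE string and the in-memory io.StringIO source are stand-ins for the actual file path. By default the spaces end up inside the column names; skipinitialspace=True in pd.read_csv strips them.

```python
import io

import pandas as pd

# The question's sample with the parenthesized notes removed
# (assumption: the notes are annotations, not part of the real file).
SAMPLE = """name,   value,  time
A,   1, 10
B,   2, 10
C,   3, 10
C,   3, 10
A,   4, 12
B,   5, 12
C,   6, 12
B,   7, 11
C,   8, 11
A,   9, 11
"""

# Default parsing keeps the blanks after each comma in the header.
raw = pd.read_csv(io.StringIO(SAMPLE))
print(list(raw.columns))

# skipinitialspace=True skips whitespace that follows each delimiter.
# With a real file, pass its path instead (e.g. a placeholder "input.csv").
df = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True)
print(list(df.columns))
print(df.dtypes)
```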

A good approach would be to research how to parse CSV with Python, then figure out an algorithm that does what you want. Hope this helps!
– N. Ivanov, Commented Apr 10, 2018 at 13:10
Aside from filtering, the operation you're trying to do is converting long-form data to wide-form.
– caw5cv, Commented Apr 10, 2018 at 13:12
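Following the first comment's suggestion, here is a sketch of the hand-written route using only the standard library csv module. The function names long_to_wide and convert, the fixed names tuple ("A", "B", "C"), and the last-value-wins rule for conflicting duplicates are assumptions for illustration, not something the question specifies. It keeps one small dict per time, so memory grows with the number of distinct times rather than with the raw file size.

```python
import csv
import io


def long_to_wide(rows, names=("A", "B", "C")):
    """Collect (name, value, time) rows into one sorted row per time.

    - Exact duplicate rows collapse because they write the same key twice.
    - If one (time, name) pair appears with *different* values, the last
      one silently wins (an assumption; the question does not say).
    - Times missing any of `names` are dropped as incomplete.
    """
    by_time = {}
    for row in rows:
        if len(row) != 3:  # skip blank or malformed lines
            continue
        name, value, time = row
        by_time.setdefault(int(time), {})[name] = value
    return [
        (t, [by_time[t][n] for n in names])
        for t in sorted(by_time)
        if all(n in by_time[t] for n in names)
    ]


def convert(src, dst, names=("A", "B", "C")):
    """Read long-form CSV from file object src, write wide-form CSV to dst."""
    reader = csv.reader(src, skipinitialspace=True)  # drop blanks after commas
    next(reader)  # skip the "name, value, time" header
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(["time", *names])
    for t, values in long_to_wide(reader, names):
        writer.writerow([t, *values])


SAMPLE = """name,   value,  time
A,   1, 10
B,   2, 10
C,   3, 10
C,   3, 10
A,   4, 12
B,   5, 12
C,   6, 12
B,   7, 11
C,   8, 11
A,   9, 11
"""

out = io.StringIO()
convert(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

For real files, open both with newline="" as the csv module documentation recommends, e.g. with open("input.csv", newline="") as src, open("output.csv", "w", newline="") as dst: convert(src, dst), where both paths are placeholders.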

jezrael · Accepted Answer · 2018-04-10 13:11:28Z

6

I think you need drop_duplicates with pivot:

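import pandas as pd
# df: the CSV loaded into a DataFrame (columns name, value, time)
df = df.drop_duplicates().pivot(index='time', columns='name', values='value')  # keyword args required in pandas >= 2.0
print (df)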
name  A  B  C
time         
10    1  2  3
11    9  7  8
12    4  5  6
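An end-to-end sketch of the same idea that also reads and writes the files. The function name convert_with_pandas is hypothetical, and the astype(int) step assumes the values are whole numbers as in the sample. Calling dropna() after pivot() is one way to discard incomplete (A,B,C) groups, since a missing (time, name) pair becomes NaN in the wide table.

```python
import io

import pandas as pd


def convert_with_pandas(src, dst):
    """Turn long-form CSV (name, value, time) into wide CSV (time, A, B, C).

    src and dst may be file paths or file-like objects.
    """
    df = pd.read_csv(src, skipinitialspace=True)  # strip blanks after commas
    wide = (
        df.drop_duplicates()  # removes exact duplicate rows only
        # keyword arguments: pandas >= 2.0 rejects positional ones here
        .pivot(index="time", columns="name", values="value")
        .dropna()  # a missing (time, name) pair is NaN: drop incomplete times
        .astype(int)  # assumption: values are whole numbers, as in the sample
        .sort_index()  # rows in ascending time order
        .rename_axis(columns=None)  # drop the "name" label above the columns
    )
    wide.to_csv(dst)


# Sample from the question plus an incomplete time (13 has only A).
SAMPLE = """name,   value,  time
A,   1, 10
B,   2, 10
C,   3, 10
C,   3, 10
A,   4, 12
B,   5, 12
C,   6, 12
B,   7, 11
C,   8, 11
A,   9, 11
A,   10, 13
"""

out = io.StringIO()
convert_with_pandas(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

With real files this would be called as convert_with_pandas("input.csv", "output.csv"), both paths being placeholders. For multi-gigabyte inputs, passing dtype={"name": "category"} to read_csv may reduce memory use, but the whole table still has to fit in memory.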


3 Comments

Sheldon Over a year ago

Thanks, very useful! I was stuck on just reading the CSV file. Together with @divyang's addition, this solved my problem.

Sheldon Over a year ago

Very useful command, this pivot(); the documentation has a very similar example: pandas.pydata.org/pandas-docs/stable/generated/…

jezrael Over a year ago

@Sheldon Glad I could help!
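One case the drop_duplicates + pivot approach does not cover: drop_duplicates() only removes rows that are identical in every column. If the same (time, name) pair appears with different values, pivot() raises ValueError. A sketch of two workarounds on made-up data: drop duplicates on just the key columns, or use pivot_table with an aggregation function. Which value to keep is left open by the question; keep="first" and aggfunc="first" here are arbitrary choices.

```python
import pandas as pd

# Hypothetical data: time 10 has two different values for C, so
# drop_duplicates() keeps both rows.
df = pd.DataFrame(
    {"name": ["A", "B", "C", "C"], "value": [1, 2, 3, 99], "time": [10, 10, 10, 10]}
)

pivot_failed = False
try:
    df.drop_duplicates().pivot(index="time", columns="name", values="value")
except ValueError as err:  # duplicate (time, name) entries cannot be reshaped
    pivot_failed = True
    print("pivot() failed:", err)

# Workaround 1: keep only the first row per (time, name) key, then pivot.
first = df.drop_duplicates(subset=["time", "name"], keep="first").pivot(
    index="time", columns="name", values="value"
)

# Workaround 2: pivot_table aggregates duplicates instead of failing;
# aggfunc="first" keeps the first value (the default would be "mean").
table = df.pivot_table(index="time", columns="name", values="value", aggfunc="first")
print(first)
print(table)
```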

Divyang Vashi · Accepted Answer · 2018-04-10 13:55:55Z

2

Since I can't comment, I would like to add to @jezrael's answer that you would also want to drop incomplete or NaN values, using df.dropna:

import numpy as np
import pandas as pd
A = 'a'
B = 'b'
C = 'c'
df = pd.DataFrame([[A,   1, 10],
                [B,   2, 10],
                [C,   3, 10],
                [C,   3, 10],
                [A,   4, 12],
                [B,   5, 12],
                [C,   6, 12],
                [B,   7, 11],
                [C,   8, 11],
                [A,   9, 11],
                [np.nan, 10, 0]], columns = ["name","value", "time"])
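df.dropna(inplace=True)           # drop rows with a missing field (the NaN row)
df.drop_duplicates(inplace=True)  # drop exact duplicate rows (the second c, 3, 10)
df = df.pivot(index='time', columns='name', values='value')  # keyword args required in pandas >= 2.0
print(df)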
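dropna() before pivoting removes rows with a missing field, but a time that is simply missing one of the names (an incomplete (A,B,C) entry in the question's terms) still gets through. A sketch on hypothetical data showing that such gaps appear as NaN after pivot(), so a second dropna() on the wide frame removes them. The astype(int) step assumes whole-number values.

```python
import pandas as pd

# Hypothetical data in the answer's style: time 13 has no 'c' row.
df = pd.DataFrame(
    [["a", 1, 10], ["b", 2, 10], ["c", 3, 10], ["a", 4, 13], ["b", 5, 13]],
    columns=["name", "value", "time"],
)

# Nothing is NaN yet, so a dropna() here would not remove time 13.
wide = df.drop_duplicates().pivot(index="time", columns="name", values="value")
print(wide)  # the (13, 'c') cell is NaN

# Dropping NaN rows *after* pivoting removes incomplete times;
# astype(int) undoes the float dtype the NaN forced (assumes whole numbers).
complete = wide.dropna().astype(int)
print(complete)
```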

