Transform multiple CSV rows in Pandas into one

Question

I'm just starting out with Pandas and I'm trying to make a data file I have into something I can export and read. The CSV I have is in this form:

time    |   parameter   |   value
------------------------------------
1       |       a       |   21
2       |       a       |   21
3       |       a       |   21
1       |       b       |   19
2       |       b       |   19
3       |       b       |   19
1       |       c       |   17
2       |       c       |   17
3       |       c       |   17

I want to transform it into the following form:

time    |   a   |   b   |   c   
------------------------------------
1       |   21  |   19  |   17  
2       |   21  |   19  |   17  
3       |   21  |   19  |   17

Of course my data have different values, but the example above should be sufficient. It's weather data, like temperature and wind speed, and each row has the timestamp of the measurement, the param name and the value.

I want to transform it into a single row with 3 columns (or more if there are more parameters) for each timestamp, where the column name is the param name.

I know that I have to group my data by the time column, so I've done df.groupby('time').

However, I cannot figure out how to execute an apply method that will give me the results I want. Any hints are appreciated!

Why not just do df.pivot(index='time', columns='parameter')['value']?
– It_is_Chris, Commented Dec 6, 2018 at 16:17
Thank you @Chris. It gives me the error: ValueError: Index contains duplicate entries, cannot reshape. Should I group by the time first?
– Lucas P., Commented Dec 6, 2018 at 16:22
Then I am guessing your actual dataframe is different from your example: in your actual dataframe you probably have two or more rows with the same time and parameter values. Is that correct?
– It_is_Chris, Commented Dec 6, 2018 at 16:29
It is a big dataset downloaded from satellite data, so maybe there are duplicates. Is there a quick way to figure it out?
– Lucas P., Commented Dec 6, 2018 at 16:37
Yes, try df[df[['time', 'parameter']].duplicated(keep=False)] and see if anything is returned. This will show you the rows that share the same time and parameter.
– It_is_Chris, Commented Dec 6, 2018 at 16:39
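
To make the comment thread above concrete, here is a minimal sketch of the duplicate check and of the error that pivot raises. The sample frame is hypothetical (it is not the asker's satellite data); it only assumes the same three column names, with one (time, parameter) pair deliberately repeated.

```python
import pandas as pd

# Hypothetical sample: the pair (time=1, parameter='a') appears twice.
df = pd.DataFrame({
    "time": [1, 1, 2, 1, 2],
    "parameter": ["a", "a", "a", "b", "b"],
    "value": [21, 23, 21, 19, 19],
})

# All rows whose (time, parameter) pair occurs more than once.
dupes = df[df.duplicated(subset=["time", "parameter"], keep=False)]
print(dupes)

# pivot cannot put two values into one cell, so it raises ValueError.
try:
    df.pivot(index="time", columns="parameter", values="value")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print(exc)
```

If dupes comes back empty on the real data, plain pivot should work; otherwise the duplicates need to be dropped or aggregated first.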

ayorgo · Accepted Answer · 2018-12-07 10:56:47Z

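You can try using pivot_table. Unlike pivot, it aggregates rows that share the same time and parameter (taking their mean by default) instead of raising an error: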

pd.pivot_table(df, index='time', columns='parameter', values='value')

parameter   a   b   c
time                 
1          21  19  17
2          21  19  17
3          21  19  17
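
For anyone who wants to run this end to end, the following is a sketch that rebuilds the example table from the question and then flattens the result for export. The variable names (wide, flat, csv_text) are my own, and the exact dtype of the pivoted values may differ between pandas versions.

```python
import pandas as pd

# The example data from the question, in long format.
df = pd.DataFrame({
    "time": [1, 2, 3] * 3,
    "parameter": ["a"] * 3 + ["b"] * 3 + ["c"] * 3,
    "value": [21] * 3 + [19] * 3 + [17] * 3,
})

# One row per time, one column per parameter.
wide = pd.pivot_table(df, index="time", columns="parameter", values="value")

# Turn 'time' back into an ordinary column and drop the 'parameter'
# label on the column axis, which is usually nicer for CSV export.
flat = wide.reset_index().rename_axis(columns=None)

# to_csv without a path returns the CSV text instead of writing a file.
csv_text = flat.to_csv(index=False)
print(csv_text)
```

Passing a file path to to_csv instead would write the result to disk.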

edited Dec 7, 2018 at 10:56

answered Dec 6, 2018 at 16:56


1 Comment

Lucas P. Over a year ago

Thank you! I was unaware that there was a single function for doing what I needed. pivot_table worked in my case since I had duplicate values in the dataset, where the normal pivot did not.
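
Since duplicates were the deciding factor here, it may be worth choosing the aggregation explicitly rather than relying on the default mean. The sketch below uses made-up values and also shows a groupby/unstack route, which is roughly what the question was reaching for with df.groupby('time'). Whether mean, last, or something else is appropriate depends on what the duplicate measurements represent, which the thread does not say.

```python
import pandas as pd

# Hypothetical data: (time=1, parameter='a') was measured twice,
# and (time=1, parameter='b') is missing entirely.
df = pd.DataFrame({
    "time": [1, 1, 2, 2],
    "parameter": ["a", "a", "a", "b"],
    "value": [20.0, 22.0, 21.0, 19.0],
})

# Default behaviour: duplicates are averaged.
by_mean = pd.pivot_table(df, index="time", columns="parameter", values="value")

# Keep the last value seen for each (time, parameter) pair instead.
by_last = pd.pivot_table(
    df, index="time", columns="parameter", values="value", aggfunc="last"
)

# The same reshaping written with groupby and unstack.
by_groupby = df.groupby(["time", "parameter"])["value"].mean().unstack("parameter")

print(by_mean)
print(by_last)
```

Combinations with no measurement at all show up as NaN in the wide table.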
