I'm just starting out with pandas and I'm trying to reshape a data file into something I can export and read. The CSV I have is in this form:

time    |   parameter   |   value
------------------------------------
1       |       a       |   21
2       |       a       |   21
3       |       a       |   21
1       |       b       |   19
2       |       b       |   19
3       |       b       |   19
1       |       c       |   17
2       |       c       |   17
3       |       c       |   17

I want to transform it into the following form:

time    |   a   |   b   |   c   
------------------------------------
1       |   21  |   19  |   17  
2       |   21  |   19  |   17  
3       |   21  |   19  |   17  

Of course my real data has different values, but the example above should be sufficient. It's weather data, like temperature and wind speed, and each row holds the timestamp of the measurement, the parameter name, and the value.

I want to transform it into a single row with 3 columns (or more if there are more parameters) for each timestamp, where the column name is the param name.

I assume I have to group my data by the time column, so I've done df.groupby('time').

However, I cannot figure out how to execute an apply method that will give me the results I want. Any hints are appreciated!

  • Why not just do df.pivot(index='time', columns='parameter')['value']? Commented Dec 6, 2018 at 16:17
  • Thank you @Chris. It gives me the error: ValueError: Index contains duplicate entries, cannot reshape. Should I group by the time first? Commented Dec 6, 2018 at 16:22
  • Then I am guessing your actual dataframe is different from your example: in your actual dataframe you probably have two or more rows with the same time and parameter values. Is that correct? Commented Dec 6, 2018 at 16:29
  • It is a big dataset downloaded from satellite data, so maybe there are duplicates. Is there a quick way to figure it out? Commented Dec 6, 2018 at 16:37
  • Yes, try df[df[['time', 'parameter']].duplicated(keep=False)] and see if anything is returned. This will show you the rows that share the same time and parameter. Commented Dec 6, 2018 at 16:39
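The duplicate check suggested in the last comment can be sketched on a small made-up frame. The sample values below are hypothetical and only illustrate a repeated (time, parameter) pair; the column names follow the question.

```python
import pandas as pd

# Hypothetical sample: the (time=1, parameter='a') pair appears twice.
df = pd.DataFrame({
    "time": [1, 1, 2, 1],
    "parameter": ["a", "a", "a", "b"],
    "value": [21, 22, 21, 19],
})

# keep=False flags every row of a duplicated group, not only the later repeats.
dupes = df[df.duplicated(subset=["time", "parameter"], keep=False)]
print(dupes)

# Count rows per (time, parameter) pair; any count above 1 makes df.pivot fail.
counts = df.groupby(["time", "parameter"]).size()
repeated = counts[counts > 1]
print(repeated)
```

If `dupes` comes back empty, the ValueError from df.pivot probably has a different cause and the data is worth a closer look.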

1 Answer

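You can try pivot_table, which aggregates rows that share the same time and parameter (using the mean by default) instead of raising an error: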

pd.pivot_table(df, index='time', columns='parameter', values='value')

parameter   a   b   c
time                 
1          21  19  17
2          21  19  17
3          21  19  17
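Since the goal is to export the result, here is a minimal sketch that rebuilds the question's sample data, pivots it, and flattens the result into plain columns. The variable names `wide`, `flat` and `csv_text` are my own, and passing aggfunc="mean" only spells out the default. Depending on the pandas version, the averaged values may come back as floats (21.0) rather than integers.

```python
import pandas as pd

# The sample data from the question, in long form.
df = pd.DataFrame({
    "time": [1, 2, 3] * 3,
    "parameter": ["a"] * 3 + ["b"] * 3 + ["c"] * 3,
    "value": [21] * 3 + [19] * 3 + [17] * 3,
})

# One row per time, one column per parameter; duplicates would be averaged.
wide = pd.pivot_table(
    df, index="time", columns="parameter", values="value", aggfunc="mean"
)

# Make 'time' an ordinary column again and drop the 'parameter' axis label.
flat = wide.reset_index().rename_axis(columns=None)

# to_csv without a path returns the CSV text; pass a filename to write a file.
csv_text = flat.to_csv(index=False)
print(csv_text)
```

If averaging is not the right way to combine duplicate measurements, aggfunc accepts other reducers such as "first", "max" or "median".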

1 Comment

Thank you! I was unaware that there was a single function for doing what I needed. pivot_table worked in my case since I had duplicate values in the dataset, where the normal pivot did not.
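The difference described in this comment can be reproduced with a tiny made-up frame that contains one duplicated (time, parameter) pair. The values are hypothetical; this is a sketch of the behaviour, not the asker's data.

```python
import pandas as pd

# Hypothetical data: two readings for time=1, parameter='a'.
df = pd.DataFrame({
    "time": [1, 1, 2],
    "parameter": ["a", "a", "a"],
    "value": [20, 22, 21],
})

# pivot only reshapes, so a repeated index/column pair raises ValueError.
try:
    df.pivot(index="time", columns="parameter", values="value")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table aggregates the repeats; the default aggfunc is the mean.
averaged = df.pivot_table(index="time", columns="parameter", values="value")

# Keeping the first reading instead of averaging is one alternative.
first = df.pivot_table(
    index="time", columns="parameter", values="value", aggfunc="first"
)
print(averaged)
print(first)
```

Which aggregation is appropriate depends on why the duplicates exist, so it is worth inspecting them before settling on one.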
