0

I have a very awkward dataframe that looks like this:

+----+------+-------+-------+--------+----+--------+
|    |      | hour1 | hour2 | hour3  | …  | hour24 |
+----+------+-------+-------+--------+----+--------+
| id | date |       |       |        |    |        |
| 1  | 3    |     4 |     0 |     96 | 88 |     35 |
|    | 4    |    10 |     2 |     54 | 42 |     37 |
|    | 5    |     9 |    32 |      8 | 70 |     34 |
|    | 6    |    36 |    89 |     69 | 46 |     78 |
| 2  | 5    |    17 |    41 |     48 | 45 |     71 |
|    | 6    |    50 |    66 |     82 | 72 |     59 |
|    | 7    |    14 |    24 |     55 | 20 |     89 |
|    | 8    |    76 |    36 |     13 | 14 |     21 |
| 3  | 5    |    97 |    19 |     41 | 61 |     72 |
|    | 6    |    22 |     4 |     56 | 82 |     15 |
|    | 7    |    17 |    57 |     30 | 63 |     88 |
|    | 8    |    83 |    43 |     35 |  8 |      4 |
+----+------+-------+-------+--------+----+--------+

For each id there is a list of dates, and for each date the hour columns hold that day's data broken out by hour over the full 24 hours.

What I would like to do is plot (using matplotlib) the full hourly data for each of the ids, but I can't think of a way to do this. I was looking into the possibility of creating numpy matrices, but I'm not sure if that is the right path to go down.

Clarification: Essentially, for each id I want to concatenate all the hourly data together in order and plot that. I already have the days in the proper order, so I imagine it's just a matter of finding a way to put all of the hourly data for each id into one object.

Any thoughts on how to best accomplish this?

Here is some sample data in csv format: http://www.sharecsv.com/s/e56364930ddb3d04dec6994904b05cc6/test1.csv

  • How do you want to plot that? Are you saying you want to plot each row of your DataFrame as a separate line, with all such lines combined in a single graph? Commented Jun 14, 2015 at 18:35
  • @BrenBarn Essentially, for each id I want to concatenate all the hourly data together in order and plot that. I already have the days in the proper order, so I imagine it's just a matter of finding a way to put all of the hourly data for each id into one object. Commented Jun 14, 2015 at 18:37
  • Again, please say what you mean by "plot that". Plot it how? Bar plot? Line plot? What does each bar/line represent? How are the bars/lines combined into a single graph, if at all? Do you mean that, e.g., for id=1 you would have a line with 96 points (because it has four dates with 24 points each)? Commented Jun 14, 2015 at 18:39
  • @BrenBarn I would like to make a line plot and yes, your example is correct. Each plot will have a number of points that is equal to the number of dates * 24 Commented Jun 14, 2015 at 18:41
  • Are you saying you want all these lines on the same graph, or a separate graph for each? What do you want the X axis to represent on such a plot? Commented Jun 14, 2015 at 18:42

3 Answers

2

Here is one approach:

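import numpy as np
from matplotlib import pyplot

# d is the DataFrame from the question, indexed by ('id', 'date')
for groupID, data in d.groupby(level='id'):
    fig = pyplot.figure()
    ax = fig.gca()
    # flatten the rows (one per date) into a single hourly series
    ax.plot(data.values.ravel())
    # one tick at the start of each day, labelled with its date
    ax.set_xticks(np.arange(len(data))*24)
    ax.set_xticklabels(data.index.get_level_values('date'))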

ravel is a NumPy method that flattens a 2D array row by row into one long 1D array, so each id's dates end up back to back in order.
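To make the flattening concrete, here is a minimal sketch on a made-up miniature frame (two ids, two dates each, three hour columns instead of 24). The name toy and its values are assumptions for illustration, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame, indexed by (id, date).
index = pd.MultiIndex.from_tuples(
    [(1, 3), (1, 4), (2, 5), (2, 6)], names=["id", "date"]
)
toy = pd.DataFrame(
    [[4, 0, 96], [10, 2, 54], [17, 41, 48], [50, 66, 82]],
    index=index,
    columns=["hour1", "hour2", "hour3"],
)

# For each id, flatten its block of rows (one row per date) into one 1D array.
flattened = {
    group_id: block.to_numpy().ravel()
    for group_id, block in toy.groupby(level="id")
}

for group_id, series in flattened.items():
    print(group_id, series)
```

Each array has (number of dates) x (number of hour columns) entries, which is the series the loop above hands to ax.plot.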

Beware running this interactively on a large dataset, as it creates a separate figure for each id. If you want to save the plots or the like, set a noninteractive matplotlib backend and use savefig to save each figure, then close it before creating the next one.
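A rough sketch of that save-and-close workflow, assuming the Agg backend and a throwaway temp directory. The toy frame, out_dir and the file names are made up for illustration; swap in your own frame and output path.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: nothing is shown on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 2 ids, 3 dates each, 24 hourly columns of random values.
index = pd.MultiIndex.from_product([[1, 2], [5, 6, 7]], names=["id", "date"])
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.integers(0, 100, size=(len(index), 24)),
    index=index,
    columns=[f"hour{h}" for h in range(1, 25)],
)

out_dir = Path(tempfile.mkdtemp())  # assumed output location
saved = []
for group_id, block in toy.groupby(level="id"):
    fig, ax = plt.subplots()
    ax.plot(block.to_numpy().ravel())
    ax.set_xticks(np.arange(len(block)) * 24)
    ax.set_xticklabels(block.index.get_level_values("date"))
    ax.set_title(f"id {group_id}")
    path = out_dir / f"id_{group_id}.png"
    fig.savefig(path)
    plt.close(fig)  # release the figure before creating the next one
    saved.append(path)
```

Closing each figure right after saving keeps memory flat no matter how many ids there are.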

2

It might also be of interest to stack the data frame so that you have the dates and hours together in the same index. For example, doing

df = df.stack().unstack(0)

will put the dates and hours in the index and the ids as the column names. Calling df.plot() will then give you a line plot for each time series on the same axes. So you could do it as

ax = df.stack().unstack(0).plot()

and format the axes either by passing arguments to the plot method or by calling methods on ax.
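Here is a small sketch of what that reshape produces, using an invented frame (the name toy and its values are assumptions, and only three hour columns are used to keep it short). The Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # only so the sketch runs headless
import pandas as pd

# Hypothetical frame indexed by (id, date); the two ids share date 4 only.
index = pd.MultiIndex.from_tuples(
    [(1, 3), (1, 4), (2, 4), (2, 5)], names=["id", "date"]
)
toy = pd.DataFrame(
    [[4, 0, 96], [10, 2, 54], [17, 41, 48], [50, 66, 82]],
    index=index,
    columns=["hour1", "hour2", "hour3"],
)

# stack() moves the hour columns into the index: (id, date, hour).
# unstack(0) then moves the first index level (id) out to the columns.
wide = toy.stack().unstack(0)
print(wide)

# One line per id, all on the same axes.
ax = wide.plot()
ax.set_ylabel("value")
```

Dates that exist for one id but not another show up as NaN in the other id's column, so the lines have gaps there. On the real data it is worth checking that the hour labels come out in chronological order after the reshape.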

1 Comment

You're welcome. I think it solves the 'awkward shaped data frame' problem.

2

I am not totally happy with this solution, but maybe it can serve as a starting point. Since your data is cyclic, I chose a polar chart. Unfortunately, the resolution in the y direction is poor, so I zoomed into the plot manually:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

df = pd.read_csv('test1.csv')
df_new = df.set_index(['id','date'])
n = len(df_new.columns)

# convert from hours to rad (endpoint=False so hour 24 does not land on hour 1)
angle = np.linspace(0,2*np.pi,n,endpoint=False)


# color palette to cycle through
n_data = len(df_new.T.columns)
color = plt.cm.Paired(np.linspace(0,1,n_data//2)) # divided by two since you have 'red', and 'blue'; integer division because linspace needs an int count
from itertools import cycle
c_iter = cycle(color)

fig = plt.figure()
ax = fig.add_subplot(111, polar=True)

# loop through the (id, date) rows and manually select one category
for ind, i in enumerate(df_new.T.columns):
    if i[0] == 'red':
        ax.plot(angle,df_new.T[i].values,color=next(c_iter),label=i,linewidth=2)


# set the labels
ax.set_xticks(np.linspace(0, 2*np.pi, 24, endpoint=False))
ax.set_xticklabels(range(24))

# make the legend
ax.legend(loc='upper left', bbox_to_anchor = (1.2,1.1))
plt.show()

[Images: the polar plot at three zoom levels (Zoom 0, Zoom 1, Zoom 2)]

3 Comments

This is quite a bit different than what I was looking for, but this is still really, really, really awesome.
Watch out if you copy and paste the code: I just removed the y-log-scale.
I am curious, what does this data represent?
