I have the following dataframe.

   hour sensor_id hourly_count 
0     1       101          651
1     1       102           19
2     2       101          423
3     2       102           12
4     3       101          356
5     4       101           79
6     4       102           21
7     5       101          129
8     6       101          561

Notice that sensor_id 102 has no row for hour = 3 (nor for hours 5 and 6). This is because the sensors do not generate a row when the hourly_count is zero. Sensor 102 should therefore have hourly_count = 0 at hour = 3; the row is absent only because of how the original data was collected.

Ideally, I would like code that fills in these gaps: with 2 sensors, each sensor should have a record for every hour, and where one is missing, a row should be inserted for that sensor and hour with hourly_count set to 0. The desired result is:

   hour sensor_id hourly_count 
0     1       101          651
1     1       102           19
2     2       101          423
3     2       102           12
4     3       101          356
5     3       102            0
6     4       101           79
7     4       102           21
8     5       101          129
9     5       102            0
10    6       101          561
11    6       102            0

Any help is really appreciated.

3 Answers

Using DataFrame.reindex, you can explicitly define your index. This is useful if you are missing data from both sensors for a particular hour. You can also extend the hour beyond what you have. In the following example, it extends out to hour 8.

import pandas as pd

# Full grid: hours 1 through 8 for both sensors
new_ix = pd.MultiIndex.from_product([range(1,9), [101, 102]], names=['hour', 'sensor_id'])
df_new = df.set_index(['hour', 'sensor_id'])
df_new.reindex(new_ix, fill_value=0).reset_index()

Output:

    hour  sensor_id  hourly_count
0      1        101           651
1      1        102            19
2      2        101           423
3      2        102            12
4      3        101           356
5      3        102             0
6      4        101            79
7      4        102            21
8      5        101           129
9      5        102             0
10     6        101           561
11     6        102             0
12     7        101             0
13     7        102             0
14     8        101             0
15     8        102             0
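
If you would rather not hard-code the hours and sensor ids, the same reindex idea can be sketched with the grid derived from the data itself. This is only a sketch: it assumes the hours are integers, that every sensor appears at least once, and the names `hours`, `sensors`, `full_ix` and `filled` are made up for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "hour":         [1, 1, 2, 2, 3, 4, 4, 5, 6],
    "sensor_id":    [101, 102, 101, 102, 101, 101, 102, 101, 101],
    "hourly_count": [651, 19, 423, 12, 356, 79, 21, 129, 561],
})

# Every hour between the first and last observed hour (assumes integer hours)
hours = range(int(df["hour"].min()), int(df["hour"].max()) + 1)
# Every sensor that appears at least once in the data
sensors = sorted(df["sensor_id"].unique())

# Full (hour, sensor_id) grid; pairs absent from df get hourly_count = 0
full_ix = pd.MultiIndex.from_product([hours, sensors], names=["hour", "sensor_id"])
filled = (
    df.set_index(["hour", "sensor_id"])
      .reindex(full_ix, fill_value=0)
      .reset_index()
)
print(filled)
```

Because the hours come from a range rather than from the observed values, an hour that is missing for both sensors would still get two zero rows.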

Use pandas.DataFrame.pivot and then unstack with reset_index:

new_df = df.pivot(index='sensor_id', columns='hour', values='hourly_count').fillna(0).unstack().reset_index()
print(new_df)

Output:

    hour  sensor_id      0
0      1        101  651.0
1      1        102   19.0
2      2        101  423.0
3      2        102   12.0
4      3        101  356.0
5      3        102    0.0
6      4        101   79.0
7      4        102   21.0
8      5        101  129.0
9      5        102    0.0
10     6        101  561.0
11     6        102    0.0
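
The output above has a value column named 0 and float counts, because the pivot introduces NaN before fillna runs. A possible variation, offered as a sketch rather than the answer's original code, casts back to int and names the column through the `name` argument of `Series.reset_index`:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "hour":         [1, 1, 2, 2, 3, 4, 4, 5, 6],
    "sensor_id":    [101, 102, 101, 102, 101, 101, 102, 101, 101],
    "hourly_count": [651, 19, 423, 12, 356, 79, 21, 129, 561],
})

new_df = (
    df.pivot(index="sensor_id", columns="hour", values="hourly_count")
      .fillna(0)      # missing (sensor, hour) cells become 0
      .astype(int)    # undo the float conversion caused by NaN
      .unstack()      # back to long form, indexed by (hour, sensor_id)
      .reset_index(name="hourly_count")
)
print(new_df)
```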

Assuming rows are missing only for sensor 102 (sensor 101 has a row for every hour), one way is to build a new dataframe with every combination of hour and sensor_id, left-merge the original dataframe onto it to bring in hourly_count, and fill the resulting NaN values with 0:

a = df.hour.unique()
df1 = pd.MultiIndex.from_product([a, [101, 102]]).to_frame(index=False, name=['hour', 'sensor_id'])

Out[157]:
    hour  sensor_id
0      1        101
1      1        102
2      2        101
3      2        102
4      3        101
5      3        102
6      4        101
7      4        102
8      5        101
9      5        102
10     6        101
11     6        102

df1.merge(df, on=['hour','sensor_id'], how='left').fillna(0)

Out[161]:
    hour  sensor_id  hourly_count
0      1        101         651.0
1      1        102          19.0
2      2        101         423.0
3      2        102          12.0
4      3        101         356.0
5      3        102           0.0
6      4        101          79.0
7      4        102          21.0
8      5        101         129.0
9      5        102           0.0
10     6        101         561.0
11     6        102           0.0

Another way: use unstack with fill_value, then stack back

df.set_index(['hour', 'sensor_id']).unstack(fill_value=0).stack().reset_index()

Out[171]:
    hour  sensor_id  hourly_count
0      1        101           651
1      1        102            19
2      2        101           423
3      2        102            12
4      3        101           356
5      3        102             0
6      4        101            79
7      4        102            21
8      5        101           129
9      5        102             0
10     6        101           561
11     6        102             0
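
Whichever approach you use, it may be worth checking that the result really contains each hour and sensor pair exactly once. The helper below is a hypothetical sketch (the name `is_complete` and its parameters are not from pandas), and the filled frame is rebuilt here with reindex only to keep the example self-contained:

```python
import pandas as pd

def is_complete(frame, hours, sensors):
    """Return True if every (hour, sensor_id) pair occurs exactly once in frame."""
    expected = pd.MultiIndex.from_product([hours, sensors])
    actual = pd.MultiIndex.from_frame(frame[["hour", "sensor_id"]])
    return actual.is_unique and set(actual) == set(expected)

# Sample data from the question
df = pd.DataFrame({
    "hour":         [1, 1, 2, 2, 3, 4, 4, 5, 6],
    "sensor_id":    [101, 102, 101, 102, 101, 101, 102, 101, 101],
    "hourly_count": [651, 19, 423, 12, 356, 79, 21, 129, 561],
})

# A filled version to compare against
grid = pd.MultiIndex.from_product([range(1, 7), [101, 102]], names=["hour", "sensor_id"])
filled = df.set_index(["hour", "sensor_id"]).reindex(grid, fill_value=0).reset_index()

print(is_complete(df, range(1, 7), [101, 102]))      # original data has gaps
print(is_complete(filled, range(1, 7), [101, 102]))  # filled data covers the grid
```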
