I have data in the following form:

[('06/03/2018 17.35.18.211', 'param_a', 1),
 ('06/03/2018 17.35.19.211', 'param_b', 1),
 ('06/03/2018 17.35.20.211', 'param_c', 1),
 ('06/03/2018 17.35.21.211', 'param_a', 2),
 ('06/03/2018 17.35.22.211', 'param_b', 2),
 ('06/03/2018 17.35.22.211', 'param_c', 2)]

What would be the best way to create a dataframe out of it which looks like this:

                 timestamp   param_a   param_b   param_c
0  06/03/2018 17.35.18.211       1.0       NaN       NaN
1  06/03/2018 17.35.19.211       NaN       1.0       NaN
2  06/03/2018 17.35.20.211       NaN       NaN       1.0
3  06/03/2018 17.35.21.211       2.0       NaN       NaN
4  06/03/2018 17.35.22.211       NaN       2.0       2.0

3 Answers

1

Use the DataFrame constructor with pivot, rename_axis and reset_index:

import pandas as pd

arr = [('06/03/2018 17.35.18.211', 'param_a', 1),
 ('06/03/2018 17.35.19.211', 'param_b', 1),
 ('06/03/2018 17.35.20.211', 'param_c', 1),
 ('06/03/2018 17.35.21.211', 'param_a', 2),
 ('06/03/2018 17.35.22.211', 'param_b', 2),
 ('06/03/2018 17.35.23.211', 'param_c', 2)]

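df = pd.DataFrame(arr, columns=['timestamp','b','c'])
# pivot arguments are passed by keyword (positional use was removed in pandas 2.0)
df = df.pivot(index='timestamp', columns='b', values='c').rename_axis(None, axis=1).reset_index()
print(df)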
                 timestamp  param_a  param_b  param_c
0  06/03/2018 17.35.18.211      1.0      NaN      NaN
1  06/03/2018 17.35.19.211      NaN      1.0      NaN
2  06/03/2018 17.35.20.211      NaN      NaN      1.0
3  06/03/2018 17.35.21.211      2.0      NaN      NaN
4  06/03/2018 17.35.22.211      NaN      2.0      NaN
5  06/03/2018 17.35.23.211      NaN      NaN      2.0

But if the same pair of first and second values (timestamp and parameter name) occurs more than once, pivot raises an error and aggregation is necessary.
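
A minimal sketch of the aggregating variant, in case such duplicates do occur. The data below is made up for illustration (the first two rows share both timestamp and parameter name), and mean is only an assumed choice of aggregate; use whichever function fits your data:

```python
import pandas as pd

# Hypothetical data: the first two rows duplicate (timestamp, parameter name).
arr = [('06/03/2018 17.35.18.211', 'param_a', 1),
       ('06/03/2018 17.35.18.211', 'param_a', 3),
       ('06/03/2018 17.35.19.211', 'param_b', 1),
       ('06/03/2018 17.35.20.211', 'param_c', 2)]

df = pd.DataFrame(arr, columns=['timestamp', 'b', 'c'])

# Plain pivot raises ValueError on this data; pivot_table instead
# aggregates the duplicated pairs (here with the mean, an assumed choice).
wide = (df.pivot_table(index='timestamp', columns='b', values='c', aggfunc='mean')
          .rename_axis(None, axis=1)
          .reset_index())
print(wide)
```

The two duplicated param_a rows collapse into a single row holding their mean; the remaining cells stay NaN as in the pivot output.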

4 Comments

Hi, thanks for the answer. I've edited the question to try to make it more concrete about duplicates. So timestamp can be duplicated, but this works even in that case as expected. I'm not sure what other kind of duplication you had in mind. I thought that there might be a way to specify DataFrame constructor to avoid pivot, but this is fine as well. Thanks.
@Marko - I mean a case like arr = [('06/03/2018 17.35.18.211', 'param_a', 1), ('06/03/2018 17.35.19.211', 'param_a', 1), ('06/03/2018 17.35.20.211', 'param_c', 1), ('06/03/2018 17.35.21.211', 'param_a', 2), ('06/03/2018 17.35.22.211', 'param_b', 2), ('06/03/2018 17.35.23.211', 'param_c', 2)] - Here there are duplicates in the first and second rows ('06/03/2018 17.35.18.211', 'param_a'), so pivot cannot be used because it raises an error. In that case pivot_table can be used instead.
Aha, I get it, you mean in case index and column name are same in two rows. Ok, thanks, that's not the case at the moment. Btw, in the example in your comment above, it should be ('06/03/2018 17.35.18.211', 'param_a', 1), ('06/03/2018 17.35.18.211', 'param_a', 1), so same timestamp and same column name. That's what you had in mind, right?
@Marko - Exactly, you are right. In that case it is necessary to use groupby + an aggregate function + unstack, or pivot_table.
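
For reference, a rough sketch of the groupby + aggregate function + unstack route mentioned in the comment above. The sample rows are invented for illustration, and sum is only an assumed aggregate:

```python
import pandas as pd

# Hypothetical data: same timestamp and same column name in the first two rows.
arr = [('06/03/2018 17.35.18.211', 'param_a', 1),
       ('06/03/2018 17.35.18.211', 'param_a', 3),
       ('06/03/2018 17.35.19.211', 'param_b', 1),
       ('06/03/2018 17.35.20.211', 'param_c', 2)]

df = pd.DataFrame(arr, columns=['timestamp', 'b', 'c'])

# Aggregate the duplicated (timestamp, name) pairs first, then move the
# parameter names from the index into the columns with unstack.
wide = (df.groupby(['timestamp', 'b'])['c']
          .sum()
          .unstack()
          .rename_axis(None, axis=1)
          .reset_index())
print(wide)
```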
1

You can also try this. (Note that get_dummies can be slow)

import pandas as pd

arr = [('06/03/2018 17.35.18.211', 'param_a', 1),
 ('06/03/2018 17.35.19.211', 'param_b', 1),
 ('06/03/2018 17.35.20.211', 'param_c', 1),
 ('06/03/2018 17.35.21.211', 'param_a', 2),
 ('06/03/2018 17.35.22.211', 'param_b', 2),
 ('06/03/2018 17.35.23.211', 'param_c', 2)]
df = pd.DataFrame(arr)
pd.concat([df[0], df[2].values[:,None] * df[1].str.get_dummies()], axis=1)

                         0  param_a  param_b  param_c
0  06/03/2018 17.35.18.211        1        0        0
1  06/03/2018 17.35.19.211        0        1        0
2  06/03/2018 17.35.20.211        0        0        1
3  06/03/2018 17.35.21.211        2        0        0
4  06/03/2018 17.35.22.211        0        2        0
5  06/03/2018 17.35.23.211        0        0        2

Or

v = df[1].str.get_dummies()
pd.concat([df[0], df[2].values[:,None] * v.where(v>0)], axis=1)

                         0  param_a  param_b  param_c
0  06/03/2018 17.35.18.211      1.0      NaN      NaN
1  06/03/2018 17.35.19.211      NaN      1.0      NaN
2  06/03/2018 17.35.20.211      NaN      NaN      1.0
3  06/03/2018 17.35.21.211      2.0      NaN      NaN
4  06/03/2018 17.35.22.211      NaN      2.0      NaN
5  06/03/2018 17.35.23.211      NaN      NaN      2.0

0

You are trying to create a dataframe that has 4 columns from data that has 3 columns. If you want 4 columns, you have to reformat the data.
