0

the data frame looks like this. I have tried with pivot, stack, unstack. Is there any method to achieve the output

   key  attribute   text_value  numeric_value   date_value
0   1   order       NaN         NaN             10/02/19
1   1   size        NaN         43.0            NaN
2   1   weight      NaN         22.0            NaN
3   1   price       NaN         33.0            NaN
4   1   segment     product     NaN             NaN
5   2   order       NaN         NaN             11/02/19
6   2   size        NaN         34.0            NaN
7   2   weight      NaN         32.0            NaN
8   2   price       NaN         89.0            NaN
9   2   segment     customer    NaN             NaN


I need the following output

key  order      size    weight    price    segment
1    10/2/2019  43.0    22.0      33.0     product
2    11/2/2019  34.0    32.0      89.0     customer

Thanks in advance

2 Answers 2

1

I believe you dont want to change dtypes in output data, so possible solution is processing each column separately by DataFrame.dropna and DataFrame.pivot and then join together by concat:

df['date_value'] = pd.to_datetime(df['date_value'])

df1 = df.dropna(subset=['text_value']).pivot('key','attribute','text_value')
df2 = df.dropna(subset=['numeric_value']).pivot('key','attribute','numeric_value')
df3 = df.dropna(subset=['date_value']).pivot('key','attribute','date_value')

df = pd.concat([df1, df2, df3], axis=1).reindex(df['attribute'].unique(), axis=1)
print (df)
attribute      order  size  weight  price   segment
key                                                
1         2019-10-02  43.0    22.0   33.0   product
2         2019-11-02  34.0    32.0   89.0  customer

print (df.dtypes)
order      datetime64[ns]
size              float64
weight            float64
price             float64
segment            object
dtype: object

Old answer - all values are casted to strings:

df['date_value'] = pd.to_datetime(df['date_value'])

df['text_value'] = df['text_value'].fillna(df['numeric_value']).fillna(df['date_value'])

df = df.pivot('key','attribute','text_value')
print (df)
attribute                order price   segment size weight
key                                                       
1          1569974400000000000    33   product   43     22
2          1572652800000000000    89  customer   34     32

print (df.dtypes)
order      object
price      object
segment    object
size       object
weight     object
dtype: object
Sign up to request clarification or add additional context in comments.

Comments

0

This is the solution, I figured out

attr_dict = {'order':'date_value', 'size':'numeric_value', 'weight':'numeric_value', 'price':'numeric_value', 'segment':'text_value'}

output_table = pd.DataFrame()
for attr in attr_dict.keys():
    temp = input_table[input_table['attribute'] == attr][['key', attr_dict[attr]]]
    temp.rename(columns={attr_dict[attr]:attr}, inplace=True)
    output_table[attr] = list(temp.values[:,1])
output_table

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.