2

I have the dataframe. `

data = pd.DataFrame([['Benz', 'MinSpeed', 0, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'TopSpeed', 200, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'ChasisNum', 654121, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'Seats', 5, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'AirBags', 5, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'VehicleType', np.nan, 'Sedan', 'USA', '2022-08-12'],
                     ['Benz', 'Color', np.nan, 'Black','USA', '2022-08-12'],
                     ['Benz', 'InternetInside', np.nan, 'Yes','USA', '2022-08-12'],
                     
                     ['Ferrari', 'MinSpeed', 0, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'TopSpeed', 250, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'ChasisNum', 781121, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'Seats', 4, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'AirBags', 2, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'VehicleType', np.nan, 'SUV', 'France', '2022-12-25'],
                     ['Ferrari', 'Color', np.nan, 'Red','France', '2022-12-25'],
                     ['Ferrari', 'InternetInside', np.nan, 'No','France', '2022-12-25'],
                     ], 
                    columns= ['CarModel', 'Features', 'NumericalValues', 'CategoricalValues','Country', 'DeliveryDate'])

`

I am trying the pivot the data using the pivot function but getting repeated columns for "NumericalValues" and "CategoricalValues" values

Code: `

data.pivot(index='CarModel', columns='Features', values=['NumericalValues','CategoricalValues' ]).reset_index()

`

I need the expected output as: `

output_data = pd.DataFrame([['Benz', 0, 200, 654121, 5, 5, 'Sedan', 'Black', 'Yes', 'USA', '2022-08-12'],
                         ['Ferrari', 0, 250, 781121, 4, 2, 'SUV', 'Red', 'No', 'France', '2022-12-25']
                     ],
                    columns=['CarModel', 'MinSpeed', 'TopSpeed', 'ChasisNum','Seats', 'AirBags', 'VehicleType', 'Color', 'InternetInside', 'Country', 'DeliveryDate'])

` I tried with Pivot table as well but unable to get this output.

3 Answers 3

2

You can perform your pivot, then run groupby.first on the columns to get rid of the unwanted columns:

out = (data
  .pivot(index=['CarModel', 'Country', 'DeliveryDate'],
         columns='Features'
        )
  .groupby(level='Features', axis=1).first()
  .reset_index()
)

Output:

Features CarModel Country DeliveryDate  AirBags  ChasisNum  Color InternetInside  MinSpeed  Seats  TopSpeed VehicleType
0            Benz     USA   2022-08-12      5.0   654121.0  Black            Yes       0.0    5.0     200.0       Sedan
1         Ferrari  France   2022-12-25      2.0   781121.0    Red             No       0.0    4.0     250.0         SUV

The advantage is that it maintains the dtypes:

Features
CarModel           object
Country            object
DeliveryDate       object
AirBags           float64
ChasisNum         float64
Color              object
InternetInside     object
MinSpeed          float64
Seats             float64
TopSpeed          float64
VehicleType        object
dtype: object
Sign up to request clarification or add additional context in comments.

Comments

1

From your data, as you have a value either in 'NumericalValues or 'CategoricalValues', so you can create a column that combine all the information from both column with fillna, then use the pivot as you did.

res = (
    data.assign(Values=lambda x: x['NumericalValues'].fillna(x['CategoricalValues']))
      .pivot(index='CarModel', columns='Features', values='Values')
      .reset_index().rename_axis(columns=None)
)
print(res)
#   CarModel AirBags ChasisNum  Color InternetInside MinSpeed Seats TopSpeed  \
# 0     Benz     5.0  654121.0  Black            Yes      0.0   5.0    200.0   
# 1  Ferrari     2.0  781121.0    Red             No      0.0   4.0    250.0   

#   VehicleType  
# 0       Sedan  
# 1         SUV  

3 Comments

This is a simple workaround (that was my initial idea), but the bad drawback is that it combines the dtypes, and everything will become object
@mozway well chaining with .convert_dtypes() should fix it :)
Thanks @Ben.T for the solution and we have the work around for dtypes change. So, it's absolutely fine and less time-consuming solution.
1

Another possible solution, using pandas.pivot_table:

out = (data.pivot_table(
    index=['CarModel',  'Country', 'DeliveryDate'], 
    columns='Features', values=['NumericalValues', 'CategoricalValues'],
    aggfunc=max)
       .droplevel(0, axis=1)
       .rename_axis(None, axis=1)
       .reset_index())

Output:

  CarModel Country DeliveryDate  Color InternetInside VehicleType  AirBags  \
0     Benz     USA   2022-08-12  Black            Yes       Sedan      5.0   
1  Ferrari  France   2022-12-25    Red             No         SUV      2.0   

   ChasisNum  MinSpeed  Seats  TopSpeed  
0   654121.0       0.0    5.0     200.0  
1   781121.0       0.0    4.0     250.0  

dtypes:

CarModel           object
Country            object
DeliveryDate       object
Color              object
InternetInside     object
VehicleType        object
AirBags           float64
ChasisNum         float64
MinSpeed          float64
Seats             float64
TopSpeed          float64

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.