How to create new index lines based on column conditions in this pandas dataframe?

Question

I have the following pandas DataFrame:

import pandas as pd 
import numpy as np

data = [['Apple', 1, 1, 1, 1], ['Orange', np.nan, 1, 1, np.nan], ['Banana', 1, np.nan, 1, np.nan]]

df = pd.DataFrame(data, columns=['Type of fruit', 'Paris', 'Boston', 'Austin', 'New York'])

Output:

  Type of fruit  Paris  Boston  Austin  New York
0         Apple    1.0     1.0       1       1.0
1        Orange    NaN     1.0       1       NaN
2        Banana    1.0     NaN       1       NaN

I would like to create a new column named "Location", with one row for each non-null entry in the four columns Paris, Boston, Austin and New York, such as:

Ideal output:

    Location    Type of fruit
0   Paris       Apple
1   Boston      Apple
2   Austin      Apple
3   New York    Apple
4   Boston      Orange
5   Austin      Orange
6   Paris       Banana
7   Austin      Banana

I could filter each location column to keep the non-null rows (example for Paris):

df_paris = df.loc[df["Paris"].notna(),["Type of fruit"]]
df_paris["Location"] = "Paris"

and then concatenate the dataframes for each location:

pd.concat([df_paris, df_boston, df_austin, df_new_york])

but I'm sure there is a better way to do this stuff using pandas functions.
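
For reference, the manual approach described above could be written as a loop over the location columns. This is only a minimal sketch; the names `locations`, `parts` and `result` are illustrative and not part of the original code. Note that it groups rows by location rather than by fruit, so the row order differs from the ideal output.

```python
import numpy as np
import pandas as pd

data = [['Apple', 1, 1, 1, 1],
        ['Orange', np.nan, 1, 1, np.nan],
        ['Banana', 1, np.nan, 1, np.nan]]
df = pd.DataFrame(data, columns=['Type of fruit', 'Paris', 'Boston', 'Austin', 'New York'])

# Illustrative name: the columns that hold one location each
locations = ['Paris', 'Boston', 'Austin', 'New York']

parts = []
for loc in locations:
    # Keep the fruits that have a non-null value for this location
    part = df.loc[df[loc].notna(), ['Type of fruit']].copy()
    part['Location'] = loc
    parts.append(part)

# Stack the per-location frames and renumber the rows
result = pd.concat(parts, ignore_index=True)[['Location', 'Type of fruit']]
print(result)
```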

jezrael · Accepted Answer · 2021-07-29 09:35:18Z

Use DataFrame.set_index with DataFrame.stack, which removes missing values by default:

df1 = (df.set_index('Type of fruit')
         .rename_axis('Location', axis=1)
         .stack()
         .reset_index()[['Location','Type of fruit']])

Or convert MultiIndex to new DataFrame:

df1 = (df.set_index('Type of fruit')
         .rename_axis('Location', axis=1)
         .stack()
         .swaplevel(1,0)
         .index
         .to_frame(index=False))

Or use DataFrame.melt to unpivot, then remove the rows with missing values using DataFrame.dropna:

df1 = (df.melt('Type of fruit', var_name='Location', ignore_index=False)
         .sort_index(kind='stable')  # stable sort keeps the column order within each fruit
         .dropna(subset=['value'])[['Location','Type of fruit']]
         .reset_index(drop=True))

print(df1)
   Location Type of fruit
0     Paris         Apple
1    Boston         Apple
2    Austin         Apple
3  New York         Apple
4    Boston        Orange
5    Austin        Orange
6     Paris        Banana
7    Austin        Banana
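
Whether stack drops missing values by default depends on the pandas version: the newer stack implementation (opt-in via `future_stack=True` from pandas 2.1) keeps NaN values that are present in the input. A minimal sketch that does not rely on the default, assuming the same `df` as in the question, calls `dropna` explicitly; the intermediate name `stacked` is illustrative.

```python
import numpy as np
import pandas as pd

data = [['Apple', 1, 1, 1, 1],
        ['Orange', np.nan, 1, 1, np.nan],
        ['Banana', 1, np.nan, 1, np.nan]]
df = pd.DataFrame(data, columns=['Type of fruit', 'Paris', 'Boston', 'Austin', 'New York'])

stacked = (df.set_index('Type of fruit')
             .rename_axis('Location', axis=1)
             .stack()
             .dropna())  # explicit: a no-op if stack already removed the NaN rows

# The (fruit, location) pairs live in the index; turn them back into columns
df1 = stacked.reset_index()[['Location', 'Type of fruit']]
print(df1)
```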

Dom · Accepted Answer · 2021-07-29 09:41:56Z

df1 = (
    df
    .set_index("Type of fruit")      # Move the 'Type of fruit' column to the index
    .rename_axis('Location', axis=1) # Set the name of the column index to 'Location'
    .stack()                         # Move the remaining columns into a second index level
    .index                           # Select just the index (a MultiIndex with `Type of fruit` and `Location` as levels)
    .swaplevel()                     # Swap the level order so that `Location` comes first
    .to_frame(index=False)           # Convert the MultiIndex to a DataFrame.
                                     # `index=False` means the resulting DataFrame just has a numeric index.
                                     # With index=True, the input MultiIndex would be used as both the index and the content.
)
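
To see what the `index=False` argument changes, here is a small sketch on a hand-built MultiIndex. The two entries are illustrative and are not the full data from the question; the names `idx`, `swapped`, `flat` and `kept` are likewise only for this example.

```python
import pandas as pd

# Illustrative two-entry MultiIndex with the same level names as in the answer
idx = pd.MultiIndex.from_tuples(
    [('Apple', 'Paris'), ('Orange', 'Boston')],
    names=['Type of fruit', 'Location'],
)

swapped = idx.swaplevel()            # `Location` becomes the first level

flat = swapped.to_frame(index=False) # plain 0..n-1 index, levels become columns
kept = swapped.to_frame()            # default index=True: MultiIndex kept as the index too

print(flat)
print(kept)
```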

1 Comment

dgor Over a year ago

Good to understand
