
I have the following pandas DataFrame:

import pandas as pd 
import numpy as np

data = [['Apple', 1, 1, 1, 1], ['Orange', np.nan, 1, 1, np.nan], ['Banana', 1, np.nan, 1, np.nan]]

df = pd.DataFrame(data, columns = ['Type of fruit', 'Paris', "Boston", "Austin", "New York"])

output:

  Type of fruit  Paris  Boston  Austin  New York
0         Apple    1.0     1.0       1       1.0
1        Orange    NaN     1.0       1       NaN
2        Banana    1.0     NaN       1       NaN

I would like to create a new column named "Location" whose values are taken from the names of the four columns Paris, Boston, Austin and New York, with one row (and a new index) for each non-null fruit/location pair, such as:

Ideal output:

    Location    Type of fruit
0   Paris       Apple
1   Boston      Apple
2   Austin      Apple
3   New York    Apple
4   Boston      Orange
5   Austin      Orange
6   Paris       Banana
7   Austin      Banana

I could filter each location column to keep the non-null rows (example for Paris):

df_paris = df.loc[df["Paris"].notna(),["Type of fruit"]]
df_paris["Location"] = "Paris"

and then concatenate the dataframes for each location:

pd.concat([df_paris, df_boston, df_austin, df_new_york])

but I'm sure there is a better way to do this using pandas functions.
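
For reference, the manual approach described above can be generalized with a loop over the location columns. This is only a sketch: the names locations, parts and manual are illustrative and not part of the original code, and the stable sort on the original row index is an assumption added to reproduce the row order of the ideal output.

```python
import numpy as np
import pandas as pd

data = [['Apple', 1, 1, 1, 1],
        ['Orange', np.nan, 1, 1, np.nan],
        ['Banana', 1, np.nan, 1, np.nan]]
df = pd.DataFrame(data, columns=['Type of fruit', 'Paris', 'Boston', 'Austin', 'New York'])

# Illustrative names (not in the original post): locations, parts, manual
locations = ['Paris', 'Boston', 'Austin', 'New York']

parts = []
for loc in locations:
    # Keep only the fruits that have a non-null value for this location
    part = df.loc[df[loc].notna(), ['Type of fruit']].copy()
    part['Location'] = loc
    parts.append(part)

manual = (pd.concat(parts)
            .sort_index(kind='stable')   # group rows by original fruit row, keeping location order
            .reset_index(drop=True)[['Location', 'Type of fruit']])

print(manual)
```

It works, but it needs one filtered frame per location, which is what the reshaping functions in the answers avoid.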

2 Answers


Use DataFrame.set_index with DataFrame.stack; stack removes missing values by default:

df1 = (df.set_index('Type of fruit')
         .rename_axis('Location', axis=1)
         .stack()
         .reset_index()[['Location','Type of fruit']])

Or convert the resulting MultiIndex to a new DataFrame:

df1 = (df.set_index('Type of fruit')
         .rename_axis('Location', axis=1)
         .stack()
         .swaplevel(1,0)
         .index
         .to_frame(index=False))

Or use DataFrame.melt to unpivot, then remove the rows with missing values using DataFrame.dropna:

df1 = (df.melt('Type of fruit', var_name='Location', ignore_index=False)
         .sort_index()
         .dropna(subset=['value'])[['Location','Type of fruit']]
         .reset_index(drop=True))

print (df1)
   Location Type of fruit
0     Paris         Apple
1    Boston         Apple
2    Austin         Apple
3  New York         Apple
4    Boston        Orange
5    Austin        Orange
6     Paris        Banana
7    Austin        Banana
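
The stack-based variants rely on stack dropping missing values by default. That holds for the original implementation (dropna=True), but the reimplementation that pandas 2.1 introduced behind future_stack=True keeps NaN values that were already present in the input. A sketch that does not depend on either default drops the missing values explicitly. It assumes the df from the question, rebuilt here so the block runs on its own; the names stacked and df_explicit are illustrative.

```python
import numpy as np
import pandas as pd

data = [['Apple', 1, 1, 1, 1],
        ['Orange', np.nan, 1, 1, np.nan],
        ['Banana', 1, np.nan, 1, np.nan]]
df = pd.DataFrame(data, columns=['Type of fruit', 'Paris', 'Boston', 'Austin', 'New York'])

# Illustrative names (not in the original answer): stacked, df_explicit
stacked = (df.set_index('Type of fruit')
             .rename_axis('Location', axis=1)
             .stack())

df_explicit = (stacked.dropna()       # explicit, so the result does not depend on stack's NaN handling
                      .reset_index()[['Location', 'Type of fruit']])

print(df_explicit)
```

On versions where stack already drops the missing values, the extra dropna call is a no-op.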

(
    df
    .set_index("Type of fruit")      # Moves the 'Type of fruit' column to the index
    .rename_axis('Location', axis=1) # Sets the name of the column index to 'Location'
    .stack()                         # Moves the remaining columns to be a second index level
    .index                           # Select just the index (which is a MultiIndex with `Type of fruit` and `Location` as levels)
    .swaplevel()                     # Swaps the level order so that `Location` is first
    .to_frame(index=False)           # Convert the MultiIndex to a DataFrame. 
                                     # `index=False` means that the resulting DF will just have a numeric index. 
                                     # If index=True, the input MultiIndex would be used as both the index and content of the DataFrame

)

1 Comment

Good to understand
