Creating multiple boolean columns in pandas based on two conditions

Question

I asked this question and got great help. I have a dataframe with multiple columns and 4 years of data, and I am interested in ranks 1 and 2 only.

Name Rank  Year
 Joe  1     2019
 Ben  2     2018
 Jo   3     2020
 Bo   1     2018
 Boo  1     2021

If a name had rank 1 or 2 in a specific year, I want to create a corresponding boolean column.

Expected output

Name  Rank  Year  If_1st_2018  If_1st_2019  If_1st_2020  If_1st_2021  If_2nd_2018  If_2nd_2019  etc
Joe   1     2019  0            1            0            0            0            0
Ben   2     2018  0            0            0            0            1            0
Jo    3     2020  0            0            0            0            0            0
Bo    1     2018  1            0            0            0            0            0
Boo   1     2021  0            0            0            1            0            0

Where does If_1st_2021 come from? Is there a 2021 value in the Rank column that you just didn't show?
– user17242583, Commented Dec 20, 2021 at 18:50

user17242583 · Accepted Answer · 2021-12-20 18:59:15Z

2

This time, I think a cool solution would be to combine the Rank and Year columns and then use pd.get_dummies:

labels = 'If_' + df['Rank'].map({1: '1st', 2: '2nd'}) + '_' + df['Year'].astype(str)
df = pd.concat([df, pd.get_dummies(labels, dtype=int)], axis=1)  # dtype=int keeps 0/1 on pandas >= 2.0

Output:

>>> df
  Name  Rank  Year  If_1st_2018  If_1st_2019  If_1st_2021  If_2nd_2018
0  Joe     1  2019            0            1            0            0
1  Ben     2  2018            0            0            0            1
2   Jo     3  2020            0            0            0            0
3   Bo     1  2018            1            0            0            0
4  Boo     1  2021            0            0            1            0
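
Note that pd.get_dummies only creates columns for Rank/Year combinations that actually occur in the data, which is why If_1st_2020 and If_2nd_2019 are missing from the output above. The following is a minimal sketch of one way to always get all eight columns, assuming the ranks of interest are 1 and 2 and the years are 2018 through 2021. The names rank_labels, years and wanted are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["Joe", "Ben", "Jo", "Bo", "Boo"],
    "Rank": [1, 2, 3, 1, 1],
    "Year": [2019, 2018, 2020, 2018, 2021],
})

rank_labels = {1: "1st", 2: "2nd"}   # ranks of interest (assumption)
years = [2018, 2019, 2020, 2021]     # years of interest (assumption)

# Ranks not in rank_labels map to NaN, so their label is NaN and
# get_dummies leaves those rows as all zeros.
labels = "If_" + df["Rank"].map(rank_labels) + "_" + df["Year"].astype(str)
dummies = pd.get_dummies(labels, dtype=int)

# Force the full rank x year grid; combinations absent from the data become 0.
wanted = [f"If_{label}_{year}" for label in rank_labels.values() for year in years]
dummies = dummies.reindex(columns=wanted, fill_value=0)

result = pd.concat([df, dummies], axis=1)
print(result)
```

The reindex step is what guarantees a fixed set of columns regardless of which combinations appear in a given dataset.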



1 Comment

user17242583 Over a year ago

@Anakin if you want to add more ranks, e.g. 3, just add them to the .map() call ;)

Muhammad Hassan · Accepted Answer · 2021-12-20 19:10:44Z

2

You can use:

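# Count each (Rank, Year) pair per Name; dropna=False keeps all-zero combinations
df_new = pd.crosstab(df['Name'], [df['Rank'], df['Year']], dropna=False)
# Keep only ranks 1 and 2 (top level of the column MultiIndex)
df_new = df_new[[1, 2]]
# Flatten (rank, year) column tuples into names like '1_2018'
df_new.columns = ['_'.join(map(str, x)) for x in df_new.columns]
df_new = df_new.reset_index()
df = df.merge(df_new, how='left', on='Name')
print(df)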

OUTPUT

   Name  Rank  Year  1_2018  1_2019  1_2020  2_2018  2_2019  2_2020
0  Joe     1  2019       0       1       0       0       0       0
1  Ben     2  2018       0       0       0       1       0       0
2   Jo     3  2020       0       0       0       0       0       0
3   Bo     1  2018       1       0       0       0       0       0

edited Dec 20, 2021 at 19:10

answered Dec 20, 2021 at 18:57


2 Comments

Anakin Skywalker Over a year ago

thanks, but I do not need anything beyond rank 1 or 2. 3 is not necessary. I have ranks up to 100, I am interested in 8 boolean columns only (2 for each of 4 years)

Muhammad Hassan Over a year ago

You can add this df_new = df_new[[1,2]] after crosstab
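
As a variant of the approach above (not the answerer's exact code), the rows can be restricted to ranks 1 and 2 before calling pd.crosstab, so higher ranks never become columns. This is a rough sketch using the sample data from the question. It assumes each Name appears in only one row, because the merge is keyed on Name alone and a repeated name would receive the same flags on every one of its rows. The names top, counts and flag_cols are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["Joe", "Ben", "Jo", "Bo", "Boo"],
    "Rank": [1, 2, 3, 1, 1],
    "Year": [2019, 2018, 2020, 2018, 2021],
})

# Keep only the ranks of interest before cross-tabulating
top = df[df["Rank"].isin([1, 2])]
counts = pd.crosstab(top["Name"], [top["Rank"], top["Year"]])

# Flatten the (Rank, Year) column MultiIndex into names like '1_2018'
counts.columns = [f"{rank}_{year}" for rank, year in counts.columns]
flag_cols = list(counts.columns)

# Left-merge back onto the original rows; names without rank 1 or 2
# have no match and get NaN, which is filled with 0 here.
result = df.merge(counts.reset_index(), how="left", on="Name")
result[flag_cols] = result[flag_cols].fillna(0).astype(int)
print(result)
```

As with pd.get_dummies, only combinations present in the data produce columns here; passing dropna=False to pd.crosstab, as in the answer, is one way to keep the empty ones.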
