I asked a related question earlier and got great help. I have a DataFrame with multiple columns and 4 years of data, and I am interested in ranks 1 and 2 only.

Name Rank  Year
 Joe  1     2019
 Ben  2     2018
 Jo   3     2020
 Bo   1     2018
 Boo  1     2021

If a name had rank 1 or 2 in a specific year, I want to create the corresponding boolean column.

Expected output

 Name Rank  Year If_1st_2018 If_1st_2019 If_1st_2020 If_1st_2021 If_2nd_2018 If_2nd_2019 etc
 Joe  1     2019     0           1           0           0           0           0
 Ben  2     2018     0           0           0           0           1           0
 Jo   3     2020     0           0           0           0           0           0
 Bo   1     2018     1           0           0           0           0           0
 Boo  1     2021     0           0           0           1           0           0
2 Comments

Where does If_1st_2021 come from? Is there a 2021 value in the Rank column that you just didn't show? Commented Dec 20, 2021 at 18:50
@richardec, correct, I added a row. Commented Dec 20, 2021 at 18:53

2 Answers

This time, I think a cool solution would be to combine the Rank and Year columns and then use pd.get_dummies:

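import pandas as pd

# Build a label such as 'If_1st_2019'; ranks missing from the map become NaN and get no dummy column
labels = 'If_' + df['Rank'].map({1: '1st', 2: '2nd'}) + '_' + df['Year'].astype(str)
df = pd.concat([df, pd.get_dummies(labels, dtype=int)], axis=1)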

Output:

>>> df
  Name  Rank  Year  If_1st_2018  If_1st_2019  If_1st_2021  If_2nd_2018
0  Joe     1  2019            0            1            0            0
1  Ben     2  2018            0            0            0            1
2   Jo     3  2020            0            0            0            0
3   Bo     1  2018            1            0            0            0
4  Boo     1  2021            0            0            1            0
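Note that the output above only contains columns for rank/year pairs that actually occur in the data, so If_1st_2020 and most of the If_2nd_* columns from the expected output are missing. A minimal sketch of one way to always get all 8 columns, assuming the four years are 2018 to 2021 (the names rank_labels, years, expected_cols and result are made up for this illustration and are not part of the answer above):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["Joe", "Ben", "Jo", "Bo", "Boo"],
    "Rank": [1, 2, 3, 1, 1],
    "Year": [2019, 2018, 2020, 2018, 2021],
})

rank_labels = {1: "1st", 2: "2nd"}   # ranks of interest
years = range(2018, 2022)            # assumption: the 4 years are 2018..2021

# Ranks outside rank_labels map to NaN, so those rows end up with all zeros
keys = "If_" + df["Rank"].map(rank_labels) + "_" + df["Year"].astype(str)

# Every rank/year column we want, whether or not it occurs in the data
expected_cols = [
    f"If_{label}_{year}" for label in rank_labels.values() for year in years
]

# reindex adds the missing columns (filled with 0) and fixes the column order
dummies = pd.get_dummies(keys, dtype=int).reindex(columns=expected_cols, fill_value=0)

result = pd.concat([df, dummies], axis=1)
print(result)
```

The reindex step is what guarantees a fixed set of columns, which may help if later code relies on all 8 of them being present.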

1 Comment

@Anakin if you want to add more ranks, e.g. 3, just add them to the .map() call ;)
You can use:

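import pandas as pd

# Count each (Rank, Year) combination per Name; dropna=False keeps all-zero combinations
df_new = pd.crosstab(df['Name'], [df['Rank'], df['Year']], dropna=False)
df_new = df_new[[1, 2]]  # keep only ranks 1 and 2 (top level of the column MultiIndex)
df_new.columns = ['_'.join(map(str, x)) for x in df_new.columns]  # flatten to e.g. '1_2018'
df_new = df_new.reset_index()
df = df.merge(df_new, how='left', on='Name')
print(df)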

OUTPUT

   Name  Rank  Year  1_2018  1_2019  1_2020  2_2018  2_2019  2_2020
0  Joe     1  2019       0       1       0       0       0       0
1  Ben     2  2018       0       0       0       1       0       0
2   Jo     3  2020       0       0       0       0       0       0
3   Bo     1  2018       1       0       0       0       0       0

2 Comments

Thanks, but I do not need anything beyond rank 1 or 2; rank 3 is not necessary. I have ranks up to 100, and I am interested in 8 boolean columns only (2 for each of 4 years).
You can add df_new = df_new[[1, 2]] after the crosstab call.
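As an alternative to selecting the columns afterwards, the rows could be filtered to ranks 1 and 2 before calling crosstab, so that ranks 3 to 100 never produce columns at all. This is only a rough sketch on the sample data from the question; the names wanted, counts, flags, merged and flag_cols are invented for the illustration, and it assumes each Name occurs once (with repeated names, every row of that name would receive all of the name's flags):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["Joe", "Ben", "Jo", "Bo", "Boo"],
    "Rank": [1, 2, 3, 1, 1],
    "Year": [2019, 2018, 2020, 2018, 2021],
})

# Keep only the rows with a rank of interest before tabulating
wanted = df[df["Rank"].isin([1, 2])]

# Rows: Name; columns: MultiIndex of (Rank, Year) pairs that occur in the data
counts = pd.crosstab(wanted["Name"], [wanted["Rank"], wanted["Year"]])

# Flatten the MultiIndex into names such as '1_2018'
counts.columns = [f"{rank}_{year}" for rank, year in counts.columns]

# Turn counts into 0/1 flags and make Name a regular column again
flags = counts.gt(0).astype(int).reset_index()

# Names without rank 1 or 2 have no row in flags, so fill their NaN with 0
merged = df.merge(flags, how="left", on="Name")
flag_cols = [c for c in merged.columns if c not in df.columns]
merged[flag_cols] = merged[flag_cols].fillna(0).astype(int)
print(merged)
```

Unlike dropna=False, this only creates columns for rank/year pairs that actually appear, so missing combinations would still need to be added separately if all 8 columns are required.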
