Add values to dataframe based on existing unique values

Question

I have a dataframe DF with 2 columns:

CLASS   STUDENT
'Sci'   'Francy'
'Sci'   Vacant
'math'  'Alex'
'math'  'Arthur'
'math'  'Katy'
'eng'   'Jack'
'eng'   Vacant
'eng'   'Francy'
'Hist'   'Francy'
'Hist'   'Francy'

I need all the classes to have 1 vacant student. Some of them already have.

RESULT

CLASS   STUDENT
'Sci'   'Francy'
'Sci'   Vacant
'math'  'Alex'
'math'  'Arthur'
'math'  'Katy'
'math'  Vacant
'eng'   'Jack'
'eng'   Vacant
'eng'   'Francy'
'Hist'   'Francy'
'Hist'   'Francy'
'Hist'   Vacant

I have tried

unique_class = DF['unique_class'].drop_duplicates()
vacant_column = pd.Series(['vacant'] * unique_class.shape[0])
temp_df = pd.concat([unique_class, vacant_column], axis=1, ignore_index=True)
DF = DF.append(temp_df, ignore_index=True)
DF.drop_duplicates(inplace=True)

It works but it seems too much. Any better way to do this?

Plasma · Accepted Answer · 2019-05-15 07:05:56Z

2

Here's another way to do it:

# Copy of your data
df = pd.DataFrame({
    "class": ["Sci", "Sci", "math", "math", "math", "eng", "eng", "eng", "Hist", "Hist"],
    "student": ["Francy", "vacant", "Alex", "Arthur", "Katy", "Jack", "vacant", "Francy", "Francy", "Francy"]
    })

# An identical DF with all students equal to "vacant"
vacant_df = pd.DataFrame({"class": df["class"], "student": "vacant"})

# Remove existing 'vacant' from original DF and concatenate with de-duplicated vacant dataframe (to avoid duplicate 'vacant' entries)
final_df = pd.concat([df.loc[df.student != "vacant", vacant_df.drop_duplicates("class")])

Original DF:

  class student
8  Hist  Francy
9  Hist  Francy
0   Sci  Francy
1   Sci  vacant
5   eng    Jack
6   eng  vacant
7   eng  Francy
2  math    Alex
3  math  Arthur
4  math    Katy

Final DF:

  class student
8  Hist  Francy
9  Hist  Francy
8  Hist  vacant
0   Sci  Francy
0   Sci  vacant
5   eng    Jack
7   eng  Francy
5   eng  vacant
2  math    Alex
3  math  Arthur
4  math    Katy
2  math  vacant

answered May 15, 2019 at 7:05

Plasma

1,9611 gold badge24 silver badges39 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Pab Over a year ago

I'll upvote. However, my dataset is huge, can't afford to make a copy of the whole df. Thanks

Sociopath · Accepted Answer · 2019-05-15 07:17:02Z

2

Using pd.merge

df_new = pd.DataFrame({'CLASS': df['CLASS'].unique(), 'STUDENT':'vacant'})

df_new.merge(df, how='outer', on=['CLASS','STUDENT'])

# Use `.sort_values(by='CLASS') if sorted df needed

Output:

    CLASS   STUDENT
0   Sci vacant
1   math    vacant
2   eng     vacant
3   Hist    vacant
4   Sci     Francy
5   math    Alex
6   math    Arthur
7   math    Katy
8   eng     Jack
9   eng     Francy
10  Hist    Francy
11  Hist    Francy

edited May 15, 2019 at 7:17

answered May 15, 2019 at 7:11

Sociopath

13.4k22 gold badges53 silver badges82 bronze badges

Comments

Chris Adams · Accepted Answer · 2019-05-15 07:21:09Z

2

For the record, there is nothing wrong with your solution. You can achieve the same result in a "one-liner" using practically the same approach:

df = df.append(df[['CLASS']].drop_duplicates().assign(STUDENT='Vacant')).drop_duplicates()

[out]

  CLASS STUDENT
0   Sci  Francy
1   Sci  Vacant
2  math    Alex
3  math  Arthur
4  math    Katy
5   eng    Jack
6   eng  Vacant
7   eng  Francy
8  Hist  Francy
2  math  Vacant
8  Hist  Vacant

And if required, you can chain on sort_values and reset_index to make the table a bit more legible:

df = (df.append(df[['CLASS']].drop_duplicates().assign(STUDENT='Vacant'))
      .drop_duplicates()
      .sort_values('CLASS')
      .reset_index(drop=True))

[out]

   CLASS STUDENT
0   Hist  Francy
1   Hist  Vacant
2    Sci  Francy
3    Sci  Vacant
4    eng    Jack
5    eng  Vacant
6    eng  Francy
7   math    Alex
8   math  Arthur
9   math    Katy
10  math  Vacant

edited May 15, 2019 at 7:21

answered May 15, 2019 at 7:08

Chris Adams

18.7k4 gold badges26 silver badges44 bronze badges

5 Comments

Pab Over a year ago

In your solution, math is missing vacant

Chris Adams Over a year ago

don't think so..? 2nd bottom line?

Pab Over a year ago

Oh, I see, sorry

Pab Over a year ago

Your drop_duplicates should have inplace=True?

Chris Adams Over a year ago

probably better to use assignment in this case... df = ... will update my answer

Collectives™ on Stack Overflow

Add values to dataframe based on existing unique values

3 Answers 3

1 Comment

Comments

5 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

Comments

5 Comments

Your Answer

Sign up or log in

Post as a guest

Related