I have a dataframe that looks like this:
df = pd.DataFrame({
'name': ['John','William', 'Nancy', 'Susan', 'Robert', 'Lucy', 'Blake', 'Sally', 'Bruce'],
'injury': ['right hand broken', 'lacerated left foot', 'foot broken', 'right foot fractured', '', 'sprained finger', 'chest pain', 'swelling in arm', 'laceration to arms, hands, and foot']
})
name injury
0 John right hand broken
1 William lacerated left foot
2 Nancy foot broken
3 Susan right foot fractured
4 Robert
5 Lucy sprained finger
6 Blake chest pain
7 Sally swelling in arm
8 Bruce lacerations to arm, hands, and foot <-- this is a weird case, since there are multiple body parts
Notably, some of the values in the injury column are blank.
I want to replace the values in the injury column with only the affected body part. In my case, that would be hand, foot, finger, and chest, arm. There are dozens more... this is a small example.
The desired dataframe would look like this:
name injury
0 John hand
1 William foot
2 Nancy foot
3 Susan foot
4 Robert
5 Lucy finger
6 Blake chest
7 Sally arm
8 Bruce arm, hand, foot
I could do something like this:
df.loc[df['injury'].str.contains('hand'), 'injury'] = 'hand'
df.loc[df['injury'].str.contains('foot'), 'injury'] = 'foot'
df.loc[df['injury'].str.contains('finger'), 'injury'] = 'finger'
df.loc[df['injury'].str.contains('chest'), 'injury'] = 'chest'
df.loc[df['injury'].str.contains('arm'), 'injury'] = 'arm'
But, this might not be the most elegant way.
Is there a more elegant way to do this? (e.g. using a dictionary)
(any advice on that last case with multiple body parts would be appreciated)
Thank you!