I have a dataframe that looks like this:

import pandas as pd

df = pd.DataFrame({
    'name': ['John','William', 'Nancy', 'Susan', 'Robert', 'Lucy', 'Blake', 'Sally', 'Bruce'],
    'injury': ['right hand broken', 'lacerated left foot', 'foot broken', 'right foot fractured', '', 'sprained finger', 'chest pain', 'swelling in arm', 'laceration to arms, hands, and foot']
    })


    name      injury
0   John      right hand broken
1   William   lacerated left foot
2   Nancy     foot broken
3   Susan     right foot fractured
4   Robert  
5   Lucy      sprained finger
6   Blake     chest pain
7   Sally     swelling in arm
8   Bruce     laceration to arms, hands, and foot      <-- this is a weird case, since there are multiple body parts

Notably, some of the values in the injury column are blank.

I want to replace each value in the injury column with only the affected body part. In this example, that would be hand, foot, finger, chest, and arm. The real data has dozens more; this is a small example.

The desired dataframe would look like this:

    name      injury
0   John      hand
1   William   foot
2   Nancy     foot
3   Susan     foot
4   Robert  
5   Lucy      finger
6   Blake     chest
7   Sally     arm
8   Bruce     arm, hand, foot

I could do something like this:

df.loc[df['injury'].str.contains('hand'), 'injury'] = 'hand'
df.loc[df['injury'].str.contains('foot'), 'injury'] = 'foot'
df.loc[df['injury'].str.contains('finger'), 'injury'] = 'finger'
df.loc[df['injury'].str.contains('chest'), 'injury'] = 'chest'
df.loc[df['injury'].str.contains('arm'), 'injury'] = 'arm'

But this is probably not the most elegant way.

Is there a more elegant way to do this? (e.g. using a dictionary)

(any advice on that last case with multiple body parts would be appreciated)

Thank you!

3 Answers

1

I think you should maintain a list of body parts and use the apply function:

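body_parts = ['hand', 'foot', 'finger', 'chest', 'arm']

def test(value):
    # Collect every known body part that occurs as a substring of the text
    body_text = []
    for body_part in body_parts:
        if body_part in value:
            body_text.append(body_part)
    if body_text:
        return ', '.join(body_text)
    # No match (e.g. an empty string): leave the value unchanged
    return value

df['injury'] = df['injury'].apply(test)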

This returns:

    name    injury
0   John    hand
1   William foot
2   Nancy   foot
3   Susan   foot
4   Robert  
5   Lucy    finger
6   Blake   chest
7   Sally   arm
8   Bruce   hand, foot, arm
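
Since the question asks about a dictionary, here is a minimal sketch of one way that could look. The mapping BODY_PART_ALIASES and the helper extract_body_parts are hypothetical names introduced for illustration, and the plural forms listed are assumptions about what the real data contains. Matching is on whole words in order of appearance, so row 8 comes out as in the desired output.

```python
import re
import pandas as pd

# Hypothetical mapping from the words found in the text to a canonical body part.
BODY_PART_ALIASES = {
    "hand": "hand", "hands": "hand",
    "foot": "foot", "feet": "foot",
    "finger": "finger", "fingers": "finger",
    "chest": "chest",
    "arm": "arm", "arms": "arm",
}

# One regex that matches any alias as a whole word.
pattern = re.compile(r"\b(" + "|".join(map(re.escape, BODY_PART_ALIASES)) + r")\b")

def extract_body_parts(text):
    # Map each matched word to its canonical name, in order of appearance.
    found = [BODY_PART_ALIASES[word] for word in pattern.findall(text)]
    # dict.fromkeys drops duplicates while keeping the first-seen order.
    return ", ".join(dict.fromkeys(found))

df = pd.DataFrame({
    "name": ["John", "William", "Nancy", "Susan", "Robert", "Lucy", "Blake", "Sally", "Bruce"],
    "injury": ["right hand broken", "lacerated left foot", "foot broken", "right foot fractured", "",
               "sprained finger", "chest pain", "swelling in arm", "laceration to arms, hands, and foot"],
})
df["injury"] = df["injury"].apply(extract_body_parts)
```

Unlike the substring test above, the word boundaries mean a word such as "harm" would not count as "arm", at the cost of having to list each variant in the dictionary.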

0

The standard way to get the first match of a regex on a string column is to use .str.extract(). See the quickstart 10 minutes to pandas: working with text data.

df['injury'].str.extract('(arm|chest|finger|foot|hand)', expand=False)

0      hand
1      foot
2      foot
3      foot
4       NaN
5    finger
6     chest
7       arm
8       arm
Name: injury, dtype: object

Note that row 4 returns NaN rather than '' (it's trivial to apply .fillna('') to the result). More importantly, for row 8 this returns only the first match, not all matches, so you need to decide how you want to handle that case. See .str.extractall().
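
If all matches are wanted, one possible sketch with .str.extractall() is below. It rebuilds the injury column as a standalone Series for the sake of a self-contained example; the variable names matches and combined are just illustrative, and joining with ", " is an assumption about the output format you want.

```python
import pandas as pd

injury = pd.Series(
    ["right hand broken", "lacerated left foot", "foot broken", "right foot fractured", "",
     "sprained finger", "chest pain", "swelling in arm", "laceration to arms, hands, and foot"],
    name="injury",
)
pattern = r"(arm|chest|finger|foot|hand)"

# extractall returns one row per match, indexed by (original row, match number);
# column 0 holds the single capture group.
matches = injury.str.extractall(pattern)[0]

combined = (
    matches.groupby(level=0)
    # join each row's matches, dropping duplicates but keeping their order
    .agg(lambda s: ", ".join(dict.fromkeys(s)))
    # rows with no match are absent from extractall's result; restore them as ''
    .reindex(injury.index, fill_value="")
    .rename("injury")
)
```

Here row 8 becomes 'arm, hand, foot' and row 4 stays an empty string. If duplicates don't matter, injury.str.findall(pattern).str.join(', ') should be a shorter route to a similar result.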

0
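# Plurals are listed explicitly because matching is on whole words
selected_words = ["hand", "foot", "finger", "chest", "arms", "arm", "hands"]

df["injury"] = (
    df["injury"]
    .str.replace(",", "")           # drop commas so "arms," becomes "arms"
    .str.split(" ", expand=False)   # one list of words per row
    # keep only the selected words; set() removes duplicates but does not guarantee order
    .apply(lambda x: ", ".join(set([i for i in x if i in selected_words])))
)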
print(df)

      name             injury
0     John               hand
1  William               foot
2    Nancy               foot
3    Susan               foot
4   Robert                   
5     Lucy             finger
6    Blake              chest
7    Sally                arm
8    Bruce  arms, foot, hands

1 Comment

Extending the solution by @Jason Baker to: .apply(lambda x: ', '.join(set([i for i in x if i in selected_body_parts and i not np.nan else np.nan]))), I get a syntax error. I'd like to check for NaN values (there are some in my data) and output NaN when a row contains none of the selected_words.
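
The syntax error comes from two things: `i not np.nan` is not a valid Python expression, and the filter part of a list comprehension cannot take an `else`. One way around it, sketched here under assumptions, is to move the logic into a named function. The helper name pick_body_parts and the sample Series are hypothetical, and returning NaN for rows with no match follows the behaviour the comment asks for.

```python
import numpy as np
import pandas as pd

selected_words = ["hand", "foot", "finger", "chest", "arms", "arm", "hands"]

def pick_body_parts(text):
    # Missing values pass through unchanged.
    if pd.isna(text):
        return np.nan
    # Drop commas, split on whitespace, keep only the selected words.
    found = [word for word in text.replace(",", "").split() if word in selected_words]
    # No selected word in this row: return NaN instead of an empty string.
    if not found:
        return np.nan
    # dict.fromkeys removes duplicates while keeping the original order.
    return ", ".join(dict.fromkeys(found))

# Small illustrative sample, including a NaN and a row with no body part.
injury = pd.Series(["right hand broken", np.nan, "chest pain", "headache",
                    "laceration to arms, hands, and foot"])
result = injury.apply(pick_body_parts)
```

With this sample, the second and fourth rows come back as NaN and the last row as 'arms, hands, foot'.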
