First time posting; apologies for any formatting errors. I have a data set that stores age ranges in separate columns, and I'm trying to create a new column based on the string value in the AGE_OPERATOR_TXT column.

I've tried .apply() with a named function, a lambda, and for loops with iterrows(), but either nothing is returned or every row of the result holds a Series containing ALL of the rows:

def multum_age_ops(s):
    if s == "<":
        return data['AGE_LOW_NBR'] + " + " + data['AGE_UNIT_DISP']
    else:
        return 0

data['age_op_test'] = data['AGE_OPERATOR_TXT'].apply(multum_age_ops)

I expect the new column to look something like this:

age_op_test
0 0
1 18 + years
2 1 + months
3 4 + months
4 4 + months

What I'm getting is:

age_op_test
0                                                        0
1        0        18\n1        18\n2         1\n3      ...
2        0        18\n1        18\n2         1\n3      ...
3        0        18\n1        18\n2         1\n3      ...
4        0        18\n1        18\n2         1\n3      ...
5        0        18\n1        18\n2         1\n3      ...
6        0        18\n1        18\n2         1\n3      ...

Any help is appreciated.

  • Because you are returning a Series: data['AGE_LOW_NBR'] + " + " + data['AGE_UNIT_DISP'] is an entire column, not a single value. Commented Aug 7, 2019 at 20:30
  • If you really want to do it with apply, apply the function across the whole DataFrame with axis=1 so it runs once per row. Or consider simply concatenating the columns element-wise with Series operations. Commented Aug 7, 2019 at 20:37
  • Thanks for the quick replies! Again, this is my first question, so I should have mentioned that I did try data['age_op_test'] = data['AGE_OPERATOR_TXT'].apply(multum_age_ops, axis=1), but it raises an "unexpected argument" error. Commented Aug 8, 2019 at 17:37
  • My final workaround (not very Pythonic) was to create a first column based on sinanggul's suggestion, then a second column that runs the next check and, instead of returning 0 in the else clause, returns the first column. Then a third column that runs the last check and returns the second column in the else clause. Commented Aug 8, 2019 at 17:56
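A minimal sketch of the row-wise fix the comments point to, using made-up sample data and assuming AGE_LOW_NBR may be numeric (hence the str() call). The function now receives a whole row, so row[...] is one value rather than a column. It has to be called through DataFrame.apply with axis=1: Series.apply has no axis parameter and forwards extra keyword arguments straight to the function, which is why .apply(multum_age_ops, axis=1) on a single column fails with an unexpected keyword argument error.

```python
import pandas as pd

# Hypothetical sample data, shaped like the columns in the question
data = pd.DataFrame({
    "AGE_OPERATOR_TXT": ["=", "<", "<", "<"],
    "AGE_LOW_NBR": [0, 18, 1, 4],
    "AGE_UNIT_DISP": ["years", "years", "months", "months"],
})


def multum_age_ops(row):
    # `row` is one row of the DataFrame, so row[...] is a single value,
    # unlike data[...], which is the whole column
    if row["AGE_OPERATOR_TXT"] == "<":
        # str() in case AGE_LOW_NBR is stored as a number
        return str(row["AGE_LOW_NBR"]) + " + " + row["AGE_UNIT_DISP"]
    return 0


# axis=1 makes DataFrame.apply call the function once per row
data["age_op_test"] = data.apply(multum_age_ops, axis=1)
print(data["age_op_test"].tolist())  # [0, '18 + years', '1 + months', '4 + months']
```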

3 Answers

As ifly6's comment mentions, the key is to call apply on the entire DataFrame with axis=1, so that the function (or lambda) is applied to each row. In your case, that would look like this:

data['age_op_test'] = data.apply(lambda row: row['AGE_LOW_NBR'] + " + " + row['AGE_UNIT_DISP'] if row['AGE_OPERATOR_TXT'] == "<" else "0", axis=1)

1 Comment

This worked perfectly - now I need to extend it to include two more "elif" clauses. Thanks for the help!!
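Extending the row-wise version with more branches could look like this sketch. Only the "<" rule comes from the question; the ">" and "=" operator values, the strings they produce, and the sample data are hypothetical placeholders to replace with the real rules. A single function with if/elif/else also avoids the chain of intermediate columns described in the workaround comment.

```python
import pandas as pd

# Hypothetical sample data; ">" and "=" are placeholder operator values
data = pd.DataFrame({
    "AGE_OPERATOR_TXT": ["<", ">", "=", "?"],
    "AGE_LOW_NBR": [18, 65, 2, 5],
    "AGE_UNIT_DISP": ["years", "years", "months", "days"],
})


def multum_age_ops(row):
    op = row["AGE_OPERATOR_TXT"]
    low = str(row["AGE_LOW_NBR"])  # str() in case the column is numeric
    unit = row["AGE_UNIT_DISP"]
    if op == "<":
        return low + " + " + unit          # rule from the question
    elif op == ">":
        return "over " + low + " " + unit  # hypothetical placeholder rule
    elif op == "=":
        return low + " " + unit            # hypothetical placeholder rule
    else:
        return 0                           # no rule matched


# One apply call fills the result column; no intermediate columns needed
data["age_op_test"] = data.apply(multum_age_ops, axis=1)
print(data["age_op_test"].tolist())  # ['18 + years', 'over 65 years', '2 months', 0]
```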

You can also use np.where (see the NumPy documentation):

import numpy as np

data['age_op_test'] = np.where(data['AGE_OPERATOR_TXT'] == "<", data['AGE_LOW_NBR'] + " + " + data['AGE_UNIT_DISP'], 0)

np.where works element-wise: for each row where data['AGE_OPERATOR_TXT'] == "<" is True, it takes the corresponding value of data['AGE_LOW_NBR'] + " + " + data['AGE_UNIT_DISP']; where it is False, it returns 0.
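np.where handles a single condition. For several (the "elif" case), np.select is the vectorized generalization: it takes a list of boolean conditions and a matching list of choices, and for each row picks the choice belonging to the first condition that is True, falling back to default where none is. A sketch with made-up data; the ">" and "=" rules are hypothetical placeholders, and astype(str) is there in case AGE_LOW_NBR is numeric, which would make the string concatenation fail.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; ">" and "=" are placeholder operator values
data = pd.DataFrame({
    "AGE_OPERATOR_TXT": ["<", ">", "=", "?"],
    "AGE_LOW_NBR": [18, 65, 2, 5],
    "AGE_UNIT_DISP": ["years", "years", "months", "days"],
})

op = data["AGE_OPERATOR_TXT"]
low = data["AGE_LOW_NBR"].astype(str)  # numbers -> strings before concatenating
unit = data["AGE_UNIT_DISP"]

conditions = [op == "<", op == ">", op == "="]
choices = [
    low + " + " + unit,          # rule from the question
    "over " + low + " " + unit,  # hypothetical placeholder rule
    low + " " + unit,            # hypothetical placeholder rule
]

# First True condition wins per row; rows matching nothing get the default
data["age_op_test"] = np.select(conditions, choices, default=0)
print(data["age_op_test"].tolist())  # ['18 + years', 'over 65 years', '2 months', 0]
```

Unlike if/elif inside apply, every choice is computed for every row before selection, but on large frames this is usually still faster than calling a Python function once per row.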

2 Comments

There's no pd prefix before np
You are correct, @ifly6, thank you. The answer has been edited to reflect the change.

Can you try something like this:

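data.loc[data['AGE_OPERATOR_TXT'] == '<', 'age_op_test'] = data['AGE_LOW_NBR'].astype(str).str.cat(data['AGE_UNIT_DISP'].astype(str), sep=" + ")
# rows that don't match the condition are left as NaN; fill them with 0
data['age_op_test'] = data['age_op_test'].fillna(0)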
