
I am trying to generate a third column in a pandas DataFrame from two of its other columns. The requirement is specific to the scenario for which I need to generate the third column's data.

The requirement is as follows:

Let the DataFrame be named df, with a first column 'first_name' and a second column 'last_name'. I need to generate the third column so that, for each row, it uses string formatting to build a particular string, passes that string to a function, and uses whatever the function returns as the value of the third column.

Problem 1

base_string = "my name is {first} {last}"

df['summary'] = base_string.format(first=df['first_name'], last=df['last_name'])
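The reason Problem 1 does not give one string per row can be seen in a small sketch. The sample data below is hypothetical (the question does not show its DataFrame). `str.format` runs only once here, on the whole Series objects, so each placeholder is filled with the printed text of an entire column, and that single string is then broadcast to every row.

```python
import pandas as pd

# Hypothetical sample data; the question does not show its real DataFrame.
df = pd.DataFrame({"first_name": ["Ada", "Alan"], "last_name": ["Lovelace", "Turing"]})
base_string = "my name is {first} {last}"

# format() is called once, with Series arguments, so {first} becomes the
# text form of the whole first_name column (all rows, index, name, dtype).
df["summary"] = base_string.format(first=df["first_name"], last=df["last_name"])

# The one resulting string is assigned to every row.
print(df["summary"].nunique())                        # 1
print(df.loc[0, "summary"] == df.loc[1, "summary"])   # True
```

Converting the columns to strings first (for example with `astype(str)`) does not change this, because `format` still receives whole Series rather than individual values.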

Problem 2

df['summary'] = some_func(base_string.format(first=df['first_name'], last=df['last_name']))

My ultimate goal is to solve Problem 2, but Problem 1 is a prerequisite for that, and so far I have been unable to solve it. I have tried converting my DataFrame values to strings, but it does not work the way I expected.

2 Answers


You can use apply:

df['summary'] = df.apply(lambda r: base_string.format(first=r['first_name'], last=r['last_name']),
                         axis=1)
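A self-contained version of the apply approach, using hypothetical sample data. With `axis=1`, pandas passes each row to the lambda as a Series indexed by column name, so `r["first_name"]` is a single value and `format` runs once per row.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"first_name": ["Ada", "Alan"], "last_name": ["Lovelace", "Turing"]})
base_string = "my name is {first} {last}"

# axis=1: the lambda receives one row at a time, so format sees scalars.
df["summary"] = df.apply(
    lambda r: base_string.format(first=r["first_name"], last=r["last_name"]),
    axis=1,
)
print(df["summary"].tolist())
# ['my name is Ada Lovelace', 'my name is Alan Turing']
```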

Or a list comprehension:

df['summary'] = [base_string.format(first=x, last=y)
                 for x, y in zip(df['first_name'], df['last_name'])]
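A runnable sketch of the list comprehension, again with hypothetical sample data, checking that it produces the same values as the row-wise apply. `zip` walks the two columns in lockstep, and the resulting list is assigned to the new column by position, so it must contain exactly one entry per row (which zipping two columns of the same DataFrame guarantees).

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"first_name": ["Ada", "Alan"], "last_name": ["Lovelace", "Turing"]})
base_string = "my name is {first} {last}"

# One formatted string per (first, last) pair; assigned to rows by position.
df["summary"] = [base_string.format(first=x, last=y)
                 for x, y in zip(df["first_name"], df["last_name"])]

# Compare with the row-wise apply version.
via_apply = df.apply(
    lambda r: base_string.format(first=r["first_name"], last=r["last_name"]), axis=1
)
print(df["summary"].tolist() == via_apply.tolist())  # True
```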

And then, for a general function some_func:

df['summary'] = [some_func(base_string.format(first=x, last=y))
                 for x, y in zip(df['first_name'], df['last_name'])]
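The question never defines `some_func`, so the sketch below uses a hypothetical stand-in that upper-cases its input. Any function that accepts one formatted string and returns one value can be dropped in the same way.

```python
import pandas as pd

def some_func(text: str) -> str:
    # Hypothetical placeholder for the question's some_func.
    return text.upper()

# Hypothetical sample data for illustration.
df = pd.DataFrame({"first_name": ["Ada", "Alan"], "last_name": ["Lovelace", "Turing"]})
base_string = "my name is {first} {last}"

# Build each row's string first, then pass it to some_func.
df["summary"] = [some_func(base_string.format(first=x, last=y))
                 for x, y in zip(df["first_name"], df["last_name"])]
print(df["summary"].tolist())
# ['MY NAME IS ADA LOVELACE', 'MY NAME IS ALAN TURING']
```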

You could use pandas.DataFrame.apply with axis=1, so your code would look like this:

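def mapping_function(row):
    # make your calculation here; row holds one row's values by column name
    return base_string.format(first=row['first_name'], last=row['last_name'])

df['summary'] = df.apply(mapping_function, axis=1)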
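A self-contained sketch of the named-function approach that also covers Problem 2. The sample data and `some_func` (here it just returns the string's length) are hypothetical. Selecting only the needed columns before calling apply keeps each row Series limited to the values the function actually uses.

```python
import pandas as pd

base_string = "my name is {first} {last}"

def some_func(text):
    # Hypothetical placeholder for the question's some_func.
    return len(text)

def mapping_function(row):
    # Build this row's string, then pass it through some_func.
    return some_func(base_string.format(first=row["first_name"], last=row["last_name"]))

# Hypothetical sample data for illustration.
df = pd.DataFrame({"first_name": ["Ada", "Alan"], "last_name": ["Lovelace", "Turing"]})

# Only the two columns the function reads are passed to apply.
df["summary"] = df[["first_name", "last_name"]].apply(mapping_function, axis=1)
print(df["summary"].tolist())  # [23, 22]
```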

5 Comments

It works, but I think passing the entire row to a custom function would be highly inefficient. It would be best to pass only the columns that are needed.
Have a look at this question; maybe np.vectorize could work for you: stackoverflow.com/questions/52673285/…
I had never looked into np.vectorize, and my code is plagued with slow .apply(custom_funcs) calls. Really great tip, I will definitely check it out :)
Yeah, I only just found it out myself. It's useful for some of my Python notebooks too. Cheers.
Note that np.vectorize is not vectorization by any means. It is essentially a wrapped for loop.
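A sketch of the np.vectorize idea from the comments, with hypothetical sample data. As the last comment says, `np.vectorize` is a convenience wrapper that still calls the Python function once per element; `otypes=[object]` is set so the output keeps ordinary Python strings. For plain concatenation like Problem 1, pandas string arithmetic on the columns avoids a per-row Python function altogether; whether it is faster on real data is worth measuring rather than assuming.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"first_name": ["Ada", "Alan"], "last_name": ["Lovelace", "Turing"]})
base_string = "my name is {first} {last}"

# np.vectorize wraps the function so it accepts arrays, but internally it
# still loops in Python, calling the lambda once per (first, last) pair.
make_summary = np.vectorize(
    lambda first, last: base_string.format(first=first, last=last), otypes=[object]
)
df["summary"] = make_summary(df["first_name"], df["last_name"])

# Element-wise Series concatenation gives the same strings without a
# user-defined per-row function.
concat = "my name is " + df["first_name"] + " " + df["last_name"]
print(df["summary"].tolist() == concat.tolist())  # True
```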
