Apply function to create string with multiple columns as argument

Question

I have a dataframe like this:

     name .  size . type    .  av_size_type
0    John .   23  . Qapra'  .            22
1     Dan .   21  . nuk'neH .            12
2  Monica .   12  . kahless .            15

I want to create a new column with a sentence, like this:

    name .  size . type    .  av_size_type  .   sentence
0    John .   23 . Qapra'  .            22  .   "John has size 23, above the average of Qapra' type (22)"
1     Dan .   21 . nuk'neH .            12  .   "Dan has size 21, above the average of nuk'neH type (12)"
2  Monica .   12 . kahless .            15  .   "Monica has size 12, above the average of kahless type (15)"

It would be something like this:

def func(x):
    string="{0} has size {1}, above the average of {2} type ({3})".format(x[0],x[1],x[2],x[3])
    return string

df['sentence']=df[['name','size','type','av_size_type']].apply(func)

However, apparently this sort of syntax doesn't work.

Would anyone have a suggestion for that?

Dunno, in fn try return string (and maybe get some sleep :)
– Jus, Commented Mar 6, 2018 at 3:33
@umutto oops, that's right. already fixed. it doesn't work either way
– aabujamra, Commented Mar 6, 2018 at 3:34
Yeah just realized you need to apply over columns as well, so .apply(func, axis=1) should work.
– umutto, Commented Mar 6, 2018 at 3:35
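
A minimal sketch of the axis=1 fix suggested in the comment above, assuming the sample frame from the question. Without axis=1, apply hands each column (not each row) to the function, so a lookup like x[3] asks for a row label that this three-row frame does not have. The sketch also swaps the positional x[0], x[1], ... for column labels, since I believe newer pandas versions warn about positional integer lookups on a labelled Series.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["John", "Dan", "Monica"],
    "size": [23, 21, 12],
    "type": ["Qapra'", "nuk'neH", "kahless"],
    "av_size_type": [22, 12, 15],
})

def func(row):
    # row is a single row of the frame, passed in as a Series
    return "{0} has size {1}, above the average of {2} type ({3})".format(
        row["name"], row["size"], row["type"], row["av_size_type"]
    )

# axis=1 makes apply call func once per row instead of once per column
df["sentence"] = df[["name", "size", "type", "av_size_type"]].apply(func, axis=1)
```

Selecting the four columns before calling apply is optional here, but it keeps the function from ever seeing columns it does not need.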

piRSquared · Accepted Answer · 2018-03-06 03:46:43Z

8

Use a splat and unpack

string = lambda x: "{} has size {}, above the average of {} type ({})".format(*x)

df.assign(sentence=df.apply(string, 1))

     name  size     type  av_size_type                                           sentence
0    John    23   Qapra'            22  John has size 23, above the average of Qapra' ...
1     Dan    21  nuk'neH            12  Dan has size 21, above the average of nuk'neH ...
2  Monica    12  kahless            15  Monica has size 12, above the average of kahle...

If you want, you can use dictionary unpacking

string = lambda x: "{name} has size {size}, above the average of {type} type ({av_size_type})".format(**x)

df.assign(sentence=df.apply(string, 1))

     name  size     type  av_size_type                                           sentence
0    John    23   Qapra'            22  John has size 23, above the average of Qapra' ...
1     Dan    21  nuk'neH            12  Dan has size 21, above the average of nuk'neH ...
2  Monica    12  kahless            15  Monica has size 12, above the average of kahle...

edited Mar 6, 2018 at 3:46

answered Mar 6, 2018 at 3:38

piRSquared


2 Comments

aabujamra Over a year ago

Thanks @piRSquared! Is there a way I could select which columns I want to put inside the format? My df actually has dozens of columns, I tried to simplify it here.

piRSquared Over a year ago

You can use the row series as a dictionary and unpack with a double splat.
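
As a rough illustration of that comment: with named placeholders, str.format only picks out the keyword arguments it needs and ignores the rest, so the frame can carry extra columns. The "city" column below is a hypothetical extra column added only to show this; column names need to be strings for the double splat to work.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Dan", "Monica"],
    "size": [23, 21, 12],
    "type": ["Qapra'", "nuk'neH", "kahless"],
    "av_size_type": [22, 12, 15],
    "city": ["A", "B", "C"],  # hypothetical extra column, never used in the template
})

template = "{name} has size {size}, above the average of {type} type ({av_size_type})"

# **row unpacks the row Series into keyword arguments keyed by column name;
# format() uses only the names that appear in the template
df["sentence"] = df.apply(lambda row: template.format(**row), axis=1)
```

Which columns end up in the sentence is then controlled entirely by the placeholder names in the template.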

cs95 · Accepted Answer · 2018-03-06 03:38:38Z

4

Use a list comprehension as a fast alternative since you're forced to iterate:

string = "{0} has size {1}, above the average of {2} type ({3})"
df['sentence'] = [string.format(*r) for r in df.values.tolist()]

df

     name  size     type  av_size_type  \
0    John    23   Qapra'            22   
1     Dan    21  nuk'neH            12   
2  Monica    12  kahless            15   

                                            sentence  
0  John has size 23, above the average of Qapra' ...  
1  Dan has size 21, above the average of nuk'neH ...  
2  Monica has size 12, above the average of kahle...

answered Mar 6, 2018 at 3:38

cs95


1 Comment

cs95 Over a year ago

@OP, with this method it is easy to select the columns you want in the output. Just index them with a list of column names, like so: df[[col1, col2, ...]].values.tolist()
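
A short sketch of that column selection, assuming the frame has more columns than the sentence needs. The "city" column and the cols list are illustrative names, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "C"],  # hypothetical extra column to be skipped
    "name": ["John", "Dan", "Monica"],
    "size": [23, 21, 12],
    "type": ["Qapra'", "nuk'neH", "kahless"],
    "av_size_type": [22, 12, 15],
})

# Order matters: the positions here line up with {0}..{3} in the template
cols = ["name", "size", "type", "av_size_type"]
template = "{0} has size {1}, above the average of {2} type ({3})"

# Index with a list of names (double brackets), then iterate over plain Python rows
df["sentence"] = [template.format(*r) for r in df[cols].values.tolist()]
```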

Allen Qin · Accepted Answer · 2018-03-06 03:45:01Z

4

You can use apply to build the sentence directly.

df['sentence'] = (
    df.apply(lambda x: "{} has size {}, above the average of {} type ({})"
                       .format(*x), axis=1)
)

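If you would like to reference the columns explicitly, index the row by column label. (Attribute access such as x.name or x.size does not work here, because those resolve to the row Series' own name and size attributes rather than to the columns.)

df['sentence'] = (
    df.apply(lambda x: "{} has size {}, above the average of {} type ({})"
                       .format(x['name'], x['size'], x['type'], x['av_size_type']), axis=1)
)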
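
One thing worth checking when referencing columns on a row: with column names like name and size, attribute access and bracket access return different things. A small sketch on the question's sample data, where row is simply the first row of the frame:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Dan", "Monica"],
    "size": [23, 21, 12],
    "type": ["Qapra'", "nuk'neH", "kahless"],
    "av_size_type": [22, 12, 15],
})

row = df.iloc[0]  # first row as a Series, as apply(..., axis=1) would pass it

# Attribute access hits the Series' own attributes, not the columns
attr_name = row.name   # the row's index label
attr_size = row.size   # the number of elements in the row

# Bracket access always looks up the column label
col_name = row["name"]
col_size = row["size"]
```

Here attr_name is the index label 0 and attr_size is 4 (the number of columns), while col_name and col_size hold the actual cell values, so bracket indexing is the safer choice for these column names.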

edited Mar 6, 2018 at 3:45

answered Mar 6, 2018 at 3:39

Allen Qin


1 Comment

aabujamra Over a year ago

Thanks @Allen. Seems like a good solution. Is there a way I could select which columns I want to put inside the format? My df actually has dozens of columns, I tried to simplify it here.
