
I have a dataframe like this:

     name  size     type  av_size_type
0    John    23   Qapra'            22
1     Dan    21  nuk'neH            12
2  Monica    12  kahless            15

I want to create a new column with a sentence, like this:

     name  size     type  av_size_type  sentence
0    John    23   Qapra'            22  "John has size 23, above the average of Qapra' type (22)"
1     Dan    21  nuk'neH            12  "Dan has size 21, above the average of nuk'neH type (12)"
2  Monica    12  kahless            15  "Monica has size 12, above the average of kahless type (15)"

It would be something like this:

def func(x):
    string="{0} has size {1}, above the average of {2} type ({3})".format(x[0],x[1],x[2],x[3])
    return string

df['sentence']=df[['name','size','type','av_size_type']].apply(func)

However, this syntax apparently doesn't work.

Would anyone have a suggestion for that?

4 Comments

  • You forgot to return the string in your function... Commented Mar 6, 2018 at 3:33
  • Dunno, in fn try return string (and maybe get some sleep :) Commented Mar 6, 2018 at 3:33
  • @umutto oops, that's right. already fixed. it doesn't work either way Commented Mar 6, 2018 at 3:34
  • Yeah just realized you need to apply over columns as well, so .apply(func, axis=1) should work. Commented Mar 6, 2018 at 3:35
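
The suggestion in the last comment can be sketched as follows. By default, apply hands each column to the function; with axis=1 it hands over each row instead. The frame is rebuilt from the sample data in the question, and the use of .iloc for positional access is an assumption on my part (plain x[0] on a labeled row is deprecated in recent pandas versions).

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "name": ["John", "Dan", "Monica"],
    "size": [23, 21, 12],
    "type": ["Qapra'", "nuk'neH", "kahless"],
    "av_size_type": [22, 12, 15],
})

def func(x):
    # x is a single row (a Series) because apply is called with axis=1
    return "{0} has size {1}, above the average of {2} type ({3})".format(
        x.iloc[0], x.iloc[1], x.iloc[2], x.iloc[3]
    )

df["sentence"] = df[["name", "size", "type", "av_size_type"]].apply(func, axis=1)
print(df["sentence"].tolist())
```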

3 Answers


Use a splat and unpack

string = lambda x: "{} has size {}, above the average of {} type ({})".format(*x)

df.assign(sentence=df.apply(string, 1))

     name  size     type  av_size_type                                           sentence
0    John    23   Qapra'            22  John has size 23, above the average of Qapra' ...
1     Dan    21  nuk'neH            12  Dan has size 21, above the average of nuk'neH ...
2  Monica    12  kahless            15  Monica has size 12, above the average of kahle...

If you want, you can use dictionary unpacking

string = lambda x: "{name} has size {size}, above the average of {type} type ({av_size_type})".format(**x)

df.assign(sentence=df.apply(string, 1))

     name  size     type  av_size_type                                           sentence
0    John    23   Qapra'            22  John has size 23, above the average of Qapra' ...
1     Dan    21  nuk'neH            12  Dan has size 21, above the average of nuk'neH ...
2  Monica    12  kahless            15  Monica has size 12, above the average of kahle...

2 Comments

Thanks @piRSquared! Is there a way I could select which columns I want to put inside the format? My df actually has dozens of columns, I tried to simplify it here.
You can use the row series as a dictionary and unpack with a double splat.
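
A minimal sketch of that idea, assuming a frame with more columns than the sentence needs: named placeholders pick out only the columns they mention, and str.format ignores any extra keyword arguments. The "extra" column and the variable name template are hypothetical, added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Dan", "Monica"],
    "size": [23, 21, 12],
    "type": ["Qapra'", "nuk'neH", "kahless"],
    "av_size_type": [22, 12, 15],
    "extra": ["a", "b", "c"],  # hypothetical column that the sentence does not use
})

template = "{name} has size {size}, above the average of {type} type ({av_size_type})"

# Each row is unpacked as keyword arguments; unused keys such as "extra" are ignored
df["sentence"] = df.apply(lambda row: template.format(**row), axis=1)
print(df["sentence"].tolist())
```

Because the placeholders are named, the column order in the frame does not matter here.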

Use a list comprehension as a fast alternative since you're forced to iterate:

string = "{0} has size {1}, above the average of {2} type ({3})"
df['sentence'] = [string.format(*r) for r in df.values.tolist()]

df

     name  size     type  av_size_type  \
0    John    23   Qapra'            22   
1     Dan    21  nuk'neH            12   
2  Monica    12  kahless            15   

                                            sentence  
0  John has size 23, above the average of Qapra' ...  
1  Dan has size 21, above the average of nuk'neH ...  
2  Monica has size 12, above the average of kahle... 

1 Comment

@OP, with this method, it is easiest to select the columns you want to output. Just index them with a list of column names, like so: df[[col1, col2, ...]].values.tolist()
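
As a rough sketch of column selection with the list comprehension, assuming the frame carries extra columns: select the wanted columns with a list of names, in the order the positional placeholders expect. The "extra" column and the names cols and template are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "extra": ["a", "b", "c"],  # hypothetical column that the sentence does not use
    "name": ["John", "Dan", "Monica"],
    "size": [23, 21, 12],
    "type": ["Qapra'", "nuk'neH", "kahless"],
    "av_size_type": [22, 12, 15],
})

# The order of this list must match placeholders {0}..{3}
cols = ["name", "size", "type", "av_size_type"]
template = "{0} has size {1}, above the average of {2} type ({3})"

# Note the double brackets implied by indexing with a list of column names
df["sentence"] = [template.format(*r) for r in df[cols].values.tolist()]
print(df["sentence"].tolist())
```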

You can use apply to build the sentence directly.

df['sentence'] = (
    df.apply(lambda x: "{} has size {}, above the average of {} type ({})"
                       .format(*x), axis=1)
)

If you would like to reference the columns explicitly, you can do:

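df['sentence'] = (
    # Use bracket access: on a row, x.name is the row label and x.size is
    # the number of elements, not the 'name' and 'size' columns.
    df.apply(lambda x: "{} has size {}, above the average of {} type ({})"
                       .format(x['name'], x['size'], x['type'], x['av_size_type']), axis=1)
)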

1 Comment

Thanks @Allen. Seems like a good solution. Is there a way I could select which columns I want to put inside the format? My df actually has dozens of columns, I tried to simplify it here.
