
Background

I have the following toy df that contains lists in the columns Before and After, as seen below:

import pandas as pd

# Each cell of Before and After holds a list of words
before = [['in', 'the', 'bright', 'blue', 'box'],
          ['because', 'they', 'go', 'really', 'fast'],
          ['to', 'ride', 'and', 'have', 'fun']]
after = [['there', 'are', 'many', 'different'],
         ['i', 'like', 'a', 'lot', 'of', 'sports'],
         ['the', 'middle', 'east', 'has', 'many']]

df = pd.DataFrame({'Before': before,
                   'After': after,
                   'P_ID': [1, 2, 3],
                   'Word': ['crayons', 'cars', 'camels'],
                   'N_ID': ['A1', 'A2', 'A3']})

Output

    After                             Before                             N_ID  P_ID  Word
0   [there, are, many, different]     [in, the, bright, blue, box]       A1    1     crayons
1   [i, like, a, lot, of, sports]     [because, they, go, really, fast]  A2    2     cars
2   [the, middle, east, has, many]    [to, ride, and, have, fun]         A3    3     camels

Problem

The following line of code, taken from "Removing commas and unlisting a dataframe":

df.loc[:, ['After', 'Before']] = df[['After', 'Before']].apply(lambda x: x.str[0].str.replace(',', ''))

produces the following output:

Close-to-what-I-want-but-not-quite Output

    After  Before   N_ID  P_ID  Word
0   there  in       A1    1     crayons
1   i      because  A2    2     cars
2   the    to       A3    3     camels

This output is close, but not quite what I am looking for: the After and Before columns contain only the first word of each list (e.g. there), whereas my desired output looks like this:

Desired Output

    After                     Before                       N_ID  P_ID  Word
0   there are many different  in the bright blue box       A1    1     crayons
1   i like a lot of sports    because they go really fast  A2    2     cars
2   the middle east has many  to ride and have fun         A3    3     camels

Question

How do I get my Desired Output?

3 Answers

agg + join. The commas aren't present in your lists; they are just part of the __repr__ of the list.


str_cols = ['Before', 'After']

d = {k: ' '.join for k in str_cols}

df.agg(d).join(df.drop(columns=str_cols))

                        Before                     After  P_ID     Word N_ID
0       in the bright blue box  there are many different     1  crayons   A1
1  because they go really fast    i like a lot of sports     2     cars   A2
2         to ride and have fun  the middle east has many     3   camels   A3

If you'd prefer in place (faster):

df[str_cols] = df.agg(d)
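
To make the point about the repr concrete, here is a small sketch. The names `s`, `has_comma`, `first_word` and `joined` are illustrative and not from the question. It shows that the words themselves contain no commas, that `.str[0]` picks out only the first element of each list (which is why the attempt in the question returned a single word), and that `' '.join` produces the full sentence.

```python
import pandas as pd

# One-row Series holding a list, like a single cell of the Before column.
s = pd.Series([['in', 'the', 'bright', 'blue', 'box']])

# The commas show up only when the list is printed.
print(repr(s.iloc[0]))

# None of the words contains a comma, so there is nothing to replace.
has_comma = any(',' in word for word in s.iloc[0])

# .str[0] indexes into each list and returns only its first element.
first_word = s.str[0]

# Joining the list elements yields the whole sentence.
joined = s.apply(' '.join)
print(joined.iloc[0])
```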

applymap

In line

New copy of a dataframe with desired results

df.assign(**df[['After', 'Before']].applymap(' '.join))

                        Before                     After  P_ID     Word N_ID
0       in the bright blue box  there are many different     1  crayons   A1
1  because they go really fast    i like a lot of sports     2     cars   A2
2         to ride and have fun  the middle east has many     3   camels   A3

In place

Mutate existing df

df.update(df[['After', 'Before']].applymap(' '.join))
df

                        Before                     After  P_ID     Word N_ID
0       in the bright blue box  there are many different     1  crayons   A1
1  because they go really fast    i like a lot of sports     2     cars   A2
2         to ride and have fun  the middle east has many     3   camels   A3
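
A note on versions, offered as a sketch rather than a definitive recipe: assuming pandas 2.1 or later, `DataFrame.applymap` is deprecated in favor of `DataFrame.map`, which applies a function elementwise in the same way. The check below picks whichever method the installed pandas provides; the shortened sample data and the name `joined` are illustrative only.

```python
import pandas as pd

# Shortened sample data in the same shape as the question's df.
df = pd.DataFrame({
    'Before': [['in', 'the', 'box'], ['they', 'go', 'fast']],
    'After': [['there', 'are', 'many'], ['i', 'like', 'sports']],
    'P_ID': [1, 2],
})
cols = ['After', 'Before']

# Assumption: DataFrame.map exists only in newer pandas (2.1+);
# older versions fall back to applymap.
if hasattr(pd.DataFrame, 'map'):
    joined = df[cols].map(' '.join)
else:
    joined = df[cols].applymap(' '.join)

# Same "in line" usage as above: a new frame with the joined columns.
result = df.assign(**joined)
print(result)
```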

stack and str.join

We can use this result in a similar "In line" and "In place" way as shown above.

df[['After', 'Before']].stack().str.join(' ').unstack()

                      After                       Before
0  there are many different       in the bright blue box
1    i like a lot of sports  because they go really fast
2  the middle east has many         to ride and have fun
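
As a sketch of what using this result "in line" and "in place" could look like, the example below feeds the stacked-and-joined frame into `assign` and `update`. The shortened sample data and the names `joined` and `new_df` are illustrative, not part of the original answer.

```python
import pandas as pd

# Shortened sample data in the same shape as the question's df.
df = pd.DataFrame({
    'Before': [['in', 'the', 'box'], ['they', 'go', 'fast']],
    'After': [['there', 'are', 'many'], ['i', 'like', 'sports']],
    'P_ID': [1, 2],
})

# Stack to one long Series of lists, join each list, then restore the columns.
joined = df[['After', 'Before']].stack().str.join(' ').unstack()

# "In line": build a new frame and leave df untouched.
new_df = df.assign(**joined)

# "In place": overwrite the list columns of df itself.
df.update(joined)
print(df)
```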


We can specify the lists we want to convert to string and then use .apply in a for loop:

lst_cols = ['Before',  'After']

for col in lst_cols:
    df[col] = df[col].apply(' '.join)

                        Before                     After  P_ID     Word N_ID
0       in the bright blue box  there are many different     1  crayons   A1
1  because they go really fast    i like a lot of sports     2     cars   A2
2         to ride and have fun  the middle east has many     3   camels   A3
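
One caveat, sketched under the assumption that a column might contain a missing entry (the question's data does not): `.apply(' '.join)` raises a `TypeError` when a cell is not iterable, whereas the vectorized `Series.str.join` returns a missing value for that row. The Series `s` below is made up for illustration.

```python
import pandas as pd

# Hypothetical column with one list and one missing entry.
s = pd.Series([['to', 'ride', 'and', 'have', 'fun'], None], dtype=object)

# Series.str.join joins each list and leaves the missing row as a missing value.
via_str = s.str.join(' ')

# .apply(' '.join) calls ' '.join(None) for the second row, which fails.
try:
    s.apply(' '.join)
    apply_failed = False
except TypeError:
    apply_failed = True

print(via_str)
print(apply_failed)
```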
