Adding column to pandas dataframe taking values from list in other column

Question

I'm new to Python so I'm sorry if terminology is not correct; I've searched for similar posts but didn't find anything helpful for my case. I have a dataframe like this:

    Column1     Column2
0   0001        [('A','B'),('C','D'),('E','F')]
1   0001        [('A','B'),('C','D'),('E','F')]
2   0001        [('A','B'),('C','D'),('E','F')]
3   0002        [('G','H'),('I','J')]
4   0002        [('G','H'),('I','J')]

Each row is replicated n times based on the number of tuples contained in the list of Column2. What I'd like to do is to add a new column containing only one tuple per row:

    Column1     Column2                             Column2_new
0   0001        [('A','B'),('C','D'),('E','F')]     'A' 'B'
1   0001        [('A','B'),('C','D'),('E','F')]     'C' 'D'
2   0001        [('A','B'),('C','D'),('E','F')]     'E' 'F'
3   0002        [('G','H'),('I','J')]               'G' 'H'
4   0002        [('G','H'),('I','J')]               'I' 'J'

Can you please help me with this?

Thanks in advance for any suggestion

Why is “Can someone help me?” not an actual question?

– Holden, Commented May 2, 2020 at 22:21

anky · Accepted Answer · 2020-05-02 17:49:23Z

2

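We can use df.lookup after groupby + cumcount. Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this needs an older pandas version: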

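# Position of each row within its Column1 group
idx = df.groupby('Column1').cumcount()
# Expand each list into columns, then pick column idx[i] for row i
df['new'] = pd.DataFrame(df['Column2'].tolist()).lookup(df.index, idx)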

print(df)
   Column1                   Column2     new
0        1  [(A, B), (C, D), (E, F)]  (A, B)
1        1  [(A, B), (C, D), (E, F)]  (C, D)
2        1  [(A, B), (C, D), (E, F)]  (E, F)
3        2          [(G, H), (I, J)]  (G, H)
4        2          [(G, H), (I, J)]  (I, J)
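Since lookup is gone in pandas 2.0 and later, here is a minimal sketch of the same groupby + cumcount idea without it. It assumes, as in the question, that each Column1 value is repeated once per tuple in its list; the sample frame and the loop variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': [1, 1, 1, 2, 2],
    'Column2': [[('A', 'B'), ('C', 'D'), ('E', 'F')]] * 3
               + [[('G', 'H'), ('I', 'J')]] * 2,
})

# Position of each row within its Column1 group: 0, 1, 2, 0, 1
idx = df.groupby('Column1').cumcount()

# Pick the idx-th tuple from each row's list (plain Python indexing)
df['new'] = [tuples[i] for tuples, i in zip(df['Column2'], idx)]
print(df)
```

If a group had more rows than its list has tuples, the indexing would raise an IndexError, which is probably what you want to see in that case.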

answered May 2, 2020 at 17:49

sammywemmy · Accepted Answer · 2020-05-03 12:16:40Z

1

import pandas as pd

data = {'Column1': ["0001"]*3 + ["0002"]*2,
        'Column2': [[('A','B'),('C','D'),('E','F')]]*3 + [[('G','H'),('I','J')]]*2
       }
df = pd.DataFrame(data)

print(df)


    Column1       Column2
0   0001    [(A, B), (C, D), (E, F)]
1   0001    [(A, B), (C, D), (E, F)]
2   0001    [(A, B), (C, D), (E, F)]
3   0002    [(G, H), (I, J)]
4   0002    [(G, H), (I, J)]

# Keep one row per Column1 value
M = df.drop_duplicates('Column1')
print(M)

    Column1     Column2
0   0001    [(A, B), (C, D), (E, F)]
3   0002    [(G, H), (I, J)]

# Explode the lists to one tuple per row and attach them to df by position
pd.concat([df, M.Column2.explode().reset_index(drop=True).rename('new')], axis=1)

  Column1       Column2                   new
0   0001    [(A, B), (C, D), (E, F)]    (A, B)
1   0001    [(A, B), (C, D), (E, F)]    (C, D)
2   0001    [(A, B), (C, D), (E, F)]    (E, F)
3   0002    [(G, H), (I, J)]            (G, H)
4   0002    [(G, H), (I, J)]            (I, J)
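The concat above lines things up by index, so it relies on df having the default 0..n-1 index and on each Column1 value being repeated exactly once per tuple in its list. A minimal sketch under those same assumptions that makes the length check explicit and assigns by position instead (the name exploded is just illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': ["0001"] * 3 + ["0002"] * 2,
    'Column2': [[('A', 'B'), ('C', 'D'), ('E', 'F')]] * 3
               + [[('G', 'H'), ('I', 'J')]] * 2,
})

# One row per distinct Column1 value, then one row per tuple in its list
exploded = df.drop_duplicates('Column1')['Column2'].explode()

# Guard the assumption that there is exactly one tuple per row of df
assert len(exploded) == len(df)

# Assign by position so the index of df does not matter
df['new'] = exploded.to_numpy()
print(df)
```

This also assumes the repeated rows of each Column1 value are in the same order as the first appearance of those values in df.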

Alternatively, you could use the itertools functions product and chain to build the data, then concat it back onto the original dataframe:

from itertools import product, chain

# Pair each Column1 value with every tuple in its list, then flatten
res = chain.from_iterable(product([first], last)
                          for first, last
                          in zip(M.Column1, M.Column2))
out = pd.DataFrame(res, columns=['Column1', 'new'])
pd.concat((df, out.new), axis=1)

edited May 3, 2020 at 12:16

answered May 3, 2020 at 11:59

1 Comment

alex8501 Over a year ago

Thank you very much!!! :-) .explode() is exactly what I needed, it worked perfectly.
