Pandas Dataframe from nested tuples

Question

I have some data that looks a little like this:

    data = [
        ([('thing1', 'thing1a'),
          ('thing1', 'thing1b'),
          ('thing1', 'thing1c'),
          ('thing1', 'thing1d'),
          ('thing1', 'thing1e')],
         'thing1description'),
        ([('thing2', 'thing2a')],
         'thing2description'),
        ([('thing3', 'thing3a')],
         'thing3description'),
    ]

I would like to build a dataframe that looks like this:

thing_number    thing_letter    description
thing1          thing1a         thing1description
thing1          thing1b         thing1description
thing1          thing1c         thing1description
thing1          thing1d         thing1description
thing1          thing1e         thing1description
thing2          thing2a         thing2description
thing3          thing3a         thing3description

Thanks to a previous, very similar question, I can achieve this with the code below, but I think I must be missing something that would make it more elegant:

data_ = pd.DataFrame(data, columns=['thing', 'description'])
data_ = data_.explode('thing')
data_ = pd.concat([data_,
                   pd.DataFrame([(*i, k) for k, j in data for i in k],
                                columns=['thing_number', 'thing_letter', 'all'],
                                index=data_.index)],
                  axis=1)
data_ = data_[['thing_number', 'thing_letter', 'description']]

To summarise, I am looking for a more efficient and elegant way to unnest the list of tuples. Thanks in advance.

mozway · Accepted Answer · 2022-10-05 07:25:56Z

1

Shorter code based on the same approach:

df = (pd.DataFrame(data, columns=['thing','description'])
        .explode('thing',
                 ignore_index=True) # optional
       )

df[['thing_number','thing_letter']] = df.pop('thing').tolist()

Output:

         description thing_number thing_letter
0  thing1description       thing1      thing1a
1  thing1description       thing1      thing1b
2  thing1description       thing1      thing1c
3  thing1description       thing1      thing1d
4  thing1description       thing1      thing1e
5  thing2description       thing2      thing2a
6  thing3description       thing3      thing3a
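
For readers who want to run this end to end, here is a minimal self-contained sketch of the same steps. It assumes a shortened version of the question's `data` (two records instead of three), and the final column reordering is an optional addition to match the layout requested in the question.

```python
import pandas as pd

# Shortened sample in the same shape as the question's data (assumption).
data = [
    ([("thing1", "thing1a"), ("thing1", "thing1b")], "thing1description"),
    ([("thing2", "thing2a")], "thing2description"),
]

# One row per (list_of_pairs, description) record.
df = pd.DataFrame(data, columns=["thing", "description"])

# explode() gives each (number, letter) tuple its own row and repeats
# the description; ignore_index=True renumbers the rows from 0.
df = df.explode("thing", ignore_index=True)

# pop() removes the tuple column and returns it; tolist() yields a list
# of 2-tuples, which pandas spreads across the two new columns.
df[["thing_number", "thing_letter"]] = df.pop("thing").tolist()

# Optional: reorder the columns to match the requested layout.
df = df[["thing_number", "thing_letter", "description"]]
print(df)
```

Because `pop` drops the intermediate `thing` column in the same step that splits it, no separate cleanup is needed afterwards.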



Chris · Accepted Answer · 2022-10-05 07:41:55Z

1

Another way using dict.fromkeys:

data2 = [dict.fromkeys(ks, v) for ks, v in data]
df = pd.concat([pd.Series(d) for d in data2]).reset_index()
df.columns = ['thing_number','thing_letter','description']

Output:

  thing_number thing_letter        description
0       thing1      thing1a  thing1description
1       thing1      thing1b  thing1description
2       thing1      thing1c  thing1description
3       thing1      thing1d  thing1description
4       thing1      thing1e  thing1description
5       thing2      thing2a  thing2description
6       thing3      thing3a  thing3description
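
The reason this works may not be obvious, so here is a step-by-step sketch on a shortened version of the question's `data` (an assumption for brevity). The intermediate name `series` is introduced only for illustration. One caveat worth keeping in mind: dict keys are unique, so a pair repeated within a single record would be collapsed into one row.

```python
import pandas as pd

# Shortened sample in the same shape as the question's data (assumption).
data = [
    ([("thing1", "thing1a"), ("thing1", "thing1b")], "thing1description"),
    ([("thing2", "thing2a")], "thing2description"),
]

# Each (number, letter) tuple becomes a dict key mapped to the
# description shared by its record.
data2 = [dict.fromkeys(pairs, desc) for pairs, desc in data]

# A dict with tuple keys becomes a Series with a two-level MultiIndex.
series = pd.concat([pd.Series(d) for d in data2])

# reset_index() moves both index levels into ordinary columns.
df = series.reset_index()
df.columns = ["thing_number", "thing_letter", "description"]
print(df)
```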


sammywemmy · Accepted Answer · 2022-10-05 09:18:02Z

0

Another option, with pd.concat:

out = {key: pd.DataFrame(value) for value, key in data}
(pd
 .concat(out, names=['description', None])
 .set_axis(['thing_number', 'thing_letter'], axis=1)
 .droplevel(1)
 .reset_index()
)

Output:

         description thing_number thing_letter
0  thing1description       thing1      thing1a
1  thing1description       thing1      thing1b
2  thing1description       thing1      thing1c
3  thing1description       thing1      thing1d
4  thing1description       thing1      thing1e
5  thing2description       thing2      thing2a
6  thing3description       thing3      thing3a
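
To make the chain easier to follow, the sketch below splits it into named steps on a shortened version of the question's `data` (an assumption for brevity). The names `frames` and `stacked` are illustrative only. Note that the descriptions serve as dict keys here, so this approach assumes every description is unique; a repeated description would overwrite the earlier record.

```python
import pandas as pd

# Shortened sample in the same shape as the question's data (assumption).
data = [
    ([("thing1", "thing1a"), ("thing1", "thing1b")], "thing1description"),
    ([("thing2", "thing2a")], "thing2description"),
]

# Map each description to a small two-column frame of its pairs.
frames = {desc: pd.DataFrame(pairs) for pairs, desc in data}

# Passing a dict to concat uses its keys as a new outer index level,
# named 'description'; the inner level is each frame's own row index.
stacked = pd.concat(frames, names=["description", None])

out = (
    stacked
    .set_axis(["thing_number", "thing_letter"], axis=1)  # name the columns
    .droplevel(1)    # drop the inner per-frame row index
    .reset_index()   # turn 'description' back into a column
)
print(out)
```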

