I have some data that looks a little like this:

    data = [
        ([('thing1', 'thing1a'),
          ('thing1', 'thing1b'),
          ('thing1', 'thing1c'),
          ('thing1', 'thing1d'),
          ('thing1', 'thing1e')],
         'thing1description'),
        ([('thing2', 'thing2a')],
         'thing2description'),
        ([('thing3', 'thing3a')],
         'thing3description'),
    ]

I would like to build a dataframe that looks like this:

thing_number    thing_letter    description
thing1            thing1a   thing1description
thing1            thing1b   thing1description
thing1            thing1c   thing1description
thing1            thing1d   thing1description
thing1            thing1e   thing1description
thing2            thing2a   thing2description
thing3            thing3a   thing3description

Thanks to a previous, very similar question such as this, I can achieve it using the code below, but I think I must be missing something that would make it more elegant:

import pandas as pd

data_ = pd.DataFrame(data, columns=['thing', 'description'])
data_ = data_.explode('thing')
data_ = pd.concat([data_, pd.DataFrame([(*i, k) for k, j in data for i in k], columns=['thing_number', 'thing_letter', 'all'], index=data_.index)], axis=1)
data_ = data_[['thing_number', 'thing_letter', 'description']]

To summarise, I am looking for a more efficient and elegant way to unnest the list of tuples. Thanks in advance.

3 Answers

Shorter code based on the same approach:

df = (pd.DataFrame(data, columns=['thing', 'description'])
        .explode('thing', ignore_index=True)  # ignore_index is optional
     )

df[['thing_number','thing_letter']] = df.pop('thing').tolist()

Output:

         description thing_number thing_letter
0  thing1description       thing1      thing1a
1  thing1description       thing1      thing1b
2  thing1description       thing1      thing1c
3  thing1description       thing1      thing1d
4  thing1description       thing1      thing1e
5  thing2description       thing2      thing2a
6  thing3description       thing3      thing3a
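
The output above has `description` first, while the question asks for `thing_number`, `thing_letter`, `description`. A minimal self-contained sketch of the same approach with a final column reorder is shown here; the shortened `data` is only for illustration, and `ignore_index` in `explode` assumes pandas 1.1 or later.

```python
import pandas as pd

# Shortened sample in the same shape as the question's data
data = [
    ([('thing1', 'thing1a'), ('thing1', 'thing1b')], 'thing1description'),
    ([('thing2', 'thing2a')], 'thing2description'),
]

# One row per (thing_number, thing_letter) tuple
df = (pd.DataFrame(data, columns=['thing', 'description'])
        .explode('thing', ignore_index=True))

# Split the tuples into two columns, then drop the original column via pop
df[['thing_number', 'thing_letter']] = df.pop('thing').tolist()

# Reorder to match the layout requested in the question
df = df[['thing_number', 'thing_letter', 'description']]
print(df)
```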

Another way using dict.fromkeys:

data2 = [dict.fromkeys(ks, v) for ks, v in data]
df = pd.concat([pd.Series(d) for d in data2]).reset_index()
df.columns = ['thing_number','thing_letter','description']

Output:

  thing_number thing_letter        description
0       thing1      thing1a  thing1description
1       thing1      thing1b  thing1description
2       thing1      thing1c  thing1description
3       thing1      thing1d  thing1description
4       thing1      thing1e  thing1description
5       thing2      thing2a  thing2description
6       thing3      thing3a  thing3description
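
One thing to keep in mind with this approach: dictionary keys are unique, so a pair that appears more than once within the same group is kept only once. The sketch below uses a hypothetical input with a repeated pair (not part of the question's data) to show the effect.

```python
import pandas as pd

# Hypothetical input: ('thing1', 'thing1a') appears twice in the same group
dup_data = [
    ([('thing1', 'thing1a'), ('thing1', 'thing1a'), ('thing1', 'thing1b')],
     'thing1description'),
]

# dict.fromkeys collapses the repeated tuple into a single key
data2 = [dict.fromkeys(ks, v) for ks, v in dup_data]
df_dedup = pd.concat([pd.Series(d) for d in data2]).reset_index()
df_dedup.columns = ['thing_number', 'thing_letter', 'description']

# Three input pairs, but only two distinct ones survive
n_rows = len(df_dedup)
print(n_rows)
```

If repeated pairs need to be preserved, one of the other approaches may be a better fit.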


Another option, with pd.concat:

out = {key: pd.DataFrame(value) for value, key in data}
(pd
.concat(out, names = ['description', None])
.set_axis(['thing_number', 'thing_letter'], axis = 1)
.droplevel(1)
.reset_index()
)

Output:

         description thing_number thing_letter
0  thing1description       thing1      thing1a
1  thing1description       thing1      thing1b
2  thing1description       thing1      thing1c
3  thing1description       thing1      thing1d
4  thing1description       thing1      thing1e
5  thing2description       thing2      thing2a
6  thing3description       thing3      thing3a
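
Because `out` is a dictionary keyed by description, two groups that share the same description would collapse into one entry, with the later group replacing the earlier one. A possible variant that avoids the dictionary is sketched below; the `shared` input is hypothetical, and the use of `assign` is a substitution of mine rather than part of the answer above.

```python
import pandas as pd

# Hypothetical input: two groups share one description
shared = [
    ([('thing1', 'thing1a')], 'same description'),
    ([('thing2', 'thing2a')], 'same description'),
]

# The dictionary form keeps only the last group for a repeated key
as_dict = {key: pd.DataFrame(value) for value, key in shared}

# Building a list of frames keeps every group
cols = ['thing_number', 'thing_letter']
frames = [pd.DataFrame(pairs, columns=cols).assign(description=desc)
          for pairs, desc in shared]
out_df = pd.concat(frames, ignore_index=True)
print(out_df)
```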
