Converting tuples into rows from numerous columns in a pandas DataFrame

Question

I've got a dictionary that looks like this:

data = {'function_name': ['func1', 'func2', 'func3'],
        'argument': [('func1_arg1', 'func1_arg2'), 
                     ('func2_arg1',), 
                     ('func3_arg1', 'func3_arg2', 'func3_arg3')],
        'A': ['value_a1', 'value_a2', 'value_a3'],
        'B': 'b',
        'types': [('func1_type1', 'func1_type2'), 
                  ('func2_type1',),
                  ('func3_type1', 'func3_type2', 'func3_type3')]}

I'd like to convert it into a pandas DataFrame and make it look like this:

function_name    argument    types         A          B

func1            func1_arg1  func1_type1   value_a1   b
func1            func1_arg2  func1_type2   value_a1   b
func2            func2_arg1  func2_type1   value_a2   b
func3            func3_arg1  func3_type1   value_a3   b
func3            func3_arg2  func3_type2   value_a3   b
func3            func3_arg3  func3_type3   value_a3   b

As I understand it from a related answer, if there were only one column of tuples, I would do this:

import pandas as pd


data_frame = pd.DataFrame(data)
new_frame = (data_frame.set_index(['function_name', 'A', 'B'])['argument']
             .apply(pd.Series).stack().to_frame('argument')
             .reset_index().drop('level_3', axis=1))

But how do I go about it if I've got several columns of tuples?

EDIT:

There seems to be a small problem with the accepted solution: if a tuple column consists entirely of Nones or of empty tuples, it gets dropped while new_frame is being built. Is it possible to make pandas keep such columns?

The initial data looks like this:

data = {'function_name': ['func1', 'func2', 'func3'],
        'argument': [('func1_arg1', 'func1_arg2'), 
                     ('func2_arg1',), 
                     ('func3_arg1', 'func3_arg2', 'func3_arg3')],
        'A': ['value_a1', 'value_a2', 'value_a3'],
        'B': 'b',
        'types': [('func1_type1', 'func1_type2'), 
                  ('func2_type1',),
                  ('func3_type1', 'func3_type2', 'func3_type3')],
        'info': [(None, None), (None,), (None, None, None)]}

The 'info' column could also be [(), (), ()]; the outcome would be the same.

Bharath M Shetty · Accepted Answer · 2017-09-04 12:45:59Z

3

Since there are multiple columns to expand, I don't think this can be done in a single line, but you can use apply with the pd.DataFrame constructor. The default value of dropna for the stack method is True, so set it to False to keep the None values, i.e.:

index = ['function_name', 'A', 'B']
# Parentheses allow the method chain to span several lines
new_frame = (data_frame.set_index(index)
             .apply(lambda x: pd.DataFrame(x.values.tolist()).stack(dropna=False), axis=1)
             .stack(dropna=False).reset_index().drop('level_3', axis=1))
# Restore the names of the expanded columns
new_frame.columns = index + [x for x in data_frame.columns if x not in index]

  function_name         A  B    argument        types
0         func1  value_a1  b  func1_arg1  func1_type1
1         func1  value_a1  b  func1_arg2  func1_type2
2         func2  value_a2  b  func2_arg1  func2_type1
3         func3  value_a3  b  func3_arg1  func3_type1
4         func3  value_a3  b  func3_arg2  func3_type2
5         func3  value_a3  b  func3_arg3  func3_type3

With three columns to expand:

data = {'function_name': ['func1', 'func2', 'func3'],
    'argument': [('func1_arg1', 'func1_arg2'), 
                 ('func2_arg1',), 
                 ('func3_arg1', 'func3_arg2', 'func3_arg3')],
    'A': ['value_a1', 'value_a2', 'value_a3'],
    'B': 'b',
    'types': [('func1_type1', 'func1_type2'), 
              ('func2_type1',),
              ('func3_type1', 'func3_type2', 'func3_type3')],
    'info': [(None, None), (None,), (None, None, None)]}

  function_name         A  B    argument  info        types
0         func1  value_a1  b  func1_arg1  None  func1_type1
1         func1  value_a1  b  func1_arg2  None  func1_type2
2         func2  value_a2  b  func2_arg1  None  func2_type1
3         func3  value_a3  b  func3_arg1  None  func3_type1
4         func3  value_a3  b  func3_arg2  None  func3_type2
5         func3  value_a3  b  func3_arg3  None  func3_type3

Hope it helps.
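For readers on a newer pandas, here is a possible alternative that is not part of the original answer: a minimal sketch using DataFrame.explode with a list of columns, which assumes pandas 1.3 or later and that the tuples in each row all have the same length. The name tuple_columns is only illustrative. Note that the empty-tuple case from the question is not covered by this sketch.

```python
import pandas as pd

data = {'function_name': ['func1', 'func2', 'func3'],
        'argument': [('func1_arg1', 'func1_arg2'),
                     ('func2_arg1',),
                     ('func3_arg1', 'func3_arg2', 'func3_arg3')],
        'A': ['value_a1', 'value_a2', 'value_a3'],
        'B': 'b',
        'types': [('func1_type1', 'func1_type2'),
                  ('func2_type1',),
                  ('func3_type1', 'func3_type2', 'func3_type3')],
        'info': [(None, None), (None,), (None, None, None)]}

data_frame = pd.DataFrame(data)

# Columns whose cells hold tuples (illustrative name).
# Assumption: within each row, these tuples have equal lengths.
tuple_columns = ['argument', 'types', 'info']

# Multi-column explode (pandas >= 1.3) emits one row per tuple position
# and repeats the values of the remaining columns.
exploded = data_frame.explode(tuple_columns, ignore_index=True)
print(exploded)
```

Because explode does not go through stack, the all-None 'info' column is kept without any dropna handling.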

edited Sep 4, 2017 at 12:45

answered Sep 3, 2017 at 11:37

Bharath M Shetty


6 Comments

BigBear Over a year ago

Yep, seems like it works like charm! Thank you ever so much for your help!

BigBear Over a year ago

I've just come across an issue with the solution. If one of the tuple columns consists entirely of Nones, it gets dropped in the process of forming new_frame, and the second line errors out with "Length mismatch: expected axis has n elements, new values have n + k elements", where k is the number of dropped (all-None) columns. I tried to resolve it but couldn't. Is it possible to avoid dropping the columns if they consist entirely of Nones?

Bharath M Shetty Over a year ago

Can you update the data dict with that case?

BigBear Over a year ago

Done! I thought it would be better to write it below my initial question cos other people answered it.

Bharath M Shetty Over a year ago

@BigBear the default value of dropna for the stack method is True, so set it to False. Hope it helps.


Parfait · Answer · 2017-09-03 15:15:16Z

2

Consider nested list and dict comprehensions with the DataFrame constructor, provided all list items have equal length (i.e., 3). The only challenge is the scalar item 'B': 'b', which can be assigned at the end if it is known in advance:

# One small DataFrame per function; the scalar 'B' is skipped here (len('b') == 1)
dfs = [pd.DataFrame({k: v[i] for k, v in data.items() if len(v) > 1})
       for i in range(len(data['function_name']))]

df = pd.concat(dfs).reset_index(drop=True).assign(B='b') 

print(df)
#           A    argument function_name        types  B
# 0  value_a1  func1_arg1         func1  func1_type1  b
# 1  value_a1  func1_arg2         func1  func1_type2  b
# 2  value_a2  func2_arg1         func2  func2_type1  b
# 3  value_a3  func3_arg1         func3  func3_type1  b
# 4  value_a3  func3_arg2         func3  func3_type2  b
# 5  value_a3  func3_arg3         func3  func3_type3  b
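A variation on the same idea, offered only as a sketch and not taken from the original answer: build the rows in plain Python with zip, so that scalars such as 'B' are carried along without a final assign and a third tuple column such as 'info' is handled too. It assumes every non-scalar value in data is a list, and that the tuples within one row are non-empty and of equal length (zip stops at the shortest). The names rows, record and tuple_keys are illustrative.

```python
import pandas as pd

data = {'function_name': ['func1', 'func2', 'func3'],
        'argument': [('func1_arg1', 'func1_arg2'),
                     ('func2_arg1',),
                     ('func3_arg1', 'func3_arg2', 'func3_arg3')],
        'A': ['value_a1', 'value_a2', 'value_a3'],
        'B': 'b',
        'types': [('func1_type1', 'func1_type2'),
                  ('func2_type1',),
                  ('func3_type1', 'func3_type2', 'func3_type3')],
        'info': [(None, None), (None,), (None, None, None)]}

rows = []
for i in range(len(data['function_name'])):
    # List values are indexed by row; scalars such as 'B' are reused as-is.
    record = {k: (v[i] if isinstance(v, list) else v) for k, v in data.items()}
    tuple_keys = [k for k, v in record.items() if isinstance(v, tuple)]
    # zip walks the tuples in parallel, giving one output row per position.
    for parts in zip(*(record[k] for k in tuple_keys)):
        rows.append({**record, **dict(zip(tuple_keys, parts))})

df = pd.DataFrame(rows)
print(df)
```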

answered Sep 3, 2017 at 15:15

Parfait


1 Comment

Bharath M Shetty Over a year ago

Can you try your solution with three columns to be expanded, using the data I provided in my solution? Your solution requires the types column to be of equal length.
