1

Is there a convenient way to implement a function make_dataframe, used as follows

mydict = {
    ('tom', 'gray') : [1,2,3,4,5],
    ('bill', 'ginger') : [6,7,8,9,10],
}

make_dataframe(mydict, tupleLabels=['catname', 'catcolor'], valueLabel='weight')

Expected result

| catname | catcolor | weight |
| tom | gray | 1 |
| tom | gray | 2 |
| tom | gray | 3 |
| tom | gray | 4 |
| tom | gray | 5 |
| bill | ginger | 6 |
| bill | ginger | 7 |
| bill | ginger | 8 |
| bill | ginger | 9 |
| bill | ginger | 10 |

It does not sound too difficult, I just don't want to reinvent the wheel

2 Answers 2

1

You can create your own function using dataframe unstack after renaming the labels using rename_axis:

def make_dataframe(dictionary , tupleLabels , valueLabel):
    return (pd.DataFrame(dictionary).rename_axis(tupleLabels,axis=1)
            .unstack().reset_index(tupleLabels,name=valueLabel))

out = make_dataframe(mydict, tupleLabels=['catname', 'catcolor'], valueLabel='weight')

print(out)

  catname catcolor  weight
0     tom     gray       1
1     tom     gray       2
2     tom     gray       3
3     tom     gray       4
4     tom     gray       5
0    bill   ginger       6
1    bill   ginger       7
2    bill   ginger       8
3    bill   ginger       9
4    bill   ginger      10
Sign up to request clarification or add additional context in comments.

1 Comment

Fancy stuff. Thanks for showing me sth I did not know
1

Your dictionary is misformatted for easy conversion to a Pandas DataFrame.

I suggest doing the following:


mydict = {
    ('tom', 'gray') : [1,2,3,4,5],
    ('bill', 'ginger') : [6,7,8,9,10],
}

l = [ [ k[0], k[1], val ] for k, v in mydict.items() for val in v ]

df = pd.DataFrame(l, columns=['catname', 'catcolor', 'weight'])

Which yields:

  catname catcolor  weight
0     tom     gray       1
1     tom     gray       2
2     tom     gray       3
3     tom     gray       4
4     tom     gray       5
5    bill   ginger       6
6    bill   ginger       7
7    bill   ginger       8
8    bill   ginger       9
9    bill   ginger      10

1 Comment

This most certainly is a possible solution, but it is exactly the thing I was trying to avoid. Looping over values in val can get expensive for larger problems. Hopefully pandas does the loop in C under the hood

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.