
I checked many questions here, but none of them quite matches my problem.

Let's create a dummy dictionary to describe my problem.

dictionary = {12: {1,2,4,6,8,12,16,65,13,644,653,23}, 15:{10,20,30,23,56,6,8,}, 17:{4,7,11,12,19}, 20:{40,54,123,545,234}}

Here the keys are user IDs and the values are sets of location IDs.

My goal is to create a dataframe like this

userid locationid
12        1
12        2
12        4
...       ...
15        20
15        30
15        23
...       ...
17         4
17         7
17         11
...        ...
20         40
20         54
...       ...

My solution

for dictkey in range(len(dictionary.keys())): 
        lids = list(np.array(list(dictionary.values())[dictkey]).item())
        userid = np.array(list(dictionary.keys())[dictkey])
        userid = userid.reshape(1,1)
        df= pd.DataFrame(userid, columns =['userid'])
        df['locationid'] = lids  

but it doesn't work. How should I approach this problem? I could not solve it.

Note: My real dataset is large.

3 Answers


You can convert the dictionary to a Series, map each set to a list, and then explode:

pd.Series(dictionary).map(list).explode()

12      1
12      2
12     65
12      4
12    644
12      6
12      8
12     12
12     13
12    653
12     16
12     23
15      6
15      8
15     10
15     20
15     23
15     56
15     30
17      4
17      7
17     11
17     12
17     19
20    545
20     40
20    234
20     54
20    123
dtype: object

Or, for pandas >= 1.2.0, you can explode the sets directly (thanks @aneroid):

pd.Series(dictionary).explode()
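
Since the question asks for a DataFrame with userid and locationid columns, the exploded Series can be reshaped one step further. This is a minimal sketch, assuming the dictionary from the question; the integer cast and the final sort are optional additions here (not part of the original answer) that give a numeric dtype and a deterministic row order.

```python
import pandas as pd

dictionary = {12: {1, 2, 4, 6, 8, 12, 16, 65, 13, 644, 653, 23},
              15: {10, 20, 30, 23, 56, 6, 8},
              17: {4, 7, 11, 12, 19},
              20: {40, 54, 123, 545, 234}}

df = (pd.Series(dictionary)
        .map(list)                       # sets -> lists, so explode also works on older pandas
        .explode()                       # one row per (userid, locationid) pair
        .astype(int)                     # explode leaves dtype=object
        .rename_axis('userid')           # name the index before turning it into a column
        .reset_index(name='locationid')  # index -> 'userid' column, values -> 'locationid'
        .sort_values(['userid', 'locationid'])
        .reset_index(drop=True))

print(df.head())
```

Set iteration order is not guaranteed, which is why the rows are sorted at the end rather than relied on as they come.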

Comments

very nice solution
@anky FWIW, pd.Series(dictionary).explode() directly also works.
@aneroid thank you :) I was wondering why they didn't implement it. For my version, 0.25.1, it does not work. But glad to know they have implemented explode on set values. BTW, what is your pandas version?
@anky That would explain it; I'm on v1.2.1. I think exploding Sets was added in 1.2.0. The docs for 0.25.1 and 0.25.3 don't mention Sets.
@aneroid thank you for your research :) do you mind if I add your suggestion in the solution?

You can use pd.concat and pd.DataFrame.stack:

>>> (pd.concat([pd.Series(list(val), name=k) for k, val in dictionary.items()],
...            axis=1)
...    .stack()
...    .reset_index(level=0, drop=True)
...    .sort_index()
...    .rename_axis('userid')
...    .to_frame('locationid'))

        locationid
userid            
12            65.0
12           653.0
12            13.0
12            12.0
12             8.0
12             6.0
12           644.0
12            16.0
12             4.0
12            23.0
12             2.0
12             1.0
15            56.0
15            23.0
15            30.0
15             8.0
15            10.0
15             6.0
15            20.0
17             7.0
17            19.0
17            11.0
17             4.0
17            12.0
20           234.0
20           545.0
20            54.0
20            40.0
20           123.0
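
The location IDs print as floats because pd.concat pads the shorter columns with NaN, which upcasts them to float64. A hedged sketch of one way to get integers back, assuming the question's dictionary; the explicit dropna() is an addition here, in case stack() keeps the padding NaN values in your pandas version.

```python
import pandas as pd

dictionary = {12: {1, 2, 4, 6, 8, 12, 16, 65, 13, 644, 653, 23},
              15: {10, 20, 30, 23, 56, 6, 8},
              17: {4, 7, 11, 12, 19},
              20: {40, 54, 123, 545, 234}}

# One column per user; columns shorter than the longest are padded with NaN.
wide = pd.concat([pd.Series(sorted(val), name=k) for k, val in dictionary.items()],
                 axis=1)

df = (wide.stack()
          .dropna()                         # drop the padding explicitly
          .astype(int)                      # safe once no NaN remains
          .reset_index(level=0, drop=True)  # discard the row position level
          .sort_index()
          .rename_axis('userid')
          .to_frame('locationid'))

print(df.head())
```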


You can convert each dictionary entry, such as 12: {1,2,3}, into a list of pairs [(12,1),(12,2),(12,3)] using itertools.product, and then create the DataFrame from the combined list:

import itertools
data = []
for k,v in dictionary.items():
    data.extend(list(itertools.product([k],v)))

df = pd.DataFrame(data, columns=['userid', 'locationid'])
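
The same pairs can also be built with a flat list comprehension, without itertools. This is a sketch assuming the question's dictionary; the sorted() call is an addition here to make the row order deterministic, since sets are unordered.

```python
import pandas as pd

dictionary = {12: {1, 2, 4, 6, 8, 12, 16, 65, 13, 644, 653, 23},
              15: {10, 20, 30, 23, 56, 6, 8},
              17: {4, 7, 11, 12, 19},
              20: {40, 54, 123, 545, 234}}

# One (userid, locationid) tuple per element of each user's set.
data = [(userid, locationid)
        for userid, locations in dictionary.items()
        for locationid in sorted(locations)]

df = pd.DataFrame(data, columns=['userid', 'locationid'])
print(df.head())
```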
