Polars Dataframe from nested dictionaries as columns

Question

I have a dictionary of nested columns, where each column is itself a dictionary keyed by index. When I try to convert it to a Polars DataFrame, the column names and values come through correctly, but each column holds a single element, a struct containing all of that column's values, instead of being expanded into a series.

For example, say I have:

d = {'col1': {'0':'A','1':'B','2':'C'}, 'col2': {'0':1,'1':2,'2':3}}

Then, when I call pl.DataFrame(d) or pl.from_dict(d), I get:

col1           col2
---            ---
struct[3]      struct[3]
{"A","B","C"}  {1,2,3}

instead of a regular DataFrame with one row per index key.

Any idea how to fix this?

Thanks in advance!

Dean MacGregor · Accepted Answer · 2025-03-24 18:29:21Z


There's not a particularly straightforward way to do that. You essentially have to take each column one at a time, unnest and unpivot it, and then join the columns back together.

Setup

import polars as pl

d = {'col1': {'0':'A','1':'B','2':'C'}, 'col2': {'0':1,'1':2,'2':3}}
df = pl.DataFrame(d)

To (what I think is the) desired output


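df_final = None
for col in df.columns:
    # Expand the struct column into one column per inner key ("0", "1", "2")
    df_new = df[col].to_frame().unnest(col)
    # Turn those key columns into rows: one (index, value) pair per key
    df_new = df_new.unpivot(variable_name="index", value_name=col)
    if df_final is None:
        df_final = df_new
    else:
        # A full join keeps keys that appear in only some of the columns
        df_final = df_final.join(df_new, on="index", how="full", coalesce=True)
df_final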
shape: (3, 3)
┌───────┬──────┬──────┐
│ index ┆ col1 ┆ col2 │
│ ---   ┆ ---  ┆ ---  │
│ str   ┆ str  ┆ i64  │
╞═══════╪══════╪══════╡
│ 0     ┆ A    ┆ 1    │
│ 1     ┆ B    ┆ 2    │
│ 2     ┆ C    ┆ 3    │
└───────┴──────┴──────┘
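As an alternative to the loop of joins, the alignment can also be done in plain Python before the data reaches Polars. The following is only a sketch under assumptions: the helper name align_nested_columns is hypothetical (it is not part of Polars), and it fills keys that are missing from a column with None, which Polars would read as null. Keys are kept in first-seen order rather than sorted.

```python
def align_nested_columns(d):
    """Turn {column: {key: value}} into {"index": [...], column: [...]}.

    Hypothetical helper: keys missing from a column are filled with None.
    """
    # Collect every inner key once, preserving first-seen order.
    keys = list(dict.fromkeys(k for inner in d.values() for k in inner))
    out = {"index": keys}
    for name, inner in d.items():
        # dict.get returns None for keys this column does not have.
        out[name] = [inner.get(k) for k in keys]
    return out


d = {'col1': {'0': 'A', '1': 'B', '2': 'C'}, 'col2': {'0': 1, '1': 2, '2': 3}}
columns = align_nested_columns(d)
print(columns)
# The result is a plain dict of equal-length lists, so with Polars installed
# pl.DataFrame(columns) should build the three-column frame directly.
```

Doing the reshaping up front avoids creating struct columns at all, at the cost of a pass over the data in Python rather than in Polars.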

Simplified if index keys are guaranteed to be balanced

If the keys of every nested column are guaranteed to be identical and in the same order, you can use map_batches instead of a for loop with joins.

df.select(pl.all().map_batches(lambda s: (
    s.to_frame().unnest(s.name).unpivot()['value']
)))
shape: (3, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ i64  │
╞══════╪══════╡
│ A    ┆ 1    │
│ B    ┆ 2    │
│ C    ┆ 3    │
└──────┴──────┘


7 Comments

ouroboros1 Mar 24 at 18:40

+1. If uniform and sorted, OP can use a dict comprehension: pl.DataFrame({k: v.values() for k, v in d.items()}).
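The dict comprehension suggested in this comment can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes every inner dictionary has the same keys in the same order, and it wraps the values in list() as a conservative choice so the constructor receives plain lists. The name flat and the sanity check on the keys are additions for illustration.

```python
d = {'col1': {'0': 'A', '1': 'B', '2': 'C'}, 'col2': {'0': 1, '1': 2, '2': 3}}

# Sanity check (an addition, not in the comment): the shortcut is only
# valid when all inner dicts share the same keys in the same order.
inner_keys = [list(inner) for inner in d.values()]
if any(keys != inner_keys[0] for keys in inner_keys):
    raise ValueError("inner keys differ between columns")

# Drop the index keys and keep only the values, one list per column.
flat = {name: list(inner.values()) for name, inner in d.items()}
print(flat)
# → {'col1': ['A', 'B', 'C'], 'col2': [1, 2, 3]}
# With Polars installed, pl.DataFrame(flat) then gives one row per index key.
```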

Ghost Mar 24 at 18:41

Thanks! So, there is no way to unnest them directly when creating the df? :/

Ghost Mar 24 at 18:54

And yes, sorry, the internal keys in every dict-col will always be the same and sorted (it's basically a df packed and passed through a flask response)

ouroboros1 Mar 24 at 19:05

@Ghost: if you have control over that flow, you might just want to adjust the response, because it sounds like d is the result of df.to_dict(). If you change that to df.to_dict('list') (or df.to_dict('records')), you can pass the result to pl.DataFrame without problems.

Dean MacGregor Mar 24 at 19:36

If you're using Flask, try using get_data to get the raw JSON and letting Polars parse it with pl.read_json(BytesIO(resp.get_data())) instead of using the native Python JSON parser. It should be faster.
