
I am looking for a way to create a nested NamedTuple from a pandas DataFrame. The object d below is the expected output. I am not sure whether the aggregation should be done directly in pandas, with the conversion to NamedTuple done afterward.

from typing import NamedTuple
from typing import List
import pandas as pd

if __name__ == "__main__":
    data = [["tom", 10, "ab 11"], ["nick", 15, "ab 22"], ["juli", 14, "ab 11"]]
    People = pd.DataFrame(data, columns=["Name", "Age", "PostalCode"])

    names = list(People[["Name"]].itertuples(name="Names", index=False))
    postal_codes = list(
        People[["PostalCode"]].itertuples(name="PostalCode", index=False)
    )

    # ...
    # ... The code below builds the expected output by hand (the names of the NamedTuples themselves don't matter)

    PeopleName = NamedTuple("PeopleName", [("Name", str)])
    PeoplePC = NamedTuple("PeoplePC", [("PostalCode", str)])
    Demography = NamedTuple(
        "Demography", [("names", List[PeopleName]), ("postalcodes", PeoplePC)]
    )

    d = [
        Demography(
            [PeopleName(Name="tom"), PeopleName(Name="juli")],
            PeoplePC(PostalCode="ab 11"),
        ),
        Demography([PeopleName(Name="nick")], PeoplePC(PostalCode="ab 22"),),
    ]

Answer

You could use groupby and then apply a function (to_nested_tuple) to each (key, group) pair it produces:

from typing import NamedTuple, List

import pandas as pd

data = [["tom", 10, "ab 11"], ["nick", 15, "ab 22"], ["juli", 14, "ab 11"]]
people = pd.DataFrame(data, columns=["Name", "Age", "PostalCode"])

PeopleName = NamedTuple("PeopleName", [("Name", str)])
PeoplePC = NamedTuple("PeoplePC", [("PostalCode", str)])
Demography = NamedTuple("Demography", [("names", List[PeopleName]), ("postalcodes", PeoplePC)])


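def to_nested_tuple(k, g):
    # k is the group key (a postal code); g is the sub-DataFrame of rows with that code.
    # A Series has no itertuples(), so to_frame() turns 'Name' into a one-column
    # DataFrame; itertuples then yields one 'Person' namedtuple per row.
    peoples = list(g['Name'].to_frame().itertuples(name='Person', index=False))
    return Demography(peoples, PeoplePC(k))


# Iterating over a GroupBy yields (key, group) pairs, unpacked here with *item.
d = [to_nested_tuple(*item) for item in people.groupby('PostalCode')]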

print(d)

Output

[Demography(names=[Person(Name='tom'), Person(Name='juli')], postalcodes=PeoplePC(PostalCode='ab 11')), Demography(names=[Person(Name='nick')], postalcodes=PeoplePC(PostalCode='ab 22'))]
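As for the question of aggregating in pandas first: that also works. Below is a minimal sketch, assuming the same three columns as above, that first collects each postal code's names into a list with groupby(...)["Name"].agg(list) and only afterwards converts the aggregated Series into NamedTuples. It uses PeopleName for the inner tuples instead of the itertuples-generated Person, so the result should compare equal to the hand-built d in the question.

```python
from typing import List, NamedTuple

import pandas as pd

data = [["tom", 10, "ab 11"], ["nick", 15, "ab 22"], ["juli", 14, "ab 11"]]
people = pd.DataFrame(data, columns=["Name", "Age", "PostalCode"])

PeopleName = NamedTuple("PeopleName", [("Name", str)])
PeoplePC = NamedTuple("PeoplePC", [("PostalCode", str)])
Demography = NamedTuple("Demography", [("names", List[PeopleName]), ("postalcodes", PeoplePC)])

# Step 1 (pandas): one row per postal code, value = list of names in original row order.
names_by_pc = people.groupby("PostalCode")["Name"].agg(list)

# Step 2 (Python): convert each (postal code, list of names) pair into nested NamedTuples.
d = [
    Demography([PeopleName(name) for name in names], PeoplePC(postal_code))
    for postal_code, names in names_by_pc.items()
]

print(d)
```

The trade-off is mostly taste: the agg(list) version keeps the grouping logic in pandas and leaves only plain Python lists to convert, while the to_nested_tuple version works directly on each sub-DataFrame.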

Comments

This code assumes that only one attribute is retrieved from the DataFrame. What would be the option to retrieve more than one field, something like ...g[['firstname', 'lastname']].to_frame()...? Correct me if I am wrong, but this does not produce a Series. Thanks
If you want more than one field, drop the to_frame() call: selecting with double brackets, g[['firstname', 'lastname']], already returns a DataFrame, so itertuples can be called on it directly. Does that make sense?
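A sketch of that suggestion, using a hypothetical DataFrame with firstname and lastname columns (these columns and the sample values are illustrative assumptions, not part of the original data). The names field is typed loosely as List[tuple] because itertuples builds its own 'Person' namedtuple class with one field per selected column.

```python
from typing import List, NamedTuple

import pandas as pd

# Hypothetical data: the firstname/lastname columns and values are made up for illustration.
data = [
    ["tom", "smith", "ab 11"],
    ["nick", "jones", "ab 22"],
    ["juli", "brown", "ab 11"],
]
people = pd.DataFrame(data, columns=["firstname", "lastname", "PostalCode"])

PeoplePC = NamedTuple("PeoplePC", [("PostalCode", str)])
Demography = NamedTuple("Demography", [("names", List[tuple]), ("postalcodes", PeoplePC)])


def to_nested_tuple(k, g):
    # g[[...]] (double brackets) is already a DataFrame, so no to_frame() call is needed;
    # itertuples yields one 'Person' namedtuple per row with both name fields.
    persons = list(g[["firstname", "lastname"]].itertuples(name="Person", index=False))
    return Demography(persons, PeoplePC(k))


d = [to_nested_tuple(*item) for item in people.groupby("PostalCode")]
print(d)
```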
