Hi, I have a pandas DataFrame with a MultiIndex. Sorry for the picture, but I found it clearer than plain code.

[Image: screenshot of the DataFrame grouped by Child_category]

Due to inconsistencies in the data, some of my rows are missing Parent_category. In the sample data, a missing Parent_category is an empty string.

To get the data frame you see in the picture, I grouped my data by Child_category.

How can I fill in the missing Parent_category for rows that share the same Child_category?

The index structure:

MultiIndex(levels=[['Apps', 'Bars', 'Bath', 'Beer', 'Books', 'Breakfast', 'Cellar', 'Charity', 'Cleaning', 'Clothing', 'Co-working', 'Coffee', 'Dining', 'Drugs', 'Education', 'Electronics', 'Entertainment', 'Groceries', 'Hair Cut', 'Hotel', 'Icecream', 'Lunch', 'Maintenance', 'Massage', 'Museums', 'Music', 'Parking', 'Petroleum', 'Rent', 'Repair', 'Resident', 'Snacks', 'Souvenir', 'Souvenirs', 'Spa & yoga', 'Taxi', 'Tea', 'Transport', 'Traveling', 'Visa', 'Yoga', 'Канцелярия'], ['', 'Car', 'Drinks', 'Eatings', 'Home', 'Spa & yoga', 'Transport', 'Traveling', 'Utilities', 'iTunes']],
           codes=[[0, 1, 1, 2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 9, 10, 11, 11, 12, 12, 13, 14, 15, 16, 17, 18, 19, 20, 20, 21, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 36, 37, 37, 38, 39, 40, 41], [9, 0, 2, 4, 0, 2, 0, 0, 3, 0, 8, 0, 1, 0, 0, 0, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 4, 5, 7, 9, 1, 1, 1, 1, 4, 0, 7, 0, 0, 0, 0, 2, 0, 6, 0, 0, 5, 0]],
           names=['Child_category', 'Parent_category'],
           sortorder=0)

After re-indexing I get the following data frame. I guess the data could be filled in with a nested loop in O(n^2), but I am looking for a more elegant solution.

[Image: screenshot of the DataFrame after re-indexing]

1 Answer

I believe you need:

import numpy as np
import pandas as pd

mux = pd.MultiIndex(levels=[['Apps', 'Bars', 'Bath', 'Beer', 'Books', 'Breakfast', 'Cellar', 'Charity', 'Cleaning', 'Clothing', 'Co-working', 'Coffee', 'Dining', 'Drugs', 'Education', 'Electronics', 'Entertainment', 'Groceries', 'Hair Cut', 'Hotel', 'Icecream', 'Lunch', 'Maintenance', 'Massage', 'Museums', 'Music', 'Parking', 'Petroleum', 'Rent', 'Repair', 'Resident', 'Snacks', 'Souvenir', 'Souvenirs', 'Spa & yoga', 'Taxi', 'Tea', 'Transport', 'Traveling', 'Visa', 'Yoga', 'Канцелярия'], ['', 'Car', 'Drinks', 'Eatings', 'Home', 'Spa & yoga', 'Transport', 'Traveling', 'Utilities', 'iTunes']],
           codes=[[0, 1, 1, 2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 9, 10, 11, 11, 12, 12, 13, 14, 15, 16, 17, 18, 19, 20, 20, 21, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 36, 37, 37, 38, 39, 40, 41], [9, 0, 2, 4, 0, 2, 0, 0, 3, 0, 8, 0, 1, 0, 0, 0, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 4, 5, 7, 9, 1, 1, 1, 1, 4, 0, 7, 0, 0, 0, 0, 2, 0, 6, 0, 0, 5, 0]],
           names=['Child_category', 'Parent_category'],
           sortorder=0)
df = pd.DataFrame({'a': range(52)}, index=mux)

For each Child_category, take the first non-empty Parent_category value:

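# Turn '' into NaN in the Parent_category level, then keep the first
# non-missing value of each column per Child_category (one row per group).
print(df.rename({'': np.nan}, level=1)
        .reset_index()
        .groupby('Child_category')
        .first()
        .set_index('Parent_category', append=True)
        .head(20))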
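
As a small illustration of what `first()` does in the chain above, here is a sketch on a hypothetical three-row frame (the rows and the name `toy` are made up for the example, not taken from the question's data). It suggests two things worth keeping in mind: the result has one row per Child_category, and the first non-missing value is picked per column, so values in the same output row may come from different input rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, already flattened with reset_index() and with
# the empty Parent_category values turned into NaN.
toy = pd.DataFrame({
    "Child_category": ["Bars", "Bars", "Books"],
    "Parent_category": [np.nan, "Drinks", np.nan],
    "a": [0, 1, 2],
})

# first() skips missing values column by column within each group.
firsts = toy.groupby("Child_category").first()
print(firsts)
```

Here the "Bars" row combines `Parent_category` from the second input row with `a` from the first one, and "Books" stays missing because the group has no non-empty parent at all.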

Or keep all rows and replace the empty strings with the Parent_category value found within each Child_category group:

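# Turn '' into NaN, then forward- and back-fill within each Child_category
# group so every row of the group gets the group's Parent_category.
print(df.rename({'': np.nan}, level=1)
        .reset_index()
        .groupby('Child_category')
        .apply(lambda x: x.ffill().bfill())
        .set_index(['Child_category', 'Parent_category'])
        .head(20))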
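
If the `groupby(...).apply(...)` version misbehaves on your pandas version (as far as I know, newer releases changed how `apply` handles the grouping columns), a possible alternative is to fill only the Parent_category column. The sketch below is a swapped-in technique, not the code above: it uses `transform("first")` together with `fillna` instead of `ffill().bfill()`, and it runs on hypothetical toy data (`toy`, `flat` and `filled` are names made up for the example). It assumes each Child_category has at most one non-empty Parent_category.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: '' marks a missing Parent_category, as in the question.
idx = pd.MultiIndex.from_tuples(
    [("Bars", ""), ("Bars", "Drinks"), ("Beer", "Drinks"), ("Beer", ""), ("Books", "")],
    names=["Child_category", "Parent_category"],
)
toy = pd.DataFrame({"a": range(5)}, index=idx)

flat = toy.reset_index()

# Treat the empty string as missing.
parent = flat["Parent_category"].mask(flat["Parent_category"] == "")

# First non-missing parent of each Child_category, broadcast to every row of the group.
group_parent = parent.groupby(flat["Child_category"]).transform("first")

# Only fill the gaps; existing values are left untouched.
flat["Parent_category"] = parent.fillna(group_parent)

filled = flat.set_index(["Child_category", "Parent_category"])
print(filled)
```

Groups with no known parent at all ("Books" here) stay missing, and the number of rows is unchanged.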