9

I have a pandas DataFrame with a two-level multiindex. The second level is numeric and supposed to be sorted and sequential for each unique value of the first-level index, but has gaps. How do I insert the "missing" rows? Sample input:

import pandas as pd
df = pd.DataFrame(list(range(5)),
                  index=pd.MultiIndex.from_tuples([('A',1), ('A',3),
                                                   ('B',2), ('B',3), ('B',6)]),
                  columns='value')
#     value
#A 1      0
#  3      1
#B 2      2
#  3      3
#  6      4

Expected output:

#     value
#A 1      0
#  2    NaN
#  3      1
#B 2      2
#  3      3
#  4    NaN
#  5    NaN
#  6      4

I suspect I could have used resample, but I am having trouble converting the numbers to anything date-like.

2
  • 1
    Create the index by for loop , then reindex it ? Commented Jan 30, 2019 at 21:57
  • @Wen-Ben Thanks! That would be my last-resort option. I hate using loops in pandas. Commented Jan 30, 2019 at 22:00

5 Answers 5

2

If there is a will, there is a way. I am not proud of this but, I think it works.

Try:

def f(x):
    levels = x.index.remove_unused_levels().levels
    x = x.reindex(pd.MultiIndex.from_product([levels[0], np.arange(levels[1][0], levels[1][-1]+1)]))
    return x

df.groupby(level=0, as_index=False, group_keys=False).apply(f)

Output:

     value
A 1    0.0
  2    NaN
  3    1.0
B 2    2.0
  3    3.0
  4    NaN
  5    NaN
  6    4.0
Sign up to request clarification or add additional context in comments.

Comments

2

After much deliberations, I was able to come up with a solution myself. Judging by the fact of how lousy it is, the problem I am facing is not a very typical one.

new_index = d.index.to_frame()\
                .groupby(0)[1]\
                .apply(lambda x:
                         pd.Series(1, index=range(x.min(), x.max() + 1))).index
d.reindex(new_index)

Comments

2

You can simply use the following depends on the missing index:

result.unstack(1).stack(0, dropna=False).fillna(0)

When you unstack, the pandas expand the df to have rows and columns and in the above example, level 1 index is going to be the column names. Then, again by stacking, you return the df to its original form, BUT, this time you need to make sure you use dropna=False so the NaN values are going to be there for missing indexes. In the end, using .fillna(0) is optional depends on what you want to do with the NaN values.

1 Comment

This will also create (B, 1) which is not in the desired dataframe. It also doesn't insert the new values (B, 4) and (B, 5).
1

Little late for the party i see, but for future travelers, I think I found a solution:

Use pandas multiindex from product function to generate all combinations of levels of indices:

df_new_index = pd.MultiIndex.from_product([
  df.index.get_level_values(0).unique(),
  df.index.get_level_values(1).unique()])

df_reindexed = df.reindex(df_new_index)

Comments

0

There's no accounting for taste, but I think falling back to list comprehension leads to slightly more readable code:

df.reindex(
    pd.MultiIndex.from_tuples([
        (level_0, level_1)
        for level_0 in df.reset_index(0).level_0.unique()
        for level_1 in range(
            df.reset_index(1).loc[level_0, "level_1"].min(),
            df.reset_index(1).loc[level_0, "level_1"].max()+1
        )
]))

# Output:
#value
#A  1   0.0
#   2   NaN
#   3   1.0
#B  2   2.0
#   3   3.0
#   4   NaN
#   5   NaN
#   6   4.0

Although this is of course slower than going down the apply route:

list-comprehension: 2.57 ms ± 19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
DYZ apply: 1.25 ms ± 8.75 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Scott's apply: 2.19 ms ± 9.84 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.