I have a data frame:

df = pd.DataFrame(rows,columns=['proid','sku', 'qty'])

and a list of unique SKUs:

skus = ["SKU1", "SKU2",  "SKU3"]

df may not contain a row for every combination of proid and sku, where sku ranges over the unique list skus.

For example:

#    proid  sku   qty
# 1  p1     SKU1   1
# 2  p1     SKU3   2
# 3  p2     SKU1   3

I want to add rows to the data frame so that every proid/sku combination exists, with qty defaulting to 0 for the added rows.

Expected result:

#    proid  sku   qty
# 1  p1     SKU1   1
# 2  p1     SKU3   2
# 3  p2     SKU1   3
# 4  p1     SKU2   0
# 5  p2     SKU2   0
# 6  p2     SKU3   0

3 Answers

You can use itertools.product and pd.concat.

Setup:

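import io
import itertools as it
import pandas as pd

z = io.StringIO("""    proid  sku   qty
 1  p1     SKU1   1
 2  p1     SKU3   2
 3  p2     SKU1   3""")

# The header has one field fewer than the data rows, so the first column becomes the index.
df = pd.read_table(z, sep=r"\s+")
p = ["p1", "p2"]
s = ["SKU1", "SKU2", "SKU3"]
# Every proid/sku pair
df2 = pd.DataFrame(list(it.product(p, s)), columns=["proid", "sku"])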

Then concatenate the missing combinations with the original rows:

df = df.set_index(["proid", "sku"])
df2 = df2.set_index(["proid", "sku"])

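# Rows taken from df2 carry no qty, so fill the resulting NaN with 0 and restore the int dtype.
pd.concat([df2[~df2.index.isin(df.index)], df]).fillna(0).astype({"qty": int}).reset_index()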

    proid   sku     qty
0   p1      SKU2    0
1   p2      SKU2    0
2   p2      SKU3    0
3   p1      SKU1    1
4   p1      SKU3    2
5   p2      SKU1    3

Another approach that works well:

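from itertools import product

# Pair every proid in df with every SKU in the full `skus` list from the question
# (df["sku"] alone would miss SKUs that never occur in df).
combs = pd.DataFrame(list(product(df["proid"].unique(), skus)),
                     columns=["proid", "sku"])
result = df.merge(combs, how="right").fillna(0)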

Create a DataFrame from a MultiIndex of all possible proid/sku combinations:

ind = pd.MultiIndex.from_product(
      [['p1', 'p2'], ['SKU1', 'SKU2' ,'SKU3']]
).to_frame().reset_index(drop=True).rename({0:'proid', 1: 'sku'}, axis=1)

Then left join it to the original DataFrame:

ind.merge(df, on=['proid', 'sku'], how='left').fillna(0)

Output:

  proid   sku  qty
0    p1  SKU1  1.0
1    p1  SKU2  0.0
2    p1  SKU3  2.0
3    p2  SKU1  3.0
4    p2  SKU2  0.0
5    p2  SKU3  0.0
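
The left join introduces NaN for the missing pairs, which is why qty comes back as float (1.0, 0.0, ...). If integer quantities matter, one option is to cast back after filling. A minimal sketch, with the question's sample frame rebuilt inline so it runs on its own (the name `filled` is only illustrative):

```python
import pandas as pd

# Sample data from the question, rebuilt here so the snippet is self-contained.
df = pd.DataFrame(
    [("p1", "SKU1", 1), ("p1", "SKU3", 2), ("p2", "SKU1", 3)],
    columns=["proid", "sku", "qty"],
)

# All proid/sku pairs, with the column names set via `names=` instead of a later rename.
ind = pd.MultiIndex.from_product(
    [["p1", "p2"], ["SKU1", "SKU2", "SKU3"]], names=["proid", "sku"]
).to_frame(index=False)

# The left join yields NaN for missing pairs (making qty float);
# fill only the qty column with 0, then cast it back to int.
filled = ind.merge(df, on=["proid", "sku"], how="left").fillna({"qty": 0})
filled = filled.astype({"qty": int})
print(filled)
```

Filling only the qty column keeps the 0 default from leaking into other columns, should the frame gain any.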

Alternatively, create a MultiIndex and reindex the original DataFrame, filling the missing rows with 0:

ind2 = pd.MultiIndex.from_product(names=['proid', 'sku'], 
    iterables=[['p1', 'p2'], ['SKU1', 'SKU2' ,'SKU3']])
df.set_index(['proid', 'sku']).reindex(ind2, fill_value=0).reset_index()
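
The reindex approach can be wrapped so that the proid values are taken from the data instead of being typed out. This is only a sketch: `complete_combinations` is a hypothetical helper name, and it assumes each (proid, sku) pair occurs at most once in df, because reindex raises on a duplicated index.

```python
import pandas as pd

def complete_combinations(df, skus, fill_value=0):
    """Return df with one row per (proid, sku) pair.

    Hypothetical helper: pairs missing from df get `fill_value` as qty.
    Assumes (proid, sku) is unique in df; reindex raises otherwise.
    """
    proids = df["proid"].unique()  # proids in order of first appearance
    full = pd.MultiIndex.from_product([proids, skus], names=["proid", "sku"])
    return (
        df.set_index(["proid", "sku"])
        .reindex(full, fill_value=fill_value)
        .reset_index()
    )

# Sample data and SKU list from the question.
df = pd.DataFrame(
    [("p1", "SKU1", 1), ("p1", "SKU3", 2), ("p2", "SKU1", 3)],
    columns=["proid", "sku", "qty"],
)
skus = ["SKU1", "SKU2", "SKU3"]

result = complete_combinations(df, skus)
print(result)
```

Because fill_value is supplied directly to reindex, no NaN is introduced along the way, so qty should not need a separate fillna step.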
