I have yearly information (COUNT) for countries stored in a DataFrame. However, some countries are missing in certain years.

If I have a complete list of countries, what is the best way to add the missing ones under the corresponding years and fill their missing COUNT values with 0?

      DATE         COUNTRY  COUNTRY_ID  COUNT
0     1980   United States         840     42
42    1980  Czech Republic         203      2
95    1980         Hungary         348      1
96    1980   Great Britain         826      1
97    1980    South Africa         710      1
98    1982   United States         840     42
140   1982        Paraguay         600      2
.
.

2 Answers

One way to do this is to build every (DATE, COUNTRY) combination, reindex the DataFrame against that full index, and then fill in the missing values.

import pandas as pd

# Assume that we want all years, not just the ones seen
years = range(df['DATE'].min(), df['DATE'].max()+1)

# get all combinations
idx = pd.MultiIndex.from_product([years, df['COUNTRY'].unique()], names=['DATE', 'COUNTRY'])

# reindex by first putting DATE and COUNTRY into the index
df1 = df.set_index(['DATE', 'COUNTRY']).reindex(idx).reset_index()

# Fill back in missing IDs
country_map = df.set_index('COUNTRY')['COUNTRY_ID'].drop_duplicates()
df1['COUNTRY_ID'] = df1.COUNTRY.map(country_map)

# fill in 0 for COUNT and convert back to int
df1['COUNT'] = df1['COUNT'].fillna(0).astype(int)

    DATE         COUNTRY  COUNTRY_ID  COUNT
0   1980   United States         840     42
1   1980  Czech Republic         203      2
2   1980         Hungary         348      1
3   1980   Great Britain         826      1
4   1980    South Africa         710      1
5   1980        Paraguay         600      0
6   1981   United States         840      0
7   1981  Czech Republic         203      0
8   1981         Hungary         348      0
9   1981   Great Britain         826      0
10  1981    South Africa         710      0
11  1981        Paraguay         600      0
12  1982   United States         840     42
13  1982  Czech Republic         203      0
14  1982         Hungary         348      0
15  1982   Great Britain         826      0
16  1982    South Africa         710      0
17  1982        Paraguay         600      2
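
If the complete country list comes from outside the data, as the question assumes, the same reindex idea can be applied to that list instead of `df['COUNTRY'].unique()`. The following is only a sketch: the sample frame is a small subset of the question's data, and `all_countries` is a hypothetical name-to-ID mapping (including the extra country) that does not appear in the original post.

```python
import pandas as pd

# Small sample mirroring the question's data (subset, for illustration only)
df = pd.DataFrame({
    'DATE': [1980, 1980, 1982, 1982],
    'COUNTRY': ['United States', 'Hungary', 'United States', 'Paraguay'],
    'COUNTRY_ID': [840, 348, 840, 600],
    'COUNT': [42, 1, 42, 2],
})

# Hypothetical complete list: country name -> ID (an assumption, not from the post)
all_countries = {'United States': 840, 'Hungary': 348, 'Paraguay': 600, 'France': 250}

# Every year in the observed range, crossed with every country in the list
years = range(df['DATE'].min(), df['DATE'].max() + 1)
idx = pd.MultiIndex.from_product([years, list(all_countries)],
                                 names=['DATE', 'COUNTRY'])

# Reindex only COUNT; pairs absent from df get 0 directly via fill_value
full = (
    df.set_index(['DATE', 'COUNTRY'])['COUNT']
      .reindex(idx, fill_value=0)
      .reset_index()
)

# Recover the IDs from the complete list rather than from df
full['COUNTRY_ID'] = full['COUNTRY'].map(all_countries)
full = full[['DATE', 'COUNTRY', 'COUNTRY_ID', 'COUNT']]
```

Passing `fill_value=0` to `reindex` avoids the intermediate NaN values, so no `fillna` or `astype(int)` step is needed for COUNT in this variant.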

Consider also a cross join merge route (for those of us with an SQL mindset):

# ASSIGN KEY COLUMN
df['KEY'] = 1

# CREATE DF OF DATES RANGE
dates = pd.DataFrame({'DATE':list(range(df['DATE'].min(),df['DATE'].max() + 1)),
                      'COUNT':0, 'KEY':1})    
# CROSS JOIN MERGE
mdf = df.merge(dates, on=['KEY'])

# REASSIGN COUNT
mdf.loc[mdf['DATE_x'] != mdf['DATE_y'], 'COUNT_x'] = 0

# CLEAN UP DF (COLS AND ROWS)
# A country seen in several years yields several rows per DATE after the cross
# join, and only one of them carries the real COUNT (the rest were zeroed above).
# Take the max per group instead of drop_duplicates, which would keep whichever
# duplicate comes first and could discard the real value.
mdf = mdf[['DATE_y', 'COUNTRY', 'COUNTRY_ID', 'COUNT_x']]\
           .rename(columns={'DATE_y':'DATE', 'COUNT_x':'COUNT'})\
           .groupby(['DATE', 'COUNTRY', 'COUNTRY_ID'], as_index=False)['COUNT']\
           .max()

#     DATE         COUNTRY  COUNTRY_ID  COUNT
# 0   1980  Czech Republic         203      2
# 1   1980   Great Britain         826      1
# 2   1980         Hungary         348      1
# 3   1980        Paraguay         600      0
# 4   1980    South Africa         710      1
# 5   1980   United States         840     42
# 6   1981  Czech Republic         203      0
# 7   1981   Great Britain         826      0
# 8   1981         Hungary         348      0
# 9   1981        Paraguay         600      0
# 10  1981    South Africa         710      0
# 11  1981   United States         840      0
# 12  1982  Czech Republic         203      0
# 13  1982   Great Britain         826      0
# 14  1982         Hungary         348      0
# 15  1982        Paraguay         600      2
# 16  1982    South Africa         710      0
# 17  1982   United States         840     42
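
On pandas 1.2 or later, the same SQL-style idea can be written without a helper key column by using `how='cross'` and then left-joining the observed counts onto the full grid. This is a sketch on a small sample mirroring the question's data; the names `years`, `countries`, `grid`, and `out` are illustrative and not from the original answer.

```python
import pandas as pd

# Small sample mirroring the question's data (subset, for illustration only)
df = pd.DataFrame({
    'DATE': [1980, 1980, 1982, 1982],
    'COUNTRY': ['United States', 'Hungary', 'United States', 'Paraguay'],
    'COUNTRY_ID': [840, 348, 840, 600],
    'COUNT': [42, 1, 42, 2],
})

# One row per year in the observed range, one row per distinct country
years = pd.DataFrame({'DATE': range(df['DATE'].min(), df['DATE'].max() + 1)})
countries = df[['COUNTRY', 'COUNTRY_ID']].drop_duplicates()

# Every (year, country) pair, like SQL's CROSS JOIN (requires pandas >= 1.2)
grid = years.merge(countries, how='cross')

# LEFT JOIN the observed counts onto the grid; unmatched pairs get NaN
out = grid.merge(df, on=['DATE', 'COUNTRY', 'COUNTRY_ID'], how='left')

# Missing pairs become 0, and COUNT goes back to an integer dtype
out['COUNT'] = out['COUNT'].fillna(0).astype(int)
```

Because the grid is built from distinct years and distinct countries, each (DATE, COUNTRY) pair appears once, so no de-duplication step is needed afterwards (assuming `df` itself has no repeated pairs).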
