
I currently have a df in pandas with a variable called 'Dates' that records the date a complaint was filed.

data = pd.read_csv("filename.csv")

Dates
Initially Received
07-MAR-08
08-APR-08
19-MAY-08

As you can see, there are gaps between the dates on which complaints were filed, and multiple complaints may have been filed on the same day. Is there a way to fill in the missing days while keeping every complaint that was filed on the same day?

I tried creating a new df from a date range and merging the two dataframes together:

days = pd.date_range(start='01-JAN-2008', end='31-DEC-2017')
df = pd.DataFrame(data=days)
df.index = range(3653)
dates = pd.merge(days, data['Dates'], how='inner')

but I get the following error:

ValueError: can not merge DataFrame with instance of type <class 'pandas.tseries.index.DatetimeIndex'>

Here are the first four rows of data

[Screenshot of the first four rows of data]

  • Yes, and you can see that the dates go from 08-APR-08 to 19-MAY-08, so the dates between April 8th and May 19th are not in the dataframe. I would like to keep the current entries in 'data' but add the dates that are currently missing as empty rows (there are other variables in the 'data' dataframe). Commented Oct 7, 2018 at 23:39
  • Sorry, I had to use a screenshot. There are 27 variables. Commented Oct 8, 2018 at 0:04

2 Answers


You were close, but there's an issue with your input: the column header sits on the second line of the file, so the first line has to be skipped.

First do:

df = pd.read_csv('filename.csv', skiprows = 1)

Then

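# Every calendar day in the period of interest
days = pd.date_range(start='01-JAN-2008', end='31-DEC-2017')
df_clean = df.reset_index()
# Parse strings such as '07-MAR-08' so both merge keys are datetimes
df_clean['idx dates'] = pd.to_datetime(df_clean['Initially Received'])
# Build the frame from a dict so the row count (3653) is not hard-coded
df2 = pd.DataFrame({'full dates': days})
# A left join keeps every calendar day; days without a complaint get NaN
dates = pd.merge(df2, df_clean, left_on='full dates', right_on='idx dates', how='left')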
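For context, the original ValueError arises because `pd.merge` was given `days` (a DatetimeIndex) rather than a DataFrame. Below is a minimal, self-contained sketch of the left merge on a small stand-in for the CSV. The 'Complaint' column and the shortened date range are made up for illustration, and the explicit `format` assumes every date looks like '07-MAR-08'.

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents; 'Complaint' is an invented column.
data = pd.DataFrame({
    'Initially Received': ['07-MAR-08', '08-APR-08', '08-APR-08', '19-MAY-08'],
    'Complaint': ['a', 'b', 'c', 'd'],
})
# Assumes all dates follow the day-month-two-digit-year pattern shown in the question.
data['idx dates'] = pd.to_datetime(data['Initially Received'], format='%d-%b-%y')

# One row per calendar day, wrapped in a DataFrame so pd.merge accepts it.
calendar = pd.DataFrame({'full dates': pd.date_range('2008-03-01', '2008-05-31')})

# Left join: every calendar day is kept, and same-day complaints each get a row.
filled = pd.merge(calendar, data, left_on='full dates', right_on='idx dates', how='left')

print(len(filled))                    # number of rows after filling the gaps
print(filled['Complaint'].notna().sum())  # number of original complaints retained
```

Days with no complaint carry NaN in the other columns, which should be expected rather than treated as lost data.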

7 Comments

Thanks. However, although all the dates are filled in, I lose all the other data from the dataframe 'data'...
Hmm, that makes no sense. Try removing the fill_value parameter.
I think you may also run into ValueError: cannot reindex from a duplicate axis, since 'multiple complaints may have been filed on the same day.'
I took out the fill_value parameter, but now the cells are filled with NaN instead of zeros.
I am not sure what you mean about merging the data.
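The duplicate-axis concern raised in the comments can be reproduced with a small sketch. The data and the 'Complaint' and 'day' names here are hypothetical. It assumes the dates are used as the index and that two complaints share a day, and it contrasts `reindex` with a column-based merge.

```python
import pandas as pd

# Hypothetical data: two complaints share 2008-04-08, so the index has duplicates.
data = pd.DataFrame(
    {'Complaint': ['a', 'b', 'c']},
    index=pd.to_datetime(['2008-04-07', '2008-04-08', '2008-04-08']),
)
days = pd.date_range('2008-04-06', '2008-04-10')

# reindex requires unique index labels, so this raises ValueError.
try:
    data.reindex(days)
    reindex_failed = False
except ValueError:
    reindex_failed = True

# A merge on a column has no uniqueness requirement, so duplicates survive.
merged = pd.DataFrame({'day': days}).merge(
    data.rename_axis('day').reset_index(), on='day', how='left')
```

This is why a merge is the safer route when several rows can share a date.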

Create your date range, and use merge to outer join it to the original dataframe, preserving duplicates.

import pandas as pd
from io import StringIO

TESTDATA = StringIO(
"""Dates;fruit
05-APR-08;apple
08-APR-08;banana
08-APR-08;pear
11-APR-08;grapefruit
""")

df = pd.read_csv(TESTDATA, sep=';', parse_dates=['Dates'])

dates = pd.date_range(start='04-APR-2008', end='12-APR-2008').to_frame()
pd.merge(
    df, dates, left_on='Dates', right_on=0,
    how='outer').sort_values(by=['Dates']).drop(columns=0)

#   Dates       fruit
#   2008-04-04  NaN
#   2008-04-05  apple
#   2008-04-06  NaN
#   2008-04-07  NaN
#   2008-04-08  banana
#   2008-04-08  pear
#   2008-04-09  NaN
#   2008-04-10  NaN
#   2008-04-11  grapefruit
#   2008-04-12  NaN
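As a possible variant (a sketch, not a replacement for the code above), the calendar can be given a named 'Dates' column so there is no helper column `0` to drop, and `indicator=True` can flag which days had no entry. The `calendar` and `result` names are illustrative, and the explicit `format` assumes every date looks like '05-APR-08'.

```python
import pandas as pd
from io import StringIO

TESTDATA = StringIO("""Dates;fruit
05-APR-08;apple
08-APR-08;banana
08-APR-08;pear
11-APR-08;grapefruit
""")

df = pd.read_csv(TESTDATA, sep=';')
# Assumes the day-month-two-digit-year pattern used in the sample data.
df['Dates'] = pd.to_datetime(df['Dates'], format='%d-%b-%y')

# Calendar with the same column name as the data, so a plain on='Dates' works.
calendar = pd.DataFrame({'Dates': pd.date_range('2008-04-04', '2008-04-12')})

# Left join from the calendar keeps the days in order; the '_merge' column
# reads 'left_only' for days that had no row in df.
result = calendar.merge(df, on='Dates', how='left', indicator=True)
print(result)
```

Starting from the calendar side also means no sort is needed afterwards, since a left merge preserves the order of the left keys.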
