1

I have a csv file which looks like below

327,2018-02-12 23:30:18.255810+00:00,Pur,10.11.100.1,WSE,8.0,23.0,6.5,0.0,,,,,,,,
328,2018-02-12 23:30:22.718605+00:00,Bol,10.11.100.1,DEF,8.0,23.0,11.41,0.0,,,,,,,,
333,2018-02-13 00:00:17.886487+00:00,Cal,10.11.100.1,WSE,9.0,23.0,10.5,0.0,,,,,,,,
334,2018-02-13 00:00:21.948083+00:00,Moe,10.11.100.1,CFG,9.0,23.0,21.5,0.0,,,,,,,,
436,2018-02-15 11:00:11.137740+00:00,Cad,10.11.100.1,MOD,5.0,24.0,3.17,0.0,,,,,,,,
437,2018-02-15 11:27:20.994247+00:00,Ric,10.11.100.1,DEF,7.0,24.0,9.5,0.0,,,,,,,,
877,2018-02-17 01:34:10.662735+00:00,Pit,10.4.100.1,CFD,6.0,3.0,37.23,0.0,,,,,,,,
878,2018-02-20 00:04:39.855528+00:00,Bol,10.4.100.1,WSE,9.0,3.0,55.42,0.0,,,,,,,,

The date range is 2018-02-02 to 2018-04-13

I have tried doing something like this as mentioned here Pandas Reindex to Fill Missing Dates, or Better Method to Fill?

df = pd.read_csv(file, parse_dates=["date"])
df.set_index("date", inplace=True)
df.index = pd.to_datetime(df.index,format='%Y-%m-%d %H:%M:%S.%f')
d2 = pd.DataFrame(index=pd.date_range('2018-02-02','2018-04-13'))
print(df.join(d2,how='right'))

But this doesn't seem to work.I still have some missing dates.What is the correct way to fill up the missing dates and assign 0 to its associated values?

5
  • You have missing date , but the target df index format is '%Y-%m-%d %H:%M:%S.%f'(datetime)? Commented Apr 19, 2018 at 2:34
  • The dates are unevenly spaced and given with a precision of microseconds. Which dates exactly are missing here? Can you post the expected date index for your data snippet? Commented Apr 19, 2018 at 6:44
  • @WolfgangK 2018-02-14, 2018-02-16, 2018-02-18 and 2018-02-19 as shown in the data snippet Commented Apr 19, 2018 at 6:47
  • @SouvikRay For example we have 2018-02-15 11:27:20.994247+00:00, so what should be the exact format for the following day 2018-02-16? Or can we just discard everything from hours downwards? Commented Apr 19, 2018 at 7:21
  • @WolfgangK discard everything from hour and downwards Commented Apr 19, 2018 at 7:22

1 Answer 1

1

You only need to convert your index to plain dates to make your own solution work:

df = pd.read_csv(file, parse_dates=['date'])
df.set_index('date', inplace=True)

df.index = df.index.date

d2 = pd.DataFrame(index=pd.date_range('2018-02-12','2018-02-20'))
print(df.join(d2, how='right').fillna(0))

It should give

2018-02-12  327.0  Pur  10.11.100.1  WSE  8.0  23.0   6.50  0.0
2018-02-12  328.0  Bol  10.11.100.1  DEF  8.0  23.0  11.41  0.0
2018-02-13  333.0  Cal  10.11.100.1  WSE  9.0  23.0  10.50  0.0
2018-02-13  334.0  Moe  10.11.100.1  CFG  9.0  23.0  21.50  0.0
2018-02-14    0.0    0            0    0  0.0   0.0   0.00  0.0
2018-02-15  436.0  Cad  10.11.100.1  MOD  5.0  24.0   3.17  0.0
2018-02-15  437.0  Ric  10.11.100.1  DEF  7.0  24.0   9.50  0.0
2018-02-16    0.0    0            0    0  0.0   0.0   0.00  0.0
2018-02-17  877.0  Pit   10.4.100.1  CFD  6.0   3.0  37.23  0.0
2018-02-18    0.0    0            0    0  0.0   0.0   0.00  0.0
2018-02-19    0.0    0            0    0  0.0   0.0   0.00  0.0
2018-02-20  878.0  Bol   10.4.100.1  WSE  9.0   3.0  55.42  0.0
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.