I have an Excel sheet (.xlsx file) with the following data:
| Date 1 | Date 2 |
|---|---|
| 03/26/2010 | 3/31/2011 |
| NULL | NULL |
| 03/26/2010 | 3/31/2011 |
| NULL | NULL |
| 03/26/2010 | 3/31/2011 |
| NULL | NULL |
| 01/01/2010 | 6/30/2010 |
| 01/01/2010 | 6/30/2010 |
| 01/12/2011 | 4/15/2012 |

When I read it into a DataFrame using

    pd.read_excel("file.xlsx", header=0, dtype=str, engine='openpyxl')

all of the data is read properly except for rows 3, 4, 5 and 6. The result looks like this:

| Date 1 | Date 2 |
|---|---|
| 03/26/2010 | 3/31/2011 |
| NULL | NULL |
| 01/01/2010 | 6/30/2010 |
| 01/01/2010 | 6/30/2010 |
| 01/12/2011 | 4/15/2012 |
| NULL | NULL |

This causes an unwanted data shift and breaks my further steps. Is there any reason why it happens only at this place and nowhere else in the data?

I have also tried:

    pd.read_excel("file.xlsx")

    xl = pd.ExcelFile("file.xlsx", engine='openpyxl')
    df = xl.parse("file")
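
To check whether the missing rows are actually stored in the file, independent of pandas, a rough sketch like the one below might help. It uses only the standard library and lists the row numbers recorded in the worksheet XML. It assumes the first sheet is stored as `xl/worksheets/sheet1.xml` inside the .xlsx archive, which is typical but not guaranteed. `stored_rows` is a name made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used by the worksheet XML inside an .xlsx file.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def stored_rows(xlsx_path, sheet_member="xl/worksheets/sheet1.xml"):
    """Return a list of (row number, cell count) for each stored <row>.

    sheet_member is an assumption: the first sheet is usually saved under
    this name, but a given workbook may use a different one.
    """
    with zipfile.ZipFile(xlsx_path) as zf:
        root = ET.fromstring(zf.read(sheet_member))

    rows = []
    for row in root.findall("m:sheetData/m:row", NS):
        # "r" holds the 1-based row index; it may be absent, so allow None.
        r = row.get("r")
        number = int(r) if r is not None else None
        rows.append((number, len(row.findall("m:c", NS))))
    return rows
```

Calling `stored_rows("file.xlsx")` and comparing the row numbers with the DataFrame should show whether rows 3 to 6 exist in the file at all, or whether they are stored with fewer cells than the other rows.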