I currently have a date column that has some issues. I have attempted to fix the problem but cannot come to a conclusion.
Here is the data:
import pandas as pd

# Import data
df_views = pd.read_excel('PageViews.xlsx')
# Check data types
df_views.dtypes
Out[57]:
Date object
Customer ID int64
dtype: object
The date column is not in a 'datetime' data format as expected. Further inspection yields:
df_views.loc[:5]
Date Customer ID
0 01/25/2016 104064596300
1 02/28/2015 102077474472
2 11/17/2016 106430081724
3 02/24/2016 107770391692
4 10/05/2016 106523680888
5 02/24/2016 107057691592
A quick check shows which rows do not follow the expected MM/DD/YYYY format:
print (df_views[df_views["Date"].str.len() != 10])
Date Customer ID
189513 12/14/ 106285770688
189514 10/28/ 107520462840
189515 11/01/ 102969804360
189516 11/10/ 102106417100
189517 02/16/ 107810168068
189518 10/25/ 102096164504
189519 02/08/ 107391760644
189520 02/29/ 107353558928
189521 10/24/ 107209142140
189522 12/20/ 107875461336
189523 12/23/ 107736375428
189524 11/12/ 106561080372
189525 01/27/ 102676548120
189526 11/19/ 107733043896
189527 12/31/ 107774452412
189528 01/21/ 102610956040
189529 01/09/ 108052836888
189530 02/21/ 106380330112
189531 02/02/ 107844459772
189532 12/12/ 102006641640
189533 12/16/ 106604647688
189534 11/14/ 102383102504
I tried writing a for loop to fix these rows, but I cannot get it to work.
Important note: I know that all observations fall between September 2015 and February 2016.
So if the month is 09/10/11/12 - then I can add "2015" to the date, otherwise if the month is 01/02, I can add "2016".
for row in df_views["Date"]:
    if len(row) != 10:
        if row.str.contains("^09|10|11|12\/"):
            row.str.cat("2015")
        elif row.str.contains("^01|02\/"):
            row.str.cat("2016")
        else:
            continue
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-87-684e121dd62d> in <module>()
      5 for row in df_views["Date"]:
      6     if len(row) != 10:
----> 7         if row.str.contains("^09|10|11|12\/"):
      8             row.str.cat("2015")
      9         elif row.str.contains("^01|02\/"):
AttributeError: 'str' object has no attribute 'str'
row seems to be a plain string, so the methods you're trying (contains, cat) are very unlikely to work on it. Try applying the .str operations you're using to the Series directly. You may need to tweak them, but it will be faster than iterating explicitly.
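One way the Series-level approach could look is sketched below. It is only an illustration under stated assumptions: the small DataFrame is a hypothetical stand-in for PageViews.xlsx built from a few rows shown in the question, and the helper names short, month, is_2015 and is_2016 are made up for the example. It relies on the rule given in the question (months 09 to 12 are 2015, months 01 and 02 are 2016).

```python
import pandas as pd

# Hypothetical sample standing in for PageViews.xlsx (rows copied from the question).
df_views = pd.DataFrame({
    "Date": ["01/25/2016", "02/28/2015", "12/14/", "02/16/", "10/28/"],
    "Customer ID": [104064596300, 102077474472, 106285770688,
                    107810168068, 107520462840],
})

# Rows whose date is missing the year, e.g. "12/14/".
short = df_views["Date"].str.len() != 10

# The month is the first two characters of every date string.
month = df_views["Date"].str[:2]

# Rule from the question: months 09-12 are in 2015, months 01-02 are in 2016.
is_2015 = short & month.isin(["09", "10", "11", "12"])
is_2016 = short & month.isin(["01", "02"])

# Append the year only to the truncated rows.
df_views.loc[is_2015, "Date"] = df_views.loc[is_2015, "Date"] + "2015"
df_views.loc[is_2016, "Date"] = df_views.loc[is_2016, "Date"] + "2016"

# Parse to datetime; anything still malformed becomes NaT instead of raising.
df_views["Date"] = pd.to_datetime(
    df_views["Date"], format="%m/%d/%Y", errors="coerce"
)
```

Comparing the two-character month prefix with isin also sidesteps a subtlety in the original patterns: in "^09|10|11|12\/" the ^ anchor applies only to the 09 alternative, so the other alternatives could match anywhere in the string. Rows that still fail to parse can be inspected afterwards with df_views[df_views["Date"].isna()].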