I was browsing the internet for code to create dummy columns from my date column, which has only three values: 1800, 1900, 2000.
The name 'yr' is used inside the function when it is defined, but it has not been declared anywhere before that. 'yr' then appears in the for loop, and 'apply' is called afterwards to get the dummies. As I understand it, looping over the 'yr' values generates three columns (1800's, 1900's, 2000's) in the 'movies' dataframe.
But then:
1.) Does Python allow 'yr' to be declared in a for loop without initializing it beforehand?
2.) How is the 'date' column of the 'movies' df passed to the function without 'yr' being passed as well? I cannot work out what the 'if' statement inside the function compares each value of the 'date' column with.
I cannot follow how 'yr' gets from the for loop into the function, where each 'date' value 'val' is compared in the 'if' statement.
Please help!
# Return century of movie as a dummy column
def add_movie_year(val):
    if val[:2] == yr:
        return 1
    else:
        return 0

# Apply function
for yr in ['18', '19', '20']:
    movies[str(yr) + "00's"] = movies['date'].apply(add_movie_year)
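
To isolate the part about 'yr', here is a minimal sketch of the same flow without pandas. It assumes the dates are stored as strings; the list 'dates' and the dict 'columns' are made-up stand-ins for the 'date' column and the 'movies' dataframe, and a list comprehension stands in for 'apply'.

```python
def add_movie_year(val):
    # 'yr' is neither a parameter nor a local variable here, so Python
    # looks it up as a global name each time the function is CALLED,
    # not when the function is defined.
    if val[:2] == yr:
        return 1
    else:
        return 0


# Hypothetical sample data standing in for movies['date'].
dates = ["1895", "1994", "2003"]

# Hypothetical stand-in for the new dataframe columns.
columns = {}

# At module level, the for loop binds the global name 'yr' on each pass,
# so the function sees the current value when it runs.
for yr in ["18", "19", "20"]:
    columns[yr + "00's"] = [add_movie_year(d) for d in dates]

print(columns)
```

If this is run at the top level of a script, each pass of the loop rebinds 'yr' before the function is called, which seems to be why nothing extra has to be passed in. Calling add_movie_year before any 'yr' exists would raise a NameError.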