I found some code online for creating dummy columns from my date column, which has only three values: 1800, 1900, 2000.

'yr' is used inside the function when the function is defined, but it has not been declared anywhere before that. 'yr' seems to come from the for loop, and 'apply' is used afterwards to create the dummies. I understand that the 'yr' list in the for loop is what generates the three columns 1800, 1900 and 2000 in the 'movies' DataFrame.

So:

1.) Does Python allow a list 'yr' to be declared in a for loop without initializing it first?

2.) How can the 'date' column of the 'movies' DataFrame be passed to the function without also passing 'yr'? I can't work out what the 'if' statement inside the function is comparing each value of 'date' against.

I can't follow how 'yr' flows from the for loop into the function, where each 'date' value 'val' is compared in the 'if' statement.

Please help!

# Return century of movie as a dummy column
def add_movie_year(val):
    if val[:2] == yr:
        return 1
    else:
        return 0

# Apply function
for yr in ['18', '19', '20']:
    movies[str(yr) + "00's"] = movies['date'].apply(add_movie_year)
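A pandas-free sketch of the same flow may help trace where 'yr' comes from. Here a plain list `dates` (a hypothetical stand-in for `movies['date']`) replaces the column, and a list comprehension stands in for `.apply`, which likewise calls the function once per value. Each call reads whatever the global 'yr' is bound to at that moment.

```python
# Hypothetical stand-in for movies['date']
dates = ['1800', '1900', '2000']

def add_movie_year(val):
    # 'yr' is not a parameter; it is read from the global scope at call time
    if val[:2] == yr:
        return 1
    else:
        return 0

dummies = {}
for yr in ['18', '19', '20']:
    # Same idea as movies['date'].apply(add_movie_year): one call per value,
    # each call seeing the current value of the loop variable 'yr'
    dummies[yr + "00's"] = [add_movie_year(val) for val in dates]

print(dummies)
# {"1800's": [1, 0, 0], "1900's": [0, 1, 0], "2000's": [0, 0, 1]}
```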
  • Are you using pandas? Commented Oct 29, 2019 at 17:33
  • Yes, I am using pandas. Does that make any difference here? Commented Oct 29, 2019 at 17:35

2 Answers

Rather than relying on a global yr, add yr as a parameter of your add_movie_year function and tell apply to pass yr in as an extra argument:

import pandas as pd

movies = pd.DataFrame({'date': ['1800', '1900', '2000']})

# Return century of movie as a dummy column
def add_movie_year(val, yr):
    # yr is now an explicit parameter instead of a global lookup
    if val[:2] == yr:
        return 1
    else:
        return 0

# Apply function: args=(yr,) is passed to add_movie_year after each value
for yr in ['18', '19', '20']:
    movies[yr + "00's"] = movies['date'].apply(add_movie_year, args=(yr,))
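A variant sketch of the same fix: `Series.apply` also forwards extra keyword arguments to the function, so `yr` can be passed by name instead of through `args`. The sample DataFrame is assumed to have only the three dates from the question.

```python
import pandas as pd

# Assumed sample data; the real movies DataFrame has more rows and columns
movies = pd.DataFrame({'date': ['1800', '1900', '2000']})

def add_movie_year(val, yr):
    # 1 when the first two characters of the date match the century prefix
    return 1 if val[:2] == yr else 0

for yr in ['18', '19', '20']:
    # Keyword arguments given to Series.apply are forwarded to add_movie_year
    movies[yr + "00's"] = movies['date'].apply(add_movie_year, yr=yr)

print(movies)
```

Either form makes the dependency on `yr` visible in the function signature, which is the main point of the answer.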

4 Comments

Yes, this is more readable and reliable, and it is exactly what I was thinking. But I really want to know how the code I found in the question works at all. I want to understand the flow of 'yr' in my code.
My question 1.) is: are we allowed to declare or initialize a list directly in a for loop in Python, without any earlier occurrence, as in this example?
When Python calls the function, it first looks up the name in the function's local scope, and if it is not found there, it looks in the global scope, so your code works. This can make debugging harder, and it is best not to use the same name for a function parameter and a global variable.
So, in short: the line 'for yr in ['18', '19', '20']:' in my code initializes the 'yr' that is used inside the function 'add_movie_year' in MY CODE, right?
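A minimal sketch of the lookup order described in the comment above (in full, Python searches the local, enclosing, global and built-in scopes, in that order). The function names `uses_global` and `shadows_global` are hypothetical; the second one shows the pitfall of reusing a global's name as a parameter.

```python
yr = '19'  # global binding, like the one the for loop creates

def uses_global(val):
    # 'yr' is not assigned anywhere in this function, so the lookup
    # falls through the local scope to the global 'yr'
    return 1 if val[:2] == yr else 0

def shadows_global(val, yr='20'):
    # The parameter 'yr' is a local name, so it hides the global 'yr' here
    return 1 if val[:2] == yr else 0

print(uses_global('1900'))     # 1
print(shadows_global('1900'))  # 0, because the local yr is '20'
```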

yr can be used in the function body because by the time the function is actually called, yr has already been assigned by the for loop, so the function successfully looks it up. Functions can read variables from outside their own scope (this is what lets them use imported modules), but relying on that for ordinary values is generally bad practice.
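A small sketch of the "by the time the function is called" point: the function body refers to `yr`, but the name is only resolved when the function runs. Calling it before any `yr` exists raises `NameError`; after assignment the same function works and always sees the current value. The variable `error_before` is only there for illustration.

```python
def add_movie_year(val):
    # 'yr' is looked up when the function is called, not when it is defined
    return 1 if val[:2] == yr else 0

error_before = None
try:
    add_movie_year('1800')  # no global 'yr' exists yet
except NameError as err:
    error_before = err
    print('before assignment:', err)

yr = '18'
print(add_movie_year('1800'))  # 1
yr = '19'
print(add_movie_year('1800'))  # 0: same function, new global value
```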

3 Comments

But my first question is: does Python allow a list to be declared in a for loop like this, without initializing it first, as we would have to in Java, for example?
DataFrame.apply (and Series.apply, which is what movies['date'].apply calls) can pass extra arguments to the function, through args and through keyword arguments.
Thanks, I've updated my answer. @KaustubhUrsekar, the answer to your question is yes: Python does allow it, but it's a bad idea to rely on it.
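A minimal sketch of what the for loop does on its own, in plain Python. Assignment is what creates a variable in Python, and a for statement assigns its target name on every iteration, so no earlier declaration (and, unlike Java, no type) is needed. Note that `yr` is bound to one string at a time, not to the list. The `seen` list is only there for illustration.

```python
# No separate declaration step: the for statement itself assigns 'yr'
# each time through the loop, creating the name on the first iteration
seen = []
for yr in ['18', '19', '20']:
    # On each pass, yr is one string from the list, not the list itself
    seen.append((yr, type(yr).__name__))

print(seen)  # [('18', 'str'), ('19', 'str'), ('20', 'str')]

# Loops do not create their own scope, so 'yr' is still bound afterwards
print(yr)  # 20
```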