df = rocksfile (snapshot of my DataFrame not shown)

Question: Write a function that will take a row of a DataFrame and print out the song, artist, and whether or not the release date is < 1970.

Defining my function:

def release_info(row):
    """Checks if song is released before or after 1970."""
    if rocksfile.loc[row, 'Release_Year'] < 1970:
        print str(rocksfile.loc[row,'Song_Clean']) + " by " + str(rocksfile.loc[row,'Artist_Clean']) \
            + " was released before 1970."
    else:
        print str(rocksfile.loc[row,'Song_Clean']) + " by " + str(rocksfile.loc[row,'Artist_Clean']) \
            + " was released after 1970."

Using the .apply() function, apply the function you wrote to the first four rows of the DataFrame. You will need to tell the apply function to operate row by row. Setting the keyword argument as axis=1 indicates that the function should be applied to each row individually.

Using .apply:

rocksfile.apply(release_info, axis = 1, row=1)

Error Message:

TypeError                                 Traceback (most recent call last)
<ipython-input-61-fe0405b4d1e8> in <module>()
  1 #a = [1]
  2 
----> 3 rocksfile.apply(release_info, axis = 1, row=1)


TypeError: ("release_info() got multiple values for keyword argument 'row'", u'occurred at index 0')

release_info(1)

3 Answers

2

pandas works with arrays (Series, DataFrames), so it is better to use vectorized pandas or NumPy functions than to process rows one at a time. Here the best choice is numpy.where:

import numpy as np

#condition
m = rocksfile['Release_Year'] < 1970
#concatenate columns together
a = rocksfile['Song_Clean'] + " by " + rocksfile['Artist_Clean']
#add different string to end
b = a + " was released before 1970."
c = a + " was released after 1970."

#pick b where the condition is True, otherwise c
rocksfile['new'] = np.where(m, b, c)
print (rocksfile)
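
To see what np.where does with a boolean mask, here is a minimal sketch on made-up data. The years Series is hypothetical and only stands in for rocksfile['Release_Year']:

```python
import numpy as np
import pandas as pd

# Made-up years, purely for illustration.
years = pd.Series([1965, 1982, 1969])
mask = years < 1970

# np.where(condition, x, y) takes x where the condition is True, otherwise y.
labels = np.where(mask, 'before', 'after')
print(labels)
```

The result is a NumPy array with one label per row, which is why it can be assigned straight to a new column.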

1

Here:

rocksfile.apply(release_info, axis = 1, row=1)

row is not one of the arguments DataFrame.apply() expects, so it gets passed on as a keyword argument to release_info(), in addition to the first positional argument that apply() already supplies. As a result, release_info() ends up being called like this:

release_info(row_index, row=1)
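
With axis=1, apply() hands the function each row as a Series, so one possible fix is to drop the extra row=1 keyword and read the columns from that Series instead of indexing back into rocksfile. The sketch below assumes that approach. The toy DataFrame and its values are made up for illustration, and head(4) is just one way to restrict the call to the first four rows:

```python
import pandas as pd

# Toy stand-in for rocksfile; the values are made up for illustration.
rocksfile = pd.DataFrame({
    'Song_Clean': ['Song A', 'Song B', 'Song C', 'Song D', 'Song E'],
    'Artist_Clean': ['Artist A', 'Artist B', 'Artist C', 'Artist D', 'Artist E'],
    'Release_Year': [1965, 1982, 1969, 1975, 1991],
})

def release_info(row):
    """Build and print the message for one row (a Series when axis=1)."""
    when = "before" if row['Release_Year'] < 1970 else "after"
    message = "{} by {} was released {} 1970.".format(
        row['Song_Clean'], row['Artist_Clean'], when)
    print(message)
    return message

# head(4) keeps the first four rows; axis=1 passes each row in turn.
messages = rocksfile.head(4).apply(release_info, axis=1)
```

Returning the message as well as printing it is optional; it simply makes the result of apply() a Series of strings rather than a Series of None.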

0

You can use np.where and reduce this to a single expression.

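# parentheses let the expression span several lines
s = (rocksfile['Song_Clean']
     + ' was released by '
     + rocksfile['Artist_Clean']
     # reuse rocksfile's index so the pieces line up row by row
     + pd.Series(np.where(rocksfile['Release_Year'] < 1970, ' before', ' after'),
                 index=rocksfile.index)
     + ' 1970')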

rocksfile['new'] = s
