I'm currently working on a dataset using pandas. I don't have much experience with this sort of thing, so any help would be greatly appreciated. The dataset is shown below:

[Image: sample dataset]

The table shows ratings associated with different segments, grouped by year. I am attempting to parse the table, pull the most recent rating from its year column (excluding NaNs), and write it to the Curr_Rate column, along with the year the rating was collected in the Curr_RatingYr column.

The second task is to pull the second most recent rating (with its year) and populate these values into the Prev_Rate and PrevRatingYr fields. Finally, I need to generate averages from all the ratings available for 2000-2017. I have the average part down, but when I try to parse the table to generate values for the current and previous rating, I am met with:

TypeError stating numpy.float64 object is not callable at index 0

Any help would be greatly appreciated.

    import numpy as np
    import pandas as pd

    df = pd.read_excel('CurrPrevRate1.xlsx')

    df.head()

    dftest = df[:100]

    # Replace zeros with NaN
    dftest[['y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006','y2007', 'y2008', 'y2009', 'y2010', 'y2011', 'y2012', 'y2013', 'y2014', 'y2015', 'y2016', 'y2017']] = dftest[['y2000','y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006','y2007', 'y2008', 'y2009', 'y2010', 'y2011', 'y2012', 'y2013', 'y2014', 'y2015', 'y2016', 'y2017']].replace(0, np.nan)

    #Change all values in these columns to floats
    #dftest[['y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006','y2007', 'y2008', 'y2009', 'y2010', 'y2011', 'y2012', 'y2013', 'y2014', 'y2015', 'y2016', 'y2017']] = dftest[['y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006','y2007', 'y2008', 'y2009', 'y2010', 'y2011', 'y2012', 'y2013', 'y2014', 'y2015', 'y2016', 'y2017']].apply(pd.to_numeric)

    #Get average of rows 
    dftest['AvgRating'] = dftest[['y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006','y2007', 'y2008', 'y2009', 'y2010', 'y2011', 'y2012', 'y2013', 'y2014', 'y2015', 'y2016', 'y2017']].mean(axis=1)

    def getCurrRate():
        for x in dftest['y2017']:
            if 0 <= x <= 10:
                return x
            else:
                for y in dftest['y2016']:
                    if 0 <= y <= 10:
                        return y
                    else:
                        for z in dftest['y2015']:
                            if 0 <= z <= 10:
                                return z
                            else:
                                return 'N/A'

    dftest['Curr_Rate'] = dftest[['y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006','y2007', 'y2008', 'y2009', 'y2010', 'y2011', 'y2012', 'y2013', 'y2014', 'y2015', 'y2016', 'y2017']].apply(getCurrRate(), axis=1)

    dftest
  • Can you provide (a) actual, inline data instead of a screenshot, and (b) expected input and output? More generally, you'll find you get better help, faster, when you post a Minimal, Complete, and Verifiable Example. Commented Aug 25, 2017 at 16:40

1 Answer

The error seems related to your apply() syntax.

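  1. Call apply() with the function itself, with no () on the end, e.g. apply(getCurrRate, axis=1). Writing getCurrRate() calls the function immediately and hands its return value (here a numpy.float64) to apply(), which then tries to call that number. That is where "numpy.float64 object is not callable" comes from.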
  2. The function you pass to apply() needs to take an argument, e.g. getCurrRate(yr). Here, yr is the object passed implicitly by apply(). With axis=1 that object is one row at a time, as a Series indexed by the selected column names, so you'd effectively be executing:

    getCurrRate(dftest.iloc[0])   # first row
    getCurrRate(dftest.iloc[1])   # second row
    #...
    getCurrRate(dftest.iloc[-1])  # last row

    But without a parameter in your getCurrRate definition, apply() has nowhere to pass the row.
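To make both points concrete, here is a small sketch. The names `df_demo` and `describe_row` are made up for illustration and are not from the question. It records what apply() hands to the function with axis=1, then calls a numpy.float64 directly to reproduce the message from the traceback.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame, only for illustration.
df_demo = pd.DataFrame({"y2016": [2.0, np.nan], "y2017": [np.nan, 9.0]})

def describe_row(row):
    # With axis=1, `row` is a Series whose index holds the column names.
    return type(row).__name__ + ":" + ",".join(row.index)

# Pass the function itself (no parentheses); apply() calls it once per row.
seen = df_demo.apply(describe_row, axis=1).tolist()

# A float is not a function. Calling one raises the same kind of
# TypeError the question reports.
returned_value = np.float64(5.0)  # stands in for what getCurrRate() returned
try:
    returned_value(df_demo.iloc[0])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(seen)
print(error_message)
```

Each entry in `seen` shows that the function received a Series carrying every selected column for that row, which is why the applied function can look across the years of a single segment.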

At least for the case of currRate, it seems like you really just want the most recent, non-NaN value from the y<year> columns. In that case, consider a simpler approach:

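def getCurrRate(yr):
    # yr is one row (a Series). Drop the NaNs and take the last remaining value.
    # .iloc[-1] selects by position; plain [-1] on a label-indexed Series is
    # deprecated in recent pandas versions.
    return yr.dropna().iloc[-1]

# Select the y<year> columns (assumes they are in chronological order
# and that no other column name starts with 'y').
ratings_cols = df.columns[df.columns.str.startswith('y')]
df['currRate'] = df[ratings_cols].apply(getCurrRate, axis=1)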
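One caveat with the dropna() approach above: a row with no ratings at all leaves nothing to index, so the lookup raises an error. As a hedged alternative (not part of the original answer), forward-filling along each row and then reading the last column gives the same "most recent non-NaN" value and leaves NaN for empty rows. The frame `ratings` below is made-up data, and the sketch assumes the year columns are already in chronological order.

```python
import numpy as np
import pandas as pd

# Made-up ratings; the third row has no ratings at all.
ratings = pd.DataFrame({
    "y2015": [5, 6, np.nan],
    "y2016": [2, np.nan, np.nan],
    "y2017": [np.nan, np.nan, np.nan],
})

# Carry each row's last known rating rightwards, then read the final column.
latest = ratings.ffill(axis=1).iloc[:, -1]
print(latest)
```

This avoids a Python-level function call per row, which may matter on larger tables.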

Here's some toy data to demonstrate:

import numpy as np
import pandas as pd

data = {'segmentId':['foo','bar','baz'],
        'y2015':[5, 6, 7],
        'y2016':[2, np.nan, 4],
        'y2017':[np.nan, np.nan, 9]}
df = pd.DataFrame(data)

df
  segmentId  y2015  y2016  y2017
0       foo      5    2.0    NaN
1       bar      6    NaN    NaN
2       baz      7    4.0    9.0

We'd expect the following values for currRate:

  • index 0: 2
  • index 1: 6
  • index 2: 9

And that's what we get with the new getCurrRate:

ratings_cols = df.columns[df.columns.str.startswith('y')]  # recompute for the toy frame
df['currRate'] = df[ratings_cols].apply(getCurrRate, axis=1)

df
  segmentId  y2015  y2016  y2017  currRate
0       foo      5    2.0    NaN       2.0
1       bar      6    NaN    NaN       6.0
2       baz      7    4.0    9.0       9.0
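The question also asks for the rating year and for the second most recent rating. The same row-wise idea can be extended, as in the rough sketch below. The helper name `recent_ratings` is hypothetical. The output column names are taken from the question. Two assumptions: every rating column is named y<year>, and sorting those names puts them in chronological order.

```python
import numpy as np
import pandas as pd

def recent_ratings(row):
    """Return the latest and second-latest non-NaN ratings of a row, with their years."""
    valid = row.dropna()
    out = {"Curr_Rate": np.nan, "Curr_RatingYr": np.nan,
           "Prev_Rate": np.nan, "PrevRatingYr": np.nan}
    if len(valid) >= 1:
        out["Curr_Rate"] = valid.iloc[-1]
        out["Curr_RatingYr"] = int(valid.index[-1][1:])  # 'y2016' -> 2016
    if len(valid) >= 2:
        out["Prev_Rate"] = valid.iloc[-2]
        out["PrevRatingYr"] = int(valid.index[-2][1:])
    return pd.Series(out)

# Same toy data as above.
toy = pd.DataFrame({"segmentId": ["foo", "bar", "baz"],
                    "y2015": [5, 6, 7],
                    "y2016": [2, np.nan, 4],
                    "y2017": [np.nan, np.nan, 9]})

# Assumption: sorted y<year> names are in chronological order.
year_cols = sorted(c for c in toy.columns if c.startswith("y"))

# Returning a Series from the applied function yields one new column per key.
summary = toy[year_cols].apply(recent_ratings, axis=1)
result = toy.join(summary)
print(result)
```

Rows with fewer than two ratings keep NaN in the Prev_Rate and PrevRatingYr fields, so the year columns come back as floats rather than integers.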