How to solve the Error - name is not defined in a function in python

Question

Let's assume that I have a dataset consisting the columns, movieId, title, year and ratings. Below is a sample subset of my data:

| movieId |   title  | rating | year |
|:-------:|:--------:|:------:|:----:|
| 1       | abc      | 3.5    | 1995 |
| 1       | abc      | 3      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 3      | 1995 |
| 1       | abc      | 5      | 1995 |
| 1       | abc      | 3.5    | 1995 |
| 1       | abc      | 4.5    | 1995 |
| 1       | abc      | 0.5    | 1995 |
| 1       | abc      | 3.5    | 1995 |
| 1       | abc      | 4.5    | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 5      | 1995 |
| 1       | abc      | 4.5    | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 3      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 3.5    | 1995 |
| 1       | abc      | 3      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 5      | 1995 |
| 1       | abc      | 4.5    | 1995 |
| 1       | abc      | 5      | 1995 |
| 2       | xyz      | 3      | 2000 |
| 2       | xyz      | 2      | 2000 |
| 2       | xyz      | 3.5    | 2000 |
| 2       | xyz      | 4      | 2000 |
| 2       | xyz      | 3.5    | 2000 |
| 2       | xyz      | 5      | 2000 |
| 2       | xyz      | 3.5    | 2000 |
| 2       | xyz      | 3      | 2000 |
| 2       | xyz      | 3      | 2000 |
| 2       | xyz      | 2      | 2000 |
| 2       | xyz      | 3.5    | 2000 |
| 2       | xyz      | 3      | 2000 |
| 2       | xyz      | 3      | 2000 |
| 2       | xyz      | 4      | 2000 |
| 2       | xyz      | 2      | 2000 |
| 2       | xyz      | 3.5    | 2000 |
| 2       | xyz      | 1      | 2000 |
| 3       | pqr      | 3      | 1997 |
| 3       | pqr      | 2      | 1997 |
| 3       | pqr      | 3.5    | 1997 |
| 3       | pqr      | 3.5    | 1997 |
| 3       | pqr      | 3      | 1997 |
| 3       | pqr      | 3      | 1997 |
| 3       | pqr      | 3      | 1997 |
| 3       | pqr      | 3      | 1997 |
| 3       | pqr      | 4.5    | 1997 |
| 3       | pqr      | 3.5    | 1997 |
| 3       | pqr      | 4      | 1997 |
| 3       | pqr      | 1.5    | 1997 |
| 3       | pqr      | 2      | 1997 |
| 3       | pqr      | 2      | 1997 |
| 3       | pqr      | 2.5    | 1997 |
| 4       | def      | 3      | 1999 |
| 4       | def      | 2.5    | 1999 |
| 4       | def      | 2.5    | 1999 |
| 4       | def      | 0.5    | 1999 |
| 4       | def      | 2      | 1999 |
| 4       | def      | 3      | 1999 |
| 5       | movie123 | 4      | 2006 |
| 5       | movie123 | 4      | 2006 |
| 5       | movie123 | 3      | 2006 |
| 5       | movie123 | 1.5    | 2006 |
| 5       | movie123 | 3      | 2006 |
| 5       | movie123 | 2      | 2006 |
| 5       | movie123 | 2.5    | 2006 |
| 5       | movie123 | 3      | 2006 |
| 5       | movie123 | 4      | 2006 |
| 5       | movie123 | 0.5    | 2006 |
| 5       | movie123 | 1      | 2006 |
| 5       | movie123 | 3.5    | 2006 |
| 5       | movie123 | 2      | 2006 |
| 5       | movie123 | 3      | 2006 |
| 5       | movie123 | 1.5    | 2006 |
| 5       | movie123 | 2.5    | 2006 |
| 5       | movie123 | 4      | 2006 |
| 5       | movie123 | 4      | 2006 |
| 5       | movie123 | 3.5    | 2006 |
| 5       | movie123 | 3      | 2006 |
| 6       | movie456 | 4      | 2012 |
| 6       | movie456 | 3.5    | 2012 |
| 6       | movie456 | 3.5    | 2012 |
| 6       | movie456 | 4      | 2012 |
| 6       | movie456 | 5      | 2012 |
| 6       | movie456 | 2.5    | 2012 |
| 6       | movie456 | 4      | 2012 |
| 6       | movie456 | 4      | 2012 |
| 6       | movie456 | 3.5    | 2012 |
| 6       | movie456 | 5      | 2012 |
| 6       | movie456 | 2      | 2012 |
| 6       | movie456 | 4      | 2012 |

I want to calculate the average, count, minimum rating, and mean ratings across the whole dataset by defining a function. So, I first calculate the number and average of ratings for each movie.

avg_rating = df.groupby(['movieId','year'])['ratings'].agg([('Count','size'), ('Mean','mean')]).sort_values(by='Mean',ascending=False)

Since, some movies may have less number of reviews but a higher rating Vs other movies with higher number of reviews and higher rating, the analysis might get skewed. So, I want to calculate weighted average, for which I have defined a function.

# R = average for the movie (mean) = (Rating)
# v = number of ratings/reviews for the movie = (votes)
# m = minimum reviews required to be listed in the Top 250 movie list
# C = the mean rating across the whole report

def weighted_rating(R, v, m, C): 
  return (v/(v+m))*R + (m/(v+m))*C

avg_rating= avg_rating.assign(wr = weighted_rating(mean, count, 500, mean(mean)))

When I run the above last line of code, I get an error: name 'mean' is not defined

The same error persists for column count: name 'count' is not defined

How can I solve this error so that my final output contains the columns movieId, year, count, mean, and wr?

imjoseangel · Accepted Answer · 2021-11-07 18:18:35Z

You need to use the fields from your df:

avg_rating = avg_rating.assign(
        wr=weighted_rating(avg_rating['Mean'], avg_rating['Count'], 500, mean(avg_rating['Mean'])))

So the final code could be like:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import (division, absolute_import, print_function,
                        unicode_literals)

import pandas as pd
from statistics import mean


def weighted_rating(R, v, m, C):
    return (v / (v + m)) * R + (m / (v + m)) * C


def main():

    df = pd.read_csv('movies.csv')

    avg_rating = df.groupby(['movieId', 'year'])['rating'].agg(
        [('Count', 'size'), ('Mean', 'mean')]).sort_values(by='Mean', ascending=False)

    avg_rating = avg_rating.assign(
        wr=weighted_rating(avg_rating['Mean'], avg_rating['Count'], 500, mean(avg_rating['Mean'])))

    print(avg_rating)


if __name__ == '__main__':
    main()

Using the CSV:

movieId,title,rating,year
1,abc,3.5,1995
1,abc,3,1995
1,abc,4,1995
1,abc,3,1995
1,abc,5,1995
1,abc,3.5,1995
1,abc,4.5,1995
1,abc,0.5,1995
1,abc,3.5,1995
1,abc,4.5,1995
1,abc,4,1995
1,abc,5,1995
1,abc,4.5,1995
1,abc,4,1995
1,abc,4,1995
1,abc,4,1995
1,abc,4,1995
1,abc,3,1995
1,abc,4,1995
1,abc,3.5,1995
1,abc,3,1995
1,abc,4,1995
1,abc,5,1995
1,abc,4.5,1995
1,abc,5,1995
2,xyz,3,2000
2,xyz,2,2000
2,xyz,3.5,2000
2,xyz,4,2000
2,xyz,3.5,2000
2,xyz,5,2000
2,xyz,3.5,2000
2,xyz,3,2000
2,xyz,3,2000
2,xyz,2,2000
2,xyz,3.5,2000
2,xyz,3,2000
2,xyz,3,2000
2,xyz,4,2000
2,xyz,2,2000
2,xyz,3.5,2000
2,xyz,1,2000
3,pqr,3,1997
3,pqr,2,1997
3,pqr,3.5,1997
3,pqr,3.5,1997
3,pqr,3,1997
3,pqr,3,1997
3,pqr,3,1997
3,pqr,3,1997
3,pqr,4.5,1997
3,pqr,3.5,1997
3,pqr,4,1997
3,pqr,1.5,1997
3,pqr,2,1997
3,pqr,2,1997
3,pqr,2.5,1997
4,def,3,1999
4,def,2.5,1999
4,def,2.5,1999
4,def,0.5,1999
4,def,2,1999
4,def,3,1999
5,movie12,4,2006
5,movie12,4,2006
5,movie12,3,2006
5,movie12,1.5,2006
5,movie12,3,2006
5,movie12,2,2006
5,movie12,2.5,2006
5,movie12,3,2006
5,movie12,4,2006
5,movie12,0.5,2006
5,movie12,1,2006
5,movie12,3.5,2006
5,movie12,2,2006
5,movie12,3,2006
5,movie12,1.5,2006
5,movie12,2.5,2006
5,movie12,4,2006
5,movie12,4,2006
5,movie12,3.5,2006
5,movie12,3,2006
6,movie45,4,2012
6,movie45,3.5,2012
6,movie45,3.5,2012
6,movie45,4,2012
6,movie45,5,2012
6,movie45,2.5,2012
6,movie45,4,2012
6,movie45,4,2012
6,movie45,3.5,2012
6,movie45,5,2012
6,movie45,2,2012
6,movie45,4,2012

Thanks for the quick reply! If I need to save this output in a data frame instead of calling as a function, how can I do that? I tried assigning it to a data frame when defining a function but that gives me an error

Collectives™ on Stack Overflow

How to solve the Error - name is not defined in a function in python

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related