Using A For Loop to Return Unique Values in a Pandas Dataframe

Question

I know Pandas isn't really built to be used with for-loops, but I have a specific task I'll have to do many times, and it would save a lot of time if I could abstract some of it away into a function I can call.

A generic version of my dataframe looks like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': pd.Categorical(['John Doe', 'Jane Doe', 'Bob Smith']), 'Score1': np.arange(3), 'Score2': np.arange(3, 6, 1)})

        Name  Score1  Score2
0   John Doe       0       3
1   Jane Doe       1       4
2  Bob Smith       2       5

What I want to do is take this lookup:

df.loc[df.Name == 'Jane Doe', 'Score2']

which returns 4 (wrapped in a one-row Series), and iterate through it with a for-loop like so:

def pull_score(people, score):
    for i in people:
        print(df.loc[df.Name == people[i], score])

So if I wanted to, I could call:

the_names = ['John Doe', 'Jane Doe', 'Bob Smith']
pull_score(the_names, 'Score2')

And get:

3
4
5

The error message I currently get is:

TypeError: list indices must be integers, not str
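A minimal reproduction of where this error comes from, using only a plain Python list (no pandas involved). Iterating over a list yields its elements, so inside the loop i is already a name string, and people[i] then indexes the list with a string. Under Python 3 the message reads "list indices must be integers or slices, not str"; the variable names below are illustrative.

```python
people = ['John Doe', 'Jane Doe', 'Bob Smith']

# Iterating over a list yields its elements (here, strings), not their positions
loop_values = [i for i in people]

# Using one of those strings as a list index reproduces the TypeError
try:
    people[loop_values[0]]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# enumerate() yields (position, element) pairs if a position is really needed
pairs = list(enumerate(people))

print(error_message)
print(pairs)
```

In other words, the loop already hands you each name, so it can be used directly in the comparison.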

I've looked at some of the other answers about this error message and Pandas, such as "Python and JSON - TypeError list indices must be integers not str" and "How to solve TypeError: list indices must be integers, not list?"

But I didn't see the answer to what I'm trying to do in either of them, and I don't believe iterrows() or itertuples() would apply, since I need Pandas to find the values first.

akuiper · Accepted Answer · 2016-08-20 23:51:54Z


You can set the Name column as the index and then look up the names by label using loc:

the_names = ['John Doe', 'Jane Doe', 'Bob Smith']
df.set_index('Name').loc[the_names, 'Score2']

# Name
# John Doe     3
# Jane Doe     4
# Bob Smith    5
# Name: Score2, dtype: int32
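Since the question asks for a reusable function, here is a minimal sketch that wraps this lookup, assuming each name appears at most once in the Name column. The function name pull_score comes from the question; passing df as a parameter is an assumption. The result follows the order of the names you pass in, and pandas 1.0 and later raise KeyError if any requested name is missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': pd.Categorical(['John Doe', 'Jane Doe', 'Bob Smith']),
                   'Score1': np.arange(3),
                   'Score2': np.arange(3, 6, 1)})


def pull_score(df, people, score):
    """Return the `score` column for each name in `people`, in the order given.

    Assumes names in df['Name'] are unique; a missing name raises KeyError
    in pandas 1.0 and later.
    """
    # Make Name the index so .loc can select rows by label
    return df.set_index('Name').loc[people, score]


the_names = ['John Doe', 'Jane Doe', 'Bob Smith']
result = pull_score(df, the_names, 'Score2')

# Iterating over a Series yields its values, giving the 3 / 4 / 5 output from the question
for value in result:
    print(value)
```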


piRSquared · Accepted Answer · 2016-08-21 07:25:18Z

First things first: there is an error in your loop logic. for i in people binds i to each element of people (each name string), not to its position, so people[i] tries to index the list with a string, which is what raises the TypeError. Use i directly instead:

def pull_score(df, people, score):
    # i is each name in people, so compare against it directly
    for i in people:
        print(df.loc[df.Name == i, score])

the_names = ['John Doe', 'Jane Doe', 'Bob Smith']
pull_score(df, the_names, 'Score2')

0    3
Name: Score2, dtype: int64
1    4
Name: Score2, dtype: int64
2    5
Name: Score2, dtype: int64
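The loop above prints one-row Series rather than the bare 3 / 4 / 5 the question asked for. A minimal sketch of a loop variant that unwraps each match to a scalar with .iloc[0], assuming every name occurs exactly once (if a name has no match, .iloc[0] raises IndexError). The name pull_score_values is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': pd.Categorical(['John Doe', 'Jane Doe', 'Bob Smith']),
                   'Score1': np.arange(3),
                   'Score2': np.arange(3, 6, 1)})


def pull_score_values(df, people, score):
    """Hypothetical helper: return a list of scalar scores, one per name.

    Assumes each name occurs exactly once in df['Name'].
    """
    values = []
    for name in people:
        # The boolean mask selects the matching row; .iloc[0] unwraps the first match
        values.append(df.loc[df['Name'] == name, score].iloc[0])
    return values


the_names = ['John Doe', 'Jane Doe', 'Bob Smith']
for v in pull_score_values(df, the_names, 'Score2'):
    print(v)
```

This still scans the whole column once per name, so it is only reasonable for short name lists; the vectorized approaches that follow avoid the repeated scans.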

That said, I'll join the other answerers in pointing out that built-in pandas functionality does this better. Below, I've tried to capture what each solution is doing in a function named after the user who provided it. I'd propose that pir is the most efficient, since it uses functionality (isin) designed for exactly this task.

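import pandas as pd

def john(df, people, score):
    # Series.append was removed in pandas 2.0, so collect the pieces and concatenate once
    pieces = []
    for i in people:
        pieces.append(df.loc[df['Name'] == i, score])
    return pd.concat(pieces)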

def psidom(df, people, score):
    return df.set_index('Name').loc[people, score]

def pir(df, people, score):
    return df.loc[df['Name'].isin(people), score]
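One behavioral difference worth knowing before picking psidom or pir: the label lookup returns rows in the order of people and indexed by Name, while the isin mask keeps the DataFrame's own row order and integer index. A small sketch with a reordered name list (the two-name list is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': pd.Categorical(['John Doe', 'Jane Doe', 'Bob Smith']),
                   'Score1': np.arange(3),
                   'Score2': np.arange(3, 6, 1)})


def psidom(df, people, score):
    return df.set_index('Name').loc[people, score]


def pir(df, people, score):
    return df.loc[df['Name'].isin(people), score]


people = ['Bob Smith', 'John Doe']
by_label = psidom(df, people, 'Score2')  # follows the order of `people`, indexed by Name
by_mask = pir(df, people, 'Score2')      # follows the row order of df, indexed by row number

print(by_label.tolist())  # [5, 3]
print(by_mask.tolist())   # [3, 5]
```

Missing names are also handled differently: isin simply matches nothing for them, whereas .loc with a list containing an absent label raises KeyError in pandas 1.0 and later.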

Timing
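A sketch for timing the three functions yourself with the standard-library timeit module. The synthetic data (5,000 rows of plain-string names, 50 lookups) is an assumption chosen only to make differences visible, and john uses pd.concat because Series.append no longer exists in pandas 2.0+. Absolute numbers will vary with machine and pandas version, so no results are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: unique synthetic names stored as plain strings
n_rows = 5_000
df = pd.DataFrame({'Name': [f'person_{k}' for k in range(n_rows)],
                   'Score1': np.arange(n_rows),
                   'Score2': np.arange(n_rows) * 2})
people = [f'person_{k}' for k in range(0, n_rows, 100)]  # 50 names to look up


def john(df, people, score):
    pieces = []
    for name in people:
        pieces.append(df.loc[df['Name'] == name, score])
    return pd.concat(pieces)


def psidom(df, people, score):
    return df.set_index('Name').loc[people, score]


def pir(df, people, score):
    return df.loc[df['Name'].isin(people), score]


timings = {}
for func in (john, psidom, pir):
    # Best of 3 repeats of 10 calls each; the lambda runs before the loop advances
    timings[func.__name__] = min(
        timeit.repeat(lambda: func(df, people, 'Score2'), number=10, repeat=3))

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f'{name:>7}: {seconds:.4f} s for 10 calls')
```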

Joe T. Boka · Accepted Answer · 2016-08-20 23:54:04Z


You actually don't need the loop; you can just do this:

print(df.loc[df.Name == the_names, 'Score2'])
0    3
1    4
2    5
Name: Score2, dtype: int32


piRSquared:

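This is inaccurate. It only works coincidentally for the stated test case: == against a list compares element by element, by position, so it matches only when the list has the same length and order as the Name column. Try df.loc[df.Name == the_names[:2], 'Score2'] and it fails!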
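A small sketch of the difference between positional == and membership testing with isin. A plain string Series is used here instead of the question's Categorical column, to keep the example minimal.

```python
import pandas as pd

names = pd.Series(['John Doe', 'Jane Doe', 'Bob Smith'])

# == against a list pairs elements by position; it does not test membership
same_order = (names == ['John Doe', 'Jane Doe', 'Bob Smith']).tolist()
reordered = (names == ['Bob Smith', 'Jane Doe', 'John Doe']).tolist()

# A list of a different length cannot be paired up, so pandas raises ValueError
try:
    names == ['John Doe', 'Jane Doe']
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# isin() is the membership test: the order and length of the list do not matter
membership = names.isin(['Jane Doe', 'John Doe']).tolist()

print(same_order)       # [True, True, True]
print(reordered)        # [False, True, False]
print(mismatch_raised)  # True
print(membership)       # [True, True, False]
```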
