I know pandas isn't really built to be used with for-loops, but I have a specific task that I'll need to do many times, and it would save a lot of time if I could wrap it in a function that I can call.

A generic version of my dataframe looks like this:

df = pd.DataFrame({'Name': pd.Categorical(['John Doe', 'Jane Doe', 'Bob Smith']), 'Score1': np.arange(3), 'Score2': np.arange(3, 6, 1)})

        Name  Score1  Score2
0   John Doe       0       3
1   Jane Doe       1       4
2  Bob Smith       2       5

What I want to do is take the method:

df.loc[df.Name == 'Jane Doe', 'Score2']

This should return 4, and I want to run it for each name in a list with a for-loop, like so:

def pull_score(people, score):    
    for i in people:
        print df.loc[df.Name == people[i], score]

So if I wanted to I could call:

the_names = ['John Doe', 'Jane Doe', 'Bob Smith']
pull_score(the_names, 'Score2')

And get:

3
4
5

The error message I currently get is:

TypeError: list indices must be integers, not str

I've looked at other answers about this error message, such as "Python and JSON - TypeError list indices must be integers not str" and "How to solve TypeError: list indices must be integers, not list?"

Neither covers what I'm trying to do, and I don't believe iterrows() or itertuples() would apply, since I need pandas to find the values first.

3 Answers

You can set the name as index and then search by index using loc:

the_names = ['John Doe', 'Jane Doe', 'Bob Smith']
df.set_index('Name').loc[the_names, 'Score2']

# Name
# John Doe     3
# Jane Doe     4
# Bob Smith    5
# Name: Score2, dtype: int32
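
A minimal sketch of the same lookup, assuming a copy of the example frame with plain string names (the question's frame uses a Categorical column) and pandas 1.0 or later. It illustrates two properties of this approach: .loc returns rows in the order the names are requested, and a name that is missing from the index raises KeyError. 'Nobody' is a made-up name used only for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: plain string names instead of the question's Categorical column.
df = pd.DataFrame({
    'Name': ['John Doe', 'Jane Doe', 'Bob Smith'],
    'Score1': np.arange(3),
    'Score2': np.arange(3, 6),
})

by_name = df.set_index('Name')

# Rows come back in the order requested, not the order stored in the frame.
reordered = by_name.loc[['Bob Smith', 'John Doe'], 'Score2']
print(reordered.tolist())  # [5, 3]

# A label that is not in the index raises KeyError.
try:
    by_name.loc[['Jane Doe', 'Nobody'], 'Score2']
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)  # True
```

If some names may be absent, filtering the list against by_name.index first avoids the exception.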

First things first. There is an error in your logic: the for loop already hands you each element of people, but you then use that element as if it were an index into the list. Evaluating people['John Doe'] is what raises the TypeError. So instead, do

def pull_score(df, people, score):
    for i in people:
        print(df.loc[df.Name == i, score])

the_names = ['John Doe', 'Jane Doe', 'Bob Smith']
pull_score(df, the_names, 'Score2')

0    3
Name: Score2, dtype: int64
1    4
Name: Score2, dtype: int64
2    5
Name: Score2, dtype: int64
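
Each lookup in the loop above yields a one-row Series, which is why the index and dtype are printed alongside every value. If bare values are wanted, as in the question's expected output, one option is to take the first matching element with .iloc[0]. This is only a sketch: pull_scores is a hypothetical name, it returns a list instead of printing, and it assumes every name matches at least one row (an unmatched name would raise IndexError).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': pd.Categorical(['John Doe', 'Jane Doe', 'Bob Smith']),
    'Score1': np.arange(3),
    'Score2': np.arange(3, 6),
})

def pull_scores(df, people, score):
    """Return the first matching value of `score` for each name in `people`."""
    values = []
    for name in people:  # `name` is the element itself, not a position
        matches = df.loc[df['Name'] == name, score]
        values.append(matches.iloc[0])  # first match as a scalar
    return values

for value in pull_scores(df, ['John Doe', 'Jane Doe', 'Bob Smith'], 'Score2'):
    print(value)
```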

That said, I'll join the other answerers in pointing out that there are better ways of doing this with built-in pandas functionality. Below, I've tried to capture what each solution does in a function named after the user who proposed it. I'd suggest that pir is the most efficient, since it uses functionality designed for exactly this task.

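def john(df, people, score):
    # Series.append was removed in pandas 2.0, so collect one
    # selection per name and concatenate them in a single call.
    pieces = [df.loc[df['Name'] == i, score] for i in people]
    return pd.concat(pieces)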

def psidom(df, people, score):
    return df.set_index('Name').loc[people, score]

def pir(df, people, score):
    return df.loc[df['Name'].isin(people), score]
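
One difference is worth checking before treating these functions as interchangeable: the isin mask in pir keeps rows in the frame's own order, whereas set_index followed by .loc returns them in the order requested. A small sketch, assuming plain string names and a request list (the hypothetical variable wanted) ordered differently from the frame:

```python
import numpy as np
import pandas as pd

# Assumption: plain string names instead of the question's Categorical column.
df = pd.DataFrame({
    'Name': ['John Doe', 'Jane Doe', 'Bob Smith'],
    'Score1': np.arange(3),
    'Score2': np.arange(3, 6),
})
wanted = ['Bob Smith', 'John Doe']

# Boolean mask: rows are selected in the frame's order, original index kept.
via_isin = df.loc[df['Name'].isin(wanted), 'Score2']

# Label lookup: rows follow the order of `wanted`, indexed by name.
via_index = df.set_index('Name').loc[wanted, 'Score2']

print(via_isin.tolist())   # [3, 5]
print(via_index.tolist())  # [5, 3]
```

Which behaviour is preferable depends on whether the caller needs the results aligned with the list of names or with the original rows.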

Timing

(Image not available: timing comparison of the three functions.)
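
A rough way to reproduce such a comparison locally is the standard-library timeit module. The sketch below is not the original benchmark: it assumes plain string names, uses the tiny three-row example frame, picks 200 repetitions arbitrarily, and replaces the loop-and-append variant with a hypothetical loop_concat function built on pd.concat. Absolute numbers vary with machine, pandas version, and frame size, so none are shown.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John Doe', 'Jane Doe', 'Bob Smith'],
    'Score1': np.arange(3),
    'Score2': np.arange(3, 6),
})
people = ['John Doe', 'Jane Doe', 'Bob Smith']

def loop_concat(df, people, score):
    # Loop-based variant: one boolean selection per name, concatenated once.
    return pd.concat([df.loc[df['Name'] == name, score] for name in people])

def psidom(df, people, score):
    return df.set_index('Name').loc[people, score]

def pir(df, people, score):
    return df.loc[df['Name'].isin(people), score]

timings = {}
for func in (loop_concat, psidom, pir):
    # Total seconds for 200 calls of each function.
    timings[func.__name__] = timeit.timeit(
        lambda: func(df, people, 'Score2'), number=200
    )

for name, seconds in timings.items():
    print(f'{name}: {seconds:.4f} s for 200 calls')
```

Timing on a frame closer to the real workload is more informative than timing on three rows.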

You actually don't need the loop, you can just do this:

print(df.loc[df.Name == the_names, 'Score2'])
0    3
1    4
2    5
Name: Score2, dtype: int32

1 Comment

This is inaccurate. It only coincidentally works for the stated test case. Try df.loc[df.Name == the_names[:2], 'Score2'] and it fails!
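
The point made in this comment can be checked directly. Comparing a Series to a list with == works position by position, so it gives the intended result only when the list has the same length and the same order as the column. A short sketch, assuming plain string names; shuffled is a hypothetical variable holding the same names in a different order:

```python
import numpy as np
import pandas as pd

# Assumption: plain string names instead of the question's Categorical column.
df = pd.DataFrame({
    'Name': ['John Doe', 'Jane Doe', 'Bob Smith'],
    'Score1': np.arange(3),
    'Score2': np.arange(3, 6),
})

# Same length, different order: compared position by position, so only
# the third row ('Bob Smith' == 'Bob Smith') matches.
shuffled = ['Jane Doe', 'John Doe', 'Bob Smith']
mask = df['Name'] == shuffled
print(mask.tolist())  # [False, False, True]

# Different length: the comparison itself raises ValueError.
try:
    df['Name'] == shuffled[:2]
    length_error = False
except ValueError:
    length_error = True
print(length_error)  # True

# A membership test with isin does not depend on order or length.
subset = df.loc[df['Name'].isin(shuffled[:2]), 'Score2']
print(subset.tolist())  # [3, 4]
```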
