107

Here is a simple example of the code I am running, and I would like the results put into a pandas dataframe (unless there is a better option):

for p in game.players.passing():
    print(p, p.team, p.passing_att, p.passer_rating())

R.Wilson SEA 29 55.7
J.Ryan SEA 1 158.3
A.Rodgers GB 34 55.8

Using this code:

d = []
for p in game.players.passing():
    d = [{'Player': p, 'Team': p.team, 'Passer Rating':
        p.passer_rating()}]

pd.DataFrame(d)

I can get:

    Passer Rating   Player      Team
  0 55.8            A.Rodgers   GB

This is a 1x3 dataframe. I understand why it has only one row, but I can't figure out how to make it multi-row with the columns in the correct order. Ideally the solution would handle any number of rows (one per p), and it would be wonderful (although not essential) if the number of columns were set by the number of stats requested. Any suggestions? Thanks in advance!

6
  • You're overwriting your list with each iteration, not appending Commented Jan 20, 2015 at 23:03
  • Right, I understand what is wrong with it, the problem is I can't figure out how to make it work correctly. This is just the closest I could get. Commented Jan 21, 2015 at 1:24
  • 2
    The answer below will work. You could also just do d.append({'Player': ...}) in your loop. The Python docs on lists are pretty good. Commented Jan 21, 2015 at 1:26
  • 1
    You should also clarify your question to state the real issue: that you're having trouble appending to an empty list. (you seem to understand how to create dataframes from lists of dictionaries very well) Commented Jan 21, 2015 at 1:30
  • 1
    While I think I understand what you are saying, I believe the question I asked is actually what I would prefer, while the code I posted was the closest I could get before asking for help. Commented Jan 21, 2015 at 1:55

4 Answers 4

149

The simplest answer is what Paul H said:

d = []
for p in game.players.passing():
    d.append(
        {
            'Player': p,
            'Team': p.team,
            'Passer Rating':  p.passer_rating()
        }
    )

pd.DataFrame(d)
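
On the column-order part of the question: in older pandas versions the columns of a DataFrame built from dicts could come out sorted alphabetically, as in the question's output. Passing columns= sets the order explicitly. A minimal sketch, using hard-coded rows (copied from the question's printed output) in place of game.players.passing():

```python
import pandas as pd

# Stand-in rows copied from the question's printed output; in the real
# code these would come from looping over game.players.passing().
rows = [
    {"Player": "R.Wilson", "Team": "SEA", "Passer Rating": 55.7},
    {"Player": "J.Ryan", "Team": "SEA", "Passer Rating": 158.3},
    {"Player": "A.Rodgers", "GB": None, "Team": "GB", "Passer Rating": 55.8},
]

# columns= both selects and orders the columns; keys not listed
# (such as the stray "GB" key above) are dropped.
df = pd.DataFrame(rows, columns=["Player", "Team", "Passer Rating"])
print(df)
```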

But if you really want to "build and fill a dataframe from a loop", (which, btw, I wouldn't recommend), here's how you'd do it.

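d = pd.DataFrame()

for p in game.players.passing():
    # Wrap the dict in a list: a dict of bare scalars raises
    # "ValueError: If using all scalar values, you must pass an index".
    temp = pd.DataFrame(
        [
            {
                'Player': p,
                'Team': p.team,
                'Passer Rating': p.passer_rating()
            }
        ]
    )

    # ignore_index=True renumbers the rows instead of repeating index 0.
    d = pd.concat([d, temp], ignore_index=True)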
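
To check that both approaches produce the same table, here is a small sketch. The stats list is made-up data standing in for game.players.passing(), and the concat loop starts from the first one-row frame rather than an empty DataFrame, which is a small variation on the code above:

```python
import pandas as pd

# Hypothetical (player, team, rating) values used only for illustration.
stats = [
    ("R.Wilson", "SEA", 55.7),
    ("J.Ryan", "SEA", 158.3),
    ("A.Rodgers", "GB", 55.8),
]
cols = ["Player", "Team", "Passer Rating"]

# Approach 1: collect plain dicts, build the DataFrame once at the end.
records = [dict(zip(cols, row)) for row in stats]
from_list = pd.DataFrame(records, columns=cols)

# Approach 2: concatenate one-row frames inside the loop.
# Each pd.concat call copies everything accumulated so far.
from_concat = None
for row in stats:
    one_row = pd.DataFrame([dict(zip(cols, row))], columns=cols)
    if from_concat is None:
        from_concat = one_row
    else:
        from_concat = pd.concat([from_concat, one_row], ignore_index=True)

print(from_list.equals(from_concat))
```

The result is the same either way; the difference is only in how much copying happens along the way.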

5 Comments

is it preferable to append a dict to the list and create the df only at the end due to superior performance, or just better readability?
Performance. To quote the docs: ...concat (and therefore append) makes a full copy of the data, and ... constantly reusing this function can create a significant performance hit.
@NickMarinakis: I don't understand your comment: if you really want to "build and fill a dataframe from a loop", (which, btw, I wouldn't recommend). Then how else can you build the dataframe if not via a loop?
@stackoverflowuser2010: So my comment means that you shouldn't create a dataframe and then loop over your data to fill it. Every time you use pd.concat you're making a full copy of the data. It's wildly inefficient. Instead, just create a different data structure (e.g. a list of dicts) and then convert that to a dataframe all at once.
@NickMarinakis: Ok. In the first part of your answer you're still using a loop (to build up a list of dict one row at a time) and then converting the whole thing at once to a DataFrame. In the second (worse) solution, you're appending via (concat) one DataFrame row at a time. Understood.
51

Try this, passing a generator expression (a list comprehension works the same way) straight to the constructor:

import pandas as pd

df = pd.DataFrame(
    [p, p.team, p.passing_att, p.passer_rating()] for p in game.players.passing()
)
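
For the "number of columns set by the number of stats requested" part of the question, one possible sketch drives both the rows and the column names from a single list of attribute names. The Passer namedtuple and its fields are hypothetical stand-ins for the player objects; the real objects expose some stats as methods (e.g. passer_rating()), which would need to be called rather than read as plain attributes.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the objects yielded by game.players.passing().
Passer = namedtuple("Passer", ["name", "team", "passing_att", "rating"])
players = [
    Passer("R.Wilson", "SEA", 29, 55.7),
    Passer("J.Ryan", "SEA", 1, 158.3),
    Passer("A.Rodgers", "GB", 34, 55.8),
]

# The requested stats decide both the column count and the column order.
stats = ["name", "team", "rating"]

df = pd.DataFrame(
    [[getattr(p, s) for s in stats] for p in players],
    columns=stats,
)
print(df)
```

Adding or removing a name in stats changes the number of columns without touching the loop.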

9 Comments

Out of the box this gets me the closest to what I was looking for with the columns in the correct order, but I don't know enough about either python or pandas to say if it is the best answer. Thanks for the help everyone.
What is df here?
@Cai Pandas dataframe
@Amit As in df = pandas.DataFrame()? Or as in from pandas import DataFrame as df?
@Amit Ok, then in that case should the solution be d = df([p, p.team, p.passing_att, p.passer_rating()] for p in game.players.passing())? (I.e. so df is called rather than indexed?)
40

Make a list of tuples with your data and then create a DataFrame with it:

d = []
for p in game.players.passing():
    d.append((p, p.team, p.passer_rating()))

pd.DataFrame(d, columns=('Player', 'Team', 'Passer Rating'))

A list of tuples should have less overhead than a list of dictionaries. I tested this below, but please remember to prioritize ease of code understanding over performance in most cases.

Testing functions:

def with_tuples(loop_size=1e5):
    res = []

    for x in range(int(loop_size)):
        res.append((x-1, x, x+1))

    return pd.DataFrame(res, columns=("a", "b", "c"))

def with_dict(loop_size=1e5):
    res = []

    for x in range(int(loop_size)):
        res.append({"a":x-1, "b":x, "c":x+1})

    return pd.DataFrame(res)

Results:

%timeit -n 10 with_tuples()
# 10 loops, best of 3: 55.2 ms per loop

%timeit -n 10 with_dict()
# 10 loops, best of 3: 130 ms per loop
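
%timeit is an IPython magic. Outside IPython, roughly the same comparison can be run with the standard-library timeit module. A sketch that repeats the two functions above so it is self-contained (timings will vary by machine and pandas version):

```python
import timeit

import pandas as pd


def with_tuples(loop_size=1e5):
    res = []
    for x in range(int(loop_size)):
        res.append((x - 1, x, x + 1))
    return pd.DataFrame(res, columns=("a", "b", "c"))


def with_dict(loop_size=1e5):
    res = []
    for x in range(int(loop_size)):
        res.append({"a": x - 1, "b": x, "c": x + 1})
    return pd.DataFrame(res)


for fn in (with_tuples, with_dict):
    # Best of 3 repeats, 10 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.1f} ms per loop")
```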

4 Comments

I tried this in my code and it works amazingly well with tuples. One thing I'm wondering about: tuples are immutable, so how are we able to append them?
@SumitPokhrel Tuples are immutable, but they aren't being mutated by the append. The List is being appended to and is thus what is being mutated.
Don't you think appending something is mutating it, or changing it from its original form? If the list is being mutated by append, then why isn't the tuple being mutated by append?
@SumitPokhrel because you append tuples to the list: res=[(1,2)] first, and then res.append((3,4)) gives [(1,2),(3,4)] So the tuples are not mutated
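
The distinction discussed in these comments can be checked directly. A short sketch:

```python
res = [(1, 2)]
first = res[0]

# append mutates the list: it grows by one element.
res.append((3, 4))
print(res)  # [(1, 2), (3, 4)]

# The tuple that was already in the list is the same, unchanged object.
print(res[0] is first)  # True

# Tuples themselves have no append method.
print(hasattr(first, "append"))  # False
```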
1

I may be wrong, but I think the accepted answer by @amit has a bug.

from pandas import DataFrame as df
x = [1,2,3]
y = [7,8,9,10]

# this gives me a syntax error at 'for' (Python 3.7),
# so it is commented out to keep the rest of the snippet runnable
# d1 = df[[a, "A", b, "B"] for a in x for b in y]

# this works
d2 = df([a, "A", b, "B"] for a in x for b in y)

# and if you want to add the column names on the fly
# note the additional parentheses
d3 = df(([a, "A", b, "B"] for a in x for b in y), columns = ("l","m","n","o"))
