python- concatenate and append a pandas dataframe in a for loop

Question

I am sorry I didnt really know how to word the title of this question. I do not work with Python too often and I am just starting to work with the pandas and numpy packages.

I am getting unexpected results when trying to concatenate and append a pandas dataframe in a for loop.

I have a data set that I got from sql and put into a pandas dataframe (df):

print(df.head())

   date visitor  visitor_score home  home_score   W   L
0  20160405     BOS              6  CLE           2  94  67
1  20160406     BOS              6  CLE           7  94  67
2  20160408     BOS              8  TOR           7  89  73
3  20160409     BOS              8  TOR           4  89  73
4  20160410     BOS              0  TOR           3  89  73

I have another data set from sql that I also put in a pandas data frame (dfBostonStats):

print(dfBostonStats.head())

  teamID    ab     h   2b  3b   hr    so  sb   ra   er  era  IPouts    HA  \
0    BOS  5670  1598  343  25  208  1160  83  694  640  4.0    4319  1342

   hra   soa   e     fp  bpf  ppf   dp
0  176  1362  75  0.987  108  106  139

I want to concatenate that data frame (dfBostonStats) to each row of the first data frame (df).

I determined I could use pandas.concat and I proved this through concatenating the first row of df:

print(pd.concat([df.iloc[[0]], dfBostonStats], axis=1))

       date visitor  visitor_score home  home_score   W   L teamID    ab  \
0  20160405     BOS              6  CLE           2  94  67    BOS  5670

      h ...   era  IPouts    HA  hra   soa   e     fp  bpf  ppf   dp
0  1598 ...   4.0    4319  1342  176  1362  75  0.987  108  106  139

I then tried to concatenate each row by using a for loop but it gives me an unexpected result. it concatenates one row properly but then prints me a row of just the 2nd dataframe I have listed (dfBostonStats)

for index, element in df.iterrows():
   tempdf = pd.concat([df.iloc[[index]], dfBostonStats], axis=1)
   concatDataFrame = concatDataFrame.append(tempdf, ignore_index=True)


print(concatDataFrame.head())

       date visitor  visitor_score home  home_score     W     L teamID  \
0  20160405     BOS            6.0  CLE         2.0  94.0  67.0    BOS
1       NaN     NaN            NaN  NaN         NaN   NaN   NaN    BOS
2  20160406     BOS            6.0  CLE         7.0  94.0  67.0    NaN
3       NaN     NaN            NaN  NaN         NaN   NaN   NaN    BOS
4  20160408     BOS            8.0  TOR         7.0  89.0  73.0    NaN

       ab       h ...   era  IPouts      HA    hra     soa     e     fp  \
0  5670.0  1598.0 ...   4.0  4319.0  1342.0  176.0  1362.0  75.0  0.987
1  5670.0  1598.0 ...   4.0  4319.0  1342.0  176.0  1362.0  75.0  0.987
2     NaN     NaN ...   NaN     NaN     NaN    NaN     NaN   NaN    NaN
3  5670.0  1598.0 ...   4.0  4319.0  1342.0  176.0  1362.0  75.0  0.987
4     NaN     NaN ...   NaN     NaN     NaN    NaN     NaN   NaN    NaN

     bpf    ppf   dp
0  108.0  106.0  139
1  108.0  106.0  139
2    NaN    NaN  NaN
3  108.0  106.0  139
4    NaN    NaN  NaN

I can not figure out why it is printing that row with only dfBostonStats rather then just printing only concatenated rows?

On a side note, I know inside the for loop there is a copy occuring every time causing a performance hit but I figured I would deal with that once I get the data looking how it should.

Is necessary loop? Do you need add one column DataFrame dfBostonStats to df ? — jezrael
– jezrael, Commented Feb 9, 2018 at 12:38
It is necessary to loop because I have other teams I am going to do this for. I was planning on having a loop then I will have a switch statement and append a different data frame depending on the team — JoeG
– JoeG, Commented Feb 9, 2018 at 12:42
the other teams will also be stored in a dataframe, because they are sql queries. Each team will be seperate dataframes if i use the for loop with a switch inside it. On the other hand, they could just be one dataframe if I used a different approach. — JoeG
– JoeG, Commented Feb 9, 2018 at 14:06

jezrael · Accepted Answer · 2018-02-09 13:05:19Z

1

I think if need join first dataframe by column visitor and second by column teamID use merge with left join. No loop is necessary:

print (df)
       date visitor  visitor_score home  home_score   W   L
0  20160405     BOS              6  CLE           2  94  67
1  20160406     BOS              6  CLE           7  94  67
2  20160408     AAA              8  TOR           7  89  73
3  20160409     AAA              8  TOR           4  89  73
4  20160410     AAA              0  TOR           3  89  73

print (dfBostonStats)
  teamID    ab     h   2b  3b   hr    so  sb   ra   er  era  IPouts    HA  \
0    BOS  5670  1598  343  25  208  1160  83  694  640  4.0    4319  1342   
0    AAA     4     5    6   4    5  1160  83  694  640  4.0    4319  1342   

   hra   soa   e     fp  bpf  ppf   dp  
0  176  1362  75  0.987   10  106  139  
0  176  1362  75  0.987   10  106  139  

df2 = df.merge(dfBostonStats, left_on='visitor', right_on='teamID', how='left')
print (df2)
       date visitor  visitor_score home  home_score   W   L teamID    ab  \
0  20160405     BOS              6  CLE           2  94  67    BOS  5670   
1  20160406     BOS              6  CLE           7  94  67    BOS  5670   
2  20160408     AAA              8  TOR           7  89  73    AAA     4   
3  20160409     AAA              8  TOR           4  89  73    AAA     4   
4  20160410     AAA              0  TOR           3  89  73    AAA     4   

      h ...   era  IPouts    HA  hra   soa   e     fp  bpf  ppf   dp  
0  1598 ...   4.0    4319  1342  176  1362  75  0.987   10  106  139  
1  1598 ...   4.0    4319  1342  176  1362  75  0.987   10  106  139  
2     5 ...   4.0    4319  1342  176  1362  75  0.987   10  106  139  
3     5 ...   4.0    4319  1342  176  1362  75  0.987   10  106  139  
4     5 ...   4.0    4319  1342  176  1362  75  0.987   10  106  139  

[5 rows x 27 columns]

edited Feb 9, 2018 at 13:05

answered Feb 9, 2018 at 12:31

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

JoeG Over a year ago

using your first method I see how I would assign it to the dataframe as a whole but how would i assign it to one element at a time in the for loop? This will obviously blow up: for index, element in df.iterrows(): df = element.assign(**dfBostonStats.iloc[0]) I dont know how else to look at it (maybe i have been staring at it for too long!)

jezrael Over a year ago

I think you need second solution, if need join first dataframe by column visitor and second by column teamID. No loop is necessary.

JoeG Over a year ago

I am at work now and away from my PC, when I get home I will give it a shot

jezrael Over a year ago

Sure, no problem.

JoeG Over a year ago

The merge solution worked! Thank you for your help sir

Collectives™ on Stack Overflow

python- concatenate and append a pandas dataframe in a for loop

1 Answer 1

5 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

5 Comments

Your Answer

Sign up or log in

Post as a guest

Related