Adding row in Pandas DataFrame keeping index order

Question

I have a DataFrame and I would like to add some nonexistent rows to it. I have found the .loc method, but it appends the new rows at the end rather than inserting them in sorted order. For example:

import numpy as np
import pandas as pd

dfi = pd.DataFrame(np.arange(6).reshape(3,2),columns=['A','B'])

>>> dfi
    A B
0   0 1
1   2 3
2   4 5
[3 rows x 2 columns]

Adding a nonexistent row through .loc:

dfi.loc[5,:] = 0
>>> dfi
    A B
0   0 1
1   2 3
2   4 5
5   0 0
[4 rows x 2 columns]

So far everything ok. But this is what happens when trying to add another row, with index smaller than the last one:

dfi.loc[3,:] = 0
>>> dfi
    A B
0   0 1
1   2 3
2   4 5
5   0 0
3   0 0
[5 rows x 2 columns]

I would like the row with index 3 to be placed between rows 2 and 5. I could sort the DataFrame by index every time, but that would take too long. Is there another way?

My actual problem involves a DataFrame whose index consists of datetime objects. I left out the details of that implementation because they would obscure my real problem: adding rows to a DataFrame such that the result has an ordered index.
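For reference, the same behaviour can be reproduced with a datetime index. This is only a minimal sketch: the dates are made up for illustration and are not taken from the real data. It shows that enlarging through .loc appends in insertion order, and that a single sort_index() call at the end restores chronological order.

```python
import numpy as np
import pandas as pd

# Hypothetical dates, chosen only to mirror the integer example above.
idx = pd.to_datetime(["2014-06-01", "2014-06-02", "2014-06-03"])
dfi = pd.DataFrame(np.arange(6).reshape(3, 2), index=idx, columns=["A", "B"])

# Setting with enlargement appends new labels at the end, in insertion order.
dfi.loc[pd.Timestamp("2014-06-06")] = 0
dfi.loc[pd.Timestamp("2014-06-04")] = 0

# The index is no longer sorted: 2014-06-04 comes after 2014-06-06.
is_sorted_after_append = dfi.index.is_monotonic_increasing

# One sort at the end restores chronological order.
dfi_sorted = dfi.sort_index()
```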

Don't know of a way to do what you're asking. In general, adding rows one by one to a DataFrame performs very poorly. Could you build a temporary data structure piece by piece, then make it a DataFrame, and then concatenate the two and sort once?
– exp1orer, Commented Jun 16, 2014 at 13:59
Why are you trying to assign with a string, e.g. "3", rather than just 3? Your index is an Int64Index; this is a very odd thing to do.
– Jeff, Commented Jun 16, 2014 at 15:25
@Jeff you are right. I copied an example from the Pandas docs which actually used strings, and I thought it was the general rule. Editing now...
– tomasyany, Commented Jun 17, 2014 at 16:03
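The approach exp1orer suggests in the comments could look roughly like the sketch below. The `pending` dict and the row values are assumptions made for illustration; the idea is only that new rows are collected in a cheap Python structure and the DataFrame is concatenated and sorted a single time.

```python
import numpy as np
import pandas as pd

dfi = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["A", "B"])

# Collect the new rows in a plain dict first (index label -> row values).
# "pending" is a hypothetical name, not part of pandas.
pending = {}
pending[5] = {"A": 0, "B": 0}
pending[3] = {"A": 0, "B": 0}

# Build one DataFrame from the collected rows, then concatenate and sort once.
new_rows = pd.DataFrame.from_dict(pending, orient="index")
result = pd.concat([dfi, new_rows]).sort_index()
```

This keeps the cost of sorting to one call, however many rows are collected beforehand.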

CT Zhu · Accepted Answer · 2014-06-16 15:22:11Z


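If your index is almost continuous, with only a few values missing here and there, you could try the following: preallocate a row for every possible index value, fill the rows in any order, and drop the rows that were never filled.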

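In [15]:

# Preallocate 100 rows and mark them all as missing.
df=pd.DataFrame(np.zeros((100,2)), columns=['A', 'B'])
df['A']=np.nan
df['B']=np.nan
In [16]:

# Fill rows by position, in any order; dropna() then removes the unused ones.
df.iloc[[0,1,2]]=pd.DataFrame({'A': [0,2,4,], 'B': [1,3,5]})
df.iloc[5]=[0,0]
df.iloc[3]=0
print(df.dropna())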
   A  B
0  0  1
1  2  3
2  4  5
3  0  0
5  0  0

[5 rows x 2 columns]
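The same preallocation idea might carry over to the datetime index mentioned in the question. The sketch below assumes that every timestamp that can occur is known in advance and falls on a daily grid; the date range and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Assumed for illustration: all possible timestamps are known up front.
full_index = pd.date_range("2014-06-01", periods=100, freq="D")

# Preallocate an all-NaN frame over the full, already sorted index.
df = pd.DataFrame(np.nan, index=full_index, columns=["A", "B"])

# Fill rows in any order; each label already has its slot, so order is kept.
df.loc[pd.Timestamp("2014-06-06")] = [0, 0]
df.loc[pd.Timestamp("2014-06-04")] = [0, 0]
df.loc[pd.Timestamp("2014-06-01")] = [0, 1]

# Keep only the rows that were actually filled.
filled = df.dropna()
```

Because the rows are written into existing slots instead of being appended, no sorting is needed. The trade-off is memory for the unused rows, so this fits best when the index is nearly complete.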


1 Comment

tomasyany Over a year ago

It works! I had read about .iloc but didn't really understand how it worked, and preferred not to use it... my bad. Thanks!
