
I have a DataFrame and would like to add rows that do not exist yet. I found the .loc method, but it appends the new row at the end rather than in index order. For example:

import numpy as np
import pandas as pd

dfi = pd.DataFrame(np.arange(6).reshape(3,2),columns=['A','B'])

>>> dfi
    A B
0   0 1
1   2 3
2   4 5
[3 rows x 2 columns]

Adding a nonexistent row through .loc:

dfi.loc[5,:] = 0
>>> dfi
    A B
0   0 1
1   2 3
2   4 5
5   0 0
[4 rows x 2 columns]

So far, so good. But this is what happens when I add another row whose index is smaller than the last one:

dfi.loc[3,:] = 0
>>> dfi
    A B
0   0 1
1   2 3
2   4 5
5   0 0
3   0 0
[5 rows x 2 columns]

I would like the row with index 3 to be placed between rows 2 and 5. I could sort the DataFrame by index every time, but that would take too long. Is there another way?

My actual problem involves a DataFrame whose index consists of datetime objects. I left the details of that implementation out because they would obscure the real problem: adding rows to a DataFrame so that the result has an ordered index.

  • Don't know of a way to do what you're asking. In general, adding rows one by one to a DataFrame performs very poorly. Could you build a temporary data structure piece by piece, then make it a DataFrame, concatenate the two, and sort once? Commented Jun 16, 2014 at 13:59
  • Why are you trying to assign with a string, e.g. "3", rather than just 3? Your index is an Int64Index; this is a very odd thing to do. Commented Jun 16, 2014 at 15:25
  • @Jeff you are right. I copied an example from the pandas docs that used strings, and I thought that was the general rule. Editing now... Commented Jun 17, 2014 at 16:03
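
The suggestion in the first comment (buffer the new rows, then concatenate and sort once) could look roughly like the sketch below. The `pending` dict and the `result` name are hypothetical and only serve as illustration; the sketch assumes all rows to be added are known before the sorted result is needed.

```python
import numpy as np
import pandas as pd

dfi = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])

# Hypothetical buffer: collect pending rows in a plain dict keyed by index label.
pending = {}
pending[5] = [0, 0]
pending[3] = [0, 0]

# Build one DataFrame from the buffer, append it, and sort the index a single time.
new_rows = pd.DataFrame.from_dict(pending, orient='index', columns=['A', 'B'])
result = pd.concat([dfi, new_rows]).sort_index()
print(result)
```

Sorting happens once at the end instead of after every insertion, which is the point the comment makes about performance.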

1 Answer


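If your index is almost continuous, with only a few values missing here and there, you could try the following: preallocate a DataFrame full of NaN, fill rows by position, and drop the rows that were never filled.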

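In [15]:

# Preallocate 100 rows of NaN; the default index 0..99 is already sorted.
df = pd.DataFrame(np.nan, index=range(100), columns=['A', 'B'])
In [16]:

# With the default index, position and label coincide, so .iloc fills the right slot.
df.iloc[[0, 1, 2]] = pd.DataFrame({'A': [0, 2, 4], 'B': [1, 3, 5]})
df.iloc[5] = [0, 0]
df.iloc[3] = 0
# Drop the rows that were never filled.
print(df.dropna())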
   A  B
0  0  1
1  2  3
2  4  5
3  0  0
5  0  0

[5 rows x 2 columns]
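
Since the question mentions a datetime index, here is a minimal sketch of the same preallocation idea with dates. It assumes that every timestamp that may ever be added lies on a grid known in advance (daily here); the dates and the `full_index` and `result` names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: all possible timestamps fall on this known daily grid.
full_index = pd.date_range('2014-06-01', periods=10, freq='D')
df = pd.DataFrame(np.nan, index=full_index, columns=['A', 'B'])

# Fill rows in any order; each row lands in its already-sorted slot.
df.loc[pd.Timestamp('2014-06-06')] = [0, 0]
df.loc[pd.Timestamp('2014-06-04')] = [0, 0]
df.loc[pd.Timestamp('2014-06-01')] = [0, 1]

# Drop the slots that were never filled.
result = df.dropna()
print(result)
```

One caveat with this approach: dropna() also discards rows whose real data contain NaN, so it only fits when filled rows never hold missing values.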

1 Comment

It works! I had read about .iloc but didn't really understand how it worked, and preferred not to use it... my bad. Thanks!
