2

I'm looking for an efficient way to repeatedly update rows in a DataFrame. By this I mean changing row values and its index label. I specifically need help with the latter. The best I could find is How to change Pandas dataframe index value? However, this updates the entire index, while I care about a single index label.

Directly assigning to index[n] is not supported:

>>> df.index[1] = 'new_label'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.py", line 1374, in __setitem__
    raise TypeError("Index does not support mutable operations")

Would it work to modify (assign to) the numpy array underlying the index?

>>> df.index._values[1] = 'new_label'

The updates don't break index sortedness.


More context:

I have a DataFrame indexed by timestamps (DatetimeIndex) where I need to efficiently append new rows in real time (many times per second). I preallocate a large fixed-size DataFrame with NaT/NaN s and I append rows by writing into the next empty row.

2 Answers 2

1

updating a dataframe multiple time per second is not very efficient. you should append a list and then transform the list to a dataframe afterward.

such has:

log=list()
for i in range(1,10):   # here is your loop every minutes where log is updated
    log.append([date, value1, value2])

df = pd.DataFrame(log) # now you create the dataframe after being done updating the list.
Sign up to request clarification or add additional context in comments.

Comments

1

As Steven G pointed, this update is very unefficient.

Better is create list of DataFrames and then use concat:

df1 = pd.DataFrame({'a': [1,2,3]}, index=pd.date_range('2015-01-01', periods=3))
print (df1)
            a
2015-01-01  1
2015-01-02  2
2015-01-03  3

df2 = pd.DataFrame({'a': [7,8,9]}, index=pd.date_range('2016-01-01', periods=3))
print (df2)
            a
2016-01-01  7
2016-01-02  8
2016-01-03  9

dfs = [df1,df2]

df = pd.concat(dfs)
print (df)
            a
2015-01-01  1
2015-01-02  2
2015-01-03  3
2016-01-01  7
2016-01-02  8
2016-01-03  9

Then you can concat another DataFrame:

df3 = pd.DataFrame({'a': [3,2,5]}, index=pd.date_range('2017-01-01', periods=3))
print (df3)
            a
2017-01-01  3
2017-01-02  2
2017-01-03  5

df = pd.concat([df, df3])
print (df)
            a
2015-01-01  1
2015-01-02  2
2015-01-03  3
2016-01-01  7
2016-01-02  8
2016-01-03  9
2017-01-01  3
2017-01-02  2
2017-01-03  5

Or use DataFrame.append:

df4 = pd.DataFrame({'a': [3,2,4]}, index=pd.date_range('2018-01-01', periods=3))
print (df4)
            a
2018-01-01  3
2018-01-02  2
2018-01-03  4

df = df.append(df4)
print (df)
            a
2015-01-01  1
2015-01-02  2
2015-01-03  3
2016-01-01  7
2016-01-02  8
2016-01-03  9
2017-01-01  3
2017-01-02  2
2017-01-03  5
2018-01-01  3
2018-01-02  2
2018-01-03  4

Solution for replacing value in index:

You can use very similar as df.index._values[1] = 'new_label' only remove _ and another solution is with Index.set_value:

df = pd.DataFrame({'a': [1,2,3]}, index=pd.date_range('2016-01-01', periods=3))
print (df)
            a
2016-01-01  1
2016-01-02  2
2016-01-03  3

df.index.values[0] = pd.Timestamp(2016,11,23,1,0,0)
df.index.set_value(df.index, df.index[1], pd.Timestamp(2016,11,22,1,0,0))

print (df)
                     a
2016-11-23 01:00:00  1
2016-11-22 01:00:00  2
2016-01-03 00:00:00  3

3 Comments

I think OP wants to append new rows to the dataframe
If yes, then your answer is better.
I add better solution.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.