I'm trying to use pandas to create a ledger of activity. My object will have a pandas DataFrame that tracks the balances and transactions associated with that object.

I'm struggling with how to append single rows of data to that DataFrame as orders get associated with the object. The most common answer seems to be "only create the frame once you have all the data", but I can't do that: I want to be able to compute on the fly as I add new data.

Here's the relevant code (which fails):

self.ledger = pd.DataFrame(data={'entry_date' : [pd.Timestamp('1900-01-01')],
                                 'qty' : [np.float64(startingBalance)],
                                 'element_type' : [pd.Categorical(["startingBalance"])],
                                 'avail_bal' : [np.float64(startingBalance)],
                                 'firm_ind' : True,
                                 'deleted_ind' : False,
                                 'ord_id' : ["fooA"],
                                 'parent_ord_id' : ["fooB"] },
                           columns=ledgerColumnList
                           )

self.ledger.iloc[-1] = dict({'entry_date' : ['1900-01-02'],
                             'qty' : [startingBalance],
                             'element_type' : ["startingBalance"],
                             'avail_bal' : [startingBalance],
                             'firm_ind' : [True],
                             'deleted_ind' : [False],
                             'ord_id' : ["foofa"],
                             'parent_ord_id' : ["foofb"] })

Here's the error I'm getting:

  File "C:\Users\MyUser\My Documents\Workspace\myscript.py", line 135, in __init__
    'parent_ord_id' : ["foofb"] })
  File "C:\Python27\lib\site-packages\pandas\core\indexing.py", line 117, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File "C:\Python27\lib\site-packages\pandas\core\indexing.py", line 492, in _setitem_with_indexer
    setter(item, v)
  File "C:\Python27\lib\site-packages\pandas\core\indexing.py", line 422, in setter
    s._data = s._data.setitem(indexer=pi, value=v)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 2843, in setitem
    return self.apply('setitem', **kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 2823, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 636, in setitem
    values, _, value, _ = self._try_coerce_args(self.values, value)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 2066, in _try_coerce_args
    raise TypeError
TypeError

Thoughts?

1) How can I do this in pandas?

or

2) Is there something better I should be using that would give me pandas' built-in calculation tools but be better suited to my little-at-a-time data needs?

3 Answers

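You can also use df.loc[] with an index label that doesn't exist yet, which appends the row (pandas calls this "setting with enlargement"):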

import pandas as pd
df = pd.DataFrame({'A': [1,2,3,4], 'B': [5,6,7,8], 'C': [9,10,11,12]})
df
   A  B   C
0  1  5   9
1  2  6  10
2  3  7  11
3  4  8  12
new_row = pd.DataFrame({'A': [35], 'B': [27], 'C': [43]})
new_row
    A   B   C
0  35  27  43
df.loc[4] = new_row.loc[0]
df
    A   B   C
0   1   5   9
1   2   6  10
2   3   7  11
3   4   8  12
4  35  27  43
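
This is also why the iloc[-1] = ... line in the question can never add a row: iloc only addresses positions that already exist, so at best it overwrites the current last row. loc with a new label is what grows the frame. A minimal sketch on a hypothetical two-column ledger (the values are made up for illustration); using len(ledger) as the new label assumes the default 0..n-1 integer index, and the list values must be given in column order:

```python
import pandas as pd

# Hypothetical two-column ledger; column names and values are illustrative only.
ledger = pd.DataFrame({"qty": [100.0], "ord_id": ["fooA"]})

# iloc only addresses existing positions: this OVERWRITES row 0, it does not add a row.
ledger.iloc[-1] = [100.0, "fooA-overwritten"]
assert len(ledger) == 1

# loc with a label that is not in the index yet enlarges the frame.
# len(ledger) is a fresh label only because the index is the default RangeIndex.
ledger.loc[len(ledger)] = [25.0, "fooB"]
ledger.loc[len(ledger)] = [-10.0, "fooC"]

print(ledger)
print(ledger["qty"].sum())  # aggregates work on the frame at any point
```

Each enlargement still rebuilds the frame internally, so this is convenient for occasional rows rather than for very high volumes.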

You can also create a new DataFrame containing just the new data and then combine the two with pd.concat.

For illustration purposes, let's take a simple dataframe:

import pandas as pd
df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]})
print(df)
>>    a  b
   0  0  3
   1  1  4
   2  2  5

Let's say you have new data coming in, with values a=4 and b=7. Create a new dataframe containing only the new data:

newresults = {'a':[4],'b':[7]}
_dfadd = pd.DataFrame(newresults)
print(_dfadd)
>>    a  b
   0  4  7

Then concatenate:

df = pd.concat([df,_dfadd]).reset_index(drop=True)
print(df)
>>    a  b
   0  0  3
   1  1  4
   2  2  5
   3  4  7       
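
The same concat step can be wrapped in a small helper so that each incoming order is a single call. A sketch only: append_row is a hypothetical name, not a pandas function, and ignore_index=True does the same job as the .reset_index(drop=True) above:

```python
import pandas as pd

def append_row(df, row):
    """Return a new DataFrame with `row` (a dict of column -> scalar) added at the end.

    Hypothetical helper for illustration. pd.concat always builds a new object,
    so every call copies the existing data.
    """
    # One-row frame; columns=df.columns keeps the original column order
    # (keys missing from `row` would become NaN).
    new = pd.DataFrame([row], columns=df.columns)
    return pd.concat([df, new], ignore_index=True)

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]})
df = append_row(df, {'a': 4, 'b': 7})
df = append_row(df, {'a': 8, 'b': 9})
print(df)
```

Note that the helper returns a new frame, so the result has to be assigned back to df each time.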

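One way is to use pandas.DataFrame.append(). Note that append returns a new DataFrame instead of modifying the original in place, so you have to assign the result back. (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, use pd.concat as in the previous answer.)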

self.ledger = pd.DataFrame(data={'entry_date' : [pd.Timestamp('1900-01-01')],
                                  'qty' : [np.float64(startingBalance)],
                                  'element_type' : [pd.Categorical(["startingBalance"])],
                                  'avail_bal' : [np.float64(startingBalance)],
                                  'firm_ind' : [True],
                                  'deleted_ind' : [False],
                                  'ord_id' : ["fooA"],
                                  'parent_ord_id' : ["fooB"] },
                            columns=ledgerColumnList)

df = pd.DataFrame(data={'entry_date' : [pd.Timestamp('1900-01-02')],
                        'qty' : [np.float64(startingBalance)],
                        'element_type' : ["startingBalance"],
                        'avail_bal' : [np.float64(startingBalance)],
                        'firm_ind' : [True],
                        'deleted_ind' : [False],
                        'ord_id' : ["foofa"],
                        'parent_ord_id' : ["foofb"] },
                  columns=ledgerColumnList)

self.ledger = self.ledger.append(df, ignore_index=True)  # returns a new frame; ignore_index renumbers rows 0..n-1

Comment:

Almost everything I've read advises against using append often, if at all, because it creates a new object containing the old data plus the new row. That copy adds quite a bit of overhead, so it is much slower than a traditional in-place append (such as appending to a Python list).
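
One way around that overhead while still computing on the fly is to collect rows in a plain Python list (cheap to append) and build the DataFrame only when a calculation needs it. A minimal sketch, assuming a hypothetical Ledger class that keeps a cut-down subset of the question's columns; computing avail_bal as a running cumsum() of qty is an illustrative choice, not something stated in this thread:

```python
import pandas as pd

class Ledger:
    """Hypothetical ledger that buffers rows in a plain list.

    list.append is cheap; the DataFrame is rebuilt only when it is requested,
    so computations can still run at any point between appends.
    """

    def __init__(self, starting_balance):
        self._rows = [{"entry_date": pd.Timestamp("1900-01-01"),
                       "qty": float(starting_balance),
                       "ord_id": "start"}]
        self._frame = None  # cached DataFrame, invalidated on every add()

    def add(self, entry_date, qty, ord_id):
        self._rows.append({"entry_date": pd.Timestamp(entry_date),
                           "qty": float(qty),
                           "ord_id": ord_id})
        self._frame = None

    @property
    def frame(self):
        # Build the DataFrame once per batch of additions, then reuse it.
        if self._frame is None:
            df = pd.DataFrame(self._rows)
            # Running balance recomputed from qty each time the frame is built.
            df["avail_bal"] = df["qty"].cumsum()
            self._frame = df
        return self._frame

ledger = Ledger(100)
ledger.add("1900-01-02", -30, "foofa")
print(ledger.frame["avail_bal"].iloc[-1])
ledger.add("1900-01-03", 5, "foofb")
print(ledger.frame)
```

The trade-off is that each rebuild costs one full DataFrame construction, but repeated reads between additions reuse the cached frame.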
