pandas assignment without loop?

Question

I have:

NewData, a pd.DataFrame to be populated from
SourceData, a list of dataframes holding source data and
source, a dataframe holding index values for which dataframe in SourceData is to be assigned.
indexlen, an integer for the length of indexes in the dataframes

(Using dataframes because it's critical the indexes align.)

For instance, assume there are 1000 DataFrames in SourceData and indexlen is 10,000. Starting at 10,000, I assign all columns from SourceData to NewData, moving along the index (one index, because all the DataFrames share it) until source decrements, at which point I start assigning the values from all columns of the DataFrame in SourceData[999] to NewData, and so on.

I'm currently doing this with a loop:

for j in range(1, indexlen + 1):
    NewData[j] = SourceData[source[j]].ix[j,:]

I would like to do this without a loop, but I don't know how to broadcast it. I'm sure I'm missing something obvious, but any help would be appreciated. Thank you!

Edit: I made source a list, because I figured that was more efficient to access by element.

In response to a question about the dataframes, they are standard price data:

>>> SourceData[1].head()

bpz1975      Open   High    Low  Close  Vol  OI
1975-02-13  2.275  2.275  2.275  2.275    0  50
1975-02-14  2.275  2.275  2.275  2.275    0  50
1975-02-18  2.275  2.275  2.275  2.275    0  50
1975-02-19  2.290  2.290  2.290  2.290    0  50
1975-02-20  2.290  2.290  2.290  2.290    0  50

In this case, I'm reading in all months of a futures contract and then applying roll logic to create a series.

Do you have some samples for what your data frames look like?
– TomAugspurger, Commented Feb 14, 2014 at 1:45
Edited the question with a head() of one of the dfs. Also, the indexes may well be longer than 10,000, so memory may be an issue too if I don't do this efficiently. (As I think you can tell, my question is as much about good programming practice as about this specific problem, so any criticisms are welcome. Thanks!)
– user3241893, Commented Feb 14, 2014 at 1:55
I also just tried it with NewData as a list. Much, much faster. I think that solution is acceptable if there's no better way to do it.
– user3241893, Commented Feb 14, 2014 at 2:03

Andy Hayden · Accepted Answer · 2014-02-14 07:05:48Z

1

Creating the DataFrame and then filling it in is not usually the fastest or most pandastic way.

In this case it looks like you can do a concat:

pd.concat(SourceData)
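As a minimal sketch of what the concat does here, assuming two small frames that share a date index (the values, the `keys` labels, and the level names `contract` and `date` are made up for illustration and are not from the question):

```python
import pandas as pd

# Two hypothetical contracts sharing the same date index.
dates = pd.date_range("1975-02-13", periods=3, freq="D")
source_data = [
    pd.DataFrame({"Open": [2.275, 2.275, 2.290],
                  "Close": [2.275, 2.275, 2.290]}, index=dates),
    pd.DataFrame({"Open": [2.300, 2.310, 2.320],
                  "Close": [2.300, 2.310, 2.320]}, index=dates),
]

# keys= adds an outer index level recording which frame each row came from,
# so rows with the same date stay distinguishable after stacking.
combined = pd.concat(source_data, keys=[0, 1], names=["contract", "date"])

# Pull one contract back out by its key.
second = combined.loc[1]
```

Without `keys`, the stacked frame would contain each date once per contract with nothing to tell the rows apart, which is likely why the position in SourceData needs to be carried along somehow.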

If you need to include source, the index information, within the DataFrames in SourceData, then I would do this before doing the concat.

It's unclear exactly what this entails, but it sounds like you're suggesting setting the index for each frame based on source. You can create a function which passes over SourceData, changing the index of each DataFrame to the one from source (without seeing source, it's unclear exactly how).
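One possible reading of the question is that source holds, for each date, the position in SourceData of the frame to take that row from. Under that assumption (the answer does not confirm it), the loop could be replaced by stacking the frames with concat and selecting all the wanted (frame, date) pairs at once. The data and names below are hypothetical:

```python
import pandas as pd

# Hypothetical inputs: two frames on a shared date index, and a list saying
# which frame to read from at each date (assumed meaning of `source`).
dates = pd.date_range("1975-02-13", periods=4, freq="D")
frames = [
    pd.DataFrame({"Close": [1.0, 1.1, 1.2, 1.3]}, index=dates),
    pd.DataFrame({"Close": [2.0, 2.1, 2.2, 2.3]}, index=dates),
]
source = [1, 1, 0, 0]

# Stack the frames; the outer index level is the frame's position in the list.
stacked = pd.concat(frames, keys=list(range(len(frames))))

# Build the (frame position, date) pairs to keep, one per date.
wanted = pd.MultiIndex.from_arrays([source, dates])

# Select every wanted row in one step, then drop the frame-position level
# so the result is indexed by date only.
new_data = stacked.reindex(wanted).droplevel(0)
```

This keeps the index alignment the question cares about, at the cost of holding the stacked frame in memory, which may matter with many long frames.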

