I am encountering some pretty strange behavior. If I run

import pandas

data = {'newcol': [1, 5], 'othercol': [12, -10]}  # named "data" to avoid shadowing the built-in dict
df = pandas.DataFrame(data=data)
print(df['newcol'])

I get back a pandas Series object with 1 and 5 in it. Great.

print(df)

I get back the DataFrame as I would expect. Cool.

But what if I want to add to a DataFrame a little at a time? (My use case is saving metrics for machine learner training runs happening in parallel, where each process gets a number and then adds to only that row of the DataFrame.)

I can do the following:

df = pandas.DataFrame()
df['newcol'] = pandas.Series()
df['othercol'] = pandas.Series()
df['newcol'].loc[0] = 1
df['newcol'].loc[1] = 5
df['othercol'].loc[0] = 12
df['othercol'].loc[1] = -10
print(df['newcol'])

I get back the pandas Series I would expect, identical to creating the DataFrame by the first method.

print(df)

I see printed that df is an Empty DataFrame with columns [newcol, othercol].

Clearly, the contents built by the second method are equivalent to those of the first. So why does the DataFrame not know it has been filled? Is there a function I can call to update the DataFrame's knowledge of its own Series, so that all these (possibly out-of-order) Series can be unified into one consistent DataFrame?

  • I think when you write df['newcol'].loc[0] = 1 you're actually editing the Series and not the DataFrame. Use df.loc[0, 'col'] like in @Vaishali's answer. Commented Jan 17, 2018 at 20:23
  • Yes, but the DataFrame points to that Series, so it's odd it doesn't reflect the changes. @Vaishali's method updates both because you're telling the DataFrame to update itself, and then it manages updating the Series also. All this evidently means that when you ask a DataFrame about itself, it doesn't gather data from its constituent Series at query-time. Commented Jan 17, 2018 at 20:40
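
To make the difference discussed in these comments concrete, here is a minimal sketch. It assumes a reasonably recent pandas; the names `col`, `rows_after_chained`, and `rows_after_direct` are made up for illustration, and the chained write may emit a warning depending on the pandas version.

```python
import pandas as pd

df = pd.DataFrame()
df["newcol"] = pd.Series(dtype="float64")

# Chained form: df["newcol"] hands back a Series object, and the
# .loc[0] = 1 write enlarges that Series, not the frame's own index.
col = df["newcol"]
col.loc[0] = 1
rows_after_chained = len(df)

# Single-call form: the DataFrame itself receives the assignment,
# so it can add the missing row label to its index.
df.loc[0, "newcol"] = 1
rows_after_direct = len(df)

print(rows_after_chained, rows_after_direct)
```

The frame only gains a row in the second form, because only there is the DataFrame asked to do the assignment.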

1 Answer

You can assign data to an empty DataFrame by using df.loc with both the row label and the column name, as follows:

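import pandas as pd

df = pd.DataFrame()
df['newcol'] = pd.Series()
df['othercol'] = pd.Series()
# Assign through the DataFrame itself, giving row label and column name together
df.loc[0, 'newcol'] = 1
df.loc[1, 'newcol'] = 5
df.loc[0, 'othercol'] = 12
df.loc[1, 'othercol'] = -10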

   newcol  othercol
0     1.0      12.0
1     5.0     -10.0

1 Comment

An additional useful detail for all you noobs out there: it isn't necessary to first set each column to an empty pandas Series (df['column'] = pd.Series()) here. Leave those lines out, and everything still works just fine.
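
A minimal sketch of what this comment describes, assuming a reasonably recent pandas: the frame starts with no columns at all, and each df.loc assignment creates whatever row or column is missing. The resulting dtypes may differ between pandas versions, so no printed output is shown here.

```python
import pandas as pd

# Start from a completely empty frame: no columns are declared up front
df = pd.DataFrame()

# Each .loc[row, column] assignment adds the row and/or column if it is missing
df.loc[0, "newcol"] = 1
df.loc[1, "newcol"] = 5
df.loc[0, "othercol"] = 12
df.loc[1, "othercol"] = -10

print(df)
```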
