I have a DataFrame and want to create a new column, then store an array in each row of that column. I know that to do this I have to change the column's dtype to 'object'. I tried the following, but it doesn't work:

import pandas
import numpy as np

df = pandas.DataFrame({'a':[1,2,3,4]})
df['b'] = np.nan
df['b'] = df['b'].astype(object)
df.loc[0,'b'] = [[1,2,4,5]]

The error is

ValueError: Must have equal len keys and value when setting with an ndarray

However, it works if I convert the datatype of the whole dataframe into 'object':

df = pandas.DataFrame({'a':[1,2,3,4]})
df['b'] = np.nan
df = df.astype(object)
df.loc[0,'b'] = [[1,2,4,5]] 

So my question is: why do I have to change the dtype of the whole DataFrame?

1 Answer

Try this:

In [12]: df.at[0,'b'] = [1,2,4,5]

In [13]: df
Out[13]:
   a             b
0  1  [1, 2, 4, 5]
1  2           NaN
2  3           NaN
3  4           NaN

PS: be aware that as soon as you put a non-scalar value in any cell, the corresponding column's dtype is changed to object so that it can hold non-scalar values:

In [14]: df.dtypes
Out[14]:
a     int64
b    object
dtype: object
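
For reference, here is a self-contained version of the approach above, as a minimal sketch. It assumes column b is first cast to object (as in the question), because assigning a list into a float column may behave differently across pandas versions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4]})
df["b"] = np.nan
# Cast only column b to object so that a cell can hold a Python list.
df["b"] = df["b"].astype(object)

# .at addresses exactly one cell, so the whole list is stored as one value.
df.at[0, "b"] = [1, 2, 4, 5]

print(df)
print(df.dtypes)
```

Column a keeps its integer dtype; only b is object.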

PPS: it is generally a bad idea to store non-scalar values in cells, because the vast majority of pandas/NumPy methods will not work properly with such data.
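
One possible alternative, sketched here with made-up example data (not from the question), is to keep one scalar per row in a "long" layout. DataFrame.explode turns each list element into its own row, after which the usual vectorized methods apply:

```python
import pandas as pd

# Hypothetical data: every row of b holds a list.
df = pd.DataFrame({"a": [1, 2], "b": [[1, 2, 4, 5], [7, 8]]})

# One row per list element; the original index label is repeated.
long_df = df.explode("b")
# explode leaves the column as object dtype, so cast it back to integers.
long_df["b"] = long_df["b"].astype("int64")

# Per-original-row aggregation now works with ordinary groupby machinery.
row_sums = long_df.groupby("a")["b"].sum()
print(row_sums)
```

Whether this layout fits depends on how the arrays are used afterwards; it is only one way to avoid object columns.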

5 Comments

This is the first time I've stumbled upon .at. Why not .loc?
@iDrwish, .at and .iat are designed for working with single cells, whereas .loc and .iloc are more complicated and have more logic for aligning data, etc.
Thank you for your answer. But in my original example, why do I need to change the data type of the whole DataFrame instead of just the column?
@lizardfireman, using .at you don't need to cast the whole DF to object dtype; see the output of df.dtypes in my answer. As for why df.loc[0,'b'] = [[...]] works when the whole DF is cast to object dtype: I don't know...
+1. Performance note: .at and .iat (both very much underused) are much more efficient than the .loc-based equivalents for accessing scalars.
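
To illustrate the accessors discussed in these comments, here is a small sketch using a hypothetical string index (not from the question). It only shows that .at (label-based) and .iat (position-based) reach the same single cell as .loc, and does not measure speed:

```python
import pandas as pd

# Hypothetical labels, chosen so that labels and positions differ visibly.
df = pd.DataFrame({"a": [1, 2, 3, 4]}, index=["w", "x", "y", "z"])

by_label = df.at["y", "a"]     # single cell by row/column label
by_position = df.iat[2, 0]     # same cell by integer position
via_loc = df.loc["y", "a"]     # general-purpose indexer, same result

# .at can also write a single cell.
df.at["y", "a"] = 30
print(by_label, by_position, via_loc, df.at["y", "a"])
```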
