
I created a dataframe df as below:

import pandas as pd

Type = ['A', 'B', 'C', 'D']
Size = [72, 23, 66, 12]
df = pd.DataFrame({'Type': Type, 'Size': Size})

I can extract any existing column using:

df_count = df['Size']

However, the DataFrame may be large, and I don't always know whether a given column exists in df. If I select a column that does not exist, e.g. df['Shape']:

df_null = df['Shape']

This raises a KeyError. Instead, I want df_null to be an empty column named "Shape".
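A minimal self-contained reproduction of the failure, using the same toy frame. It assumes a reasonably recent pandas, in which selecting an absent label with `[]` raises `KeyError`; the `raised` flag is only there to make the behaviour checkable.

```python
import pandas as pd

# Same toy frame as in the question
df = pd.DataFrame({"Type": ["A", "B", "C", "D"], "Size": [72, 23, 66, 12]})

print(df["Size"].tolist())  # existing column: [72, 23, 66, 12]

raised = False
try:
    df_null = df["Shape"]  # column that does not exist
except KeyError as exc:
    # Label-based selection of a missing column raises KeyError
    raised = True
    print("KeyError:", exc)
```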



Use DataFrame.get in a pattern similar to:

In [3]: df.get('Size', pd.Series(index=df.index, name='Size'))
Out[3]:
0    72
1    23
2    66
3    12
Name: Size, dtype: int64

In [4]: df.get('Shape', pd.Series(index=df.index, name='Shape'))
Out[4]:
0   NaN
1   NaN
2   NaN
3   NaN
Name: Shape, dtype: float64

Or generalize by creating a function to abstract this:

In [5]: get_column = lambda df, col: df.get(col, pd.Series(index=df.index, name=col))

In [6]: get_column(df, 'Size')
Out[6]:
0    72
1    23
2    66
3    12
Name: Size, dtype: int64

In [7]: get_column(df, 'Shape')
Out[7]:
0   NaN
1   NaN
2   NaN
3   NaN
Name: Shape, dtype: float64
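The lambda can also be written as a regular function, which PEP 8 prefers over binding a lambda to a name. The sketch below is one possible variant, not the answer's exact code: it checks membership first, so the placeholder Series is only built when the column is actually missing (with `df.get(col, default)` the default is constructed on every call). The `fill_value` parameter is a hypothetical addition so the placeholder can hold 0 or 1 instead of NaN, as asked in the comments.

```python
import numpy as np
import pandas as pd


def get_column(df: pd.DataFrame, col: str, fill_value=np.nan) -> pd.Series:
    """Return df[col] if it exists, else a Series of fill_value on df.index.

    `fill_value` is an illustrative parameter, not part of the original answer.
    """
    if col in df.columns:
        return df[col]
    # Scalar is broadcast to every row of df's index
    return pd.Series(fill_value, index=df.index, name=col)


df = pd.DataFrame({"Type": ["A", "B", "C", "D"], "Size": [72, 23, 66, 12]})

print(get_column(df, "Size").tolist())       # [72, 23, 66, 12]
print(get_column(df, "Shape").isna().all())  # True
print(get_column(df, "Shape", 0).tolist())   # [0, 0, 0, 0]
```

A NaN fill yields a float64 column, while an integer fill such as 0 yields an integer column.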

Another alternative could be to use reindex and squeeze:

In [8]: df.reindex(columns=['Size']).squeeze(axis=1)
Out[8]:
0    72
1    23
2    66
3    12
Name: Size, dtype: int64

In [9]: df.reindex(columns=['Shape']).squeeze(axis=1)
Out[9]:
0   NaN
1   NaN
2   NaN
3   NaN
Name: Shape, dtype: float64
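One caveat with a bare `squeeze()`: it drops every axis of length 1, so on a frame with a single row the 1x1 result collapses to a scalar rather than a Series. Passing `axis=1` only drops the column axis. A small sketch on a hypothetical one-row frame:

```python
import pandas as pd

# Hypothetical one-row frame to show the edge case
one_row = pd.DataFrame({"Type": ["A"], "Size": [72]})

sub = one_row.reindex(columns=["Size"])  # 1x1 DataFrame

print(isinstance(sub.squeeze(), pd.Series))        # False: collapsed to a scalar
print(isinstance(sub.squeeze(axis=1), pd.Series))  # True

# A missing column still comes back as an all-NaN Series
missing = one_row.reindex(columns=["Shape"]).squeeze(axis=1)
print(missing.isna().all())  # True
```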

3 Comments

@root Nice! `get` is awesome; I had never seen it used until now. I will add it to my toolbox. +1
Would it be possible to fill the missing column with "1" instead of "NaN"?
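@Eagle The squeezed result is a standalone Series (it is not added to df, so df['Shape'] would still raise a KeyError). Call fillna on the result directly: `df_null = df.reindex(columns=['Shape']).squeeze(axis=1).fillna(1)`. Alternatively, pass `fill_value=1` to `reindex`.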
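A sketch of three ways to get 1 instead of NaN for a missing column, all using documented pandas calls: the `fill_value` argument of `reindex`, `fillna` on the squeezed Series, or a pre-filled default passed to `DataFrame.get`.

```python
import pandas as pd

df = pd.DataFrame({"Type": ["A", "B", "C", "D"], "Size": [72, 23, 66, 12]})

# Option 1: let reindex fill the newly created column directly
shape_a = df.reindex(columns=["Shape"], fill_value=1).squeeze(axis=1)

# Option 2: reindex with NaN, then fill the resulting Series
shape_b = df.reindex(columns=["Shape"]).squeeze(axis=1).fillna(1)

# Option 3: give DataFrame.get a default that is already filled with 1
shape_c = df.get("Shape", pd.Series(1, index=df.index, name="Shape"))

print(shape_a.tolist())  # all ones
print(shape_b.tolist())  # all ones
print(shape_c.tolist())  # all ones
```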

IIUC, try this:

col = 'Shape'
df_null = pd.Series(dtype='float64') if col not in df.columns else df[col]

Output:

Series([], dtype: float64)

OR

col = 'Size'
df_null = pd.Series(dtype='float64') if col not in df.columns else df[col]

Output:

0    72
1    23
2    66
3    12
Name: Size, dtype: int64

2 Comments

Nice use of a conditional expression here :-)
Sorry, one small addition is needed: df_null should have the same number of rows as df, with every row filled with 0 or NaN.
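To meet that requirement, the conditional from this answer can build its fallback on `df.index` instead of creating an empty Series. A sketch, where `fill` is a hypothetical variable for choosing between NaN and 0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Type": ["A", "B", "C", "D"], "Size": [72, 23, 66, 12]})

col = "Shape"
fill = np.nan  # hypothetical: set to 0 for a zero-filled column instead

# Same conditional as the answer, but the fallback has one row per row of df
df_null = df[col] if col in df.columns else pd.Series(fill, index=df.index, name=col)

print(len(df_null) == len(df))  # True
```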
