Is there a convenient way of filling NaN values with (the first) values of an array or column?

Imagine the following DataFrame:

dfcolors = pd.DataFrame({'Colors': ['Blue', 'Red', np.nan, 'Green', np.nan, np.nan, 'Brown']})

  Colors
0   Blue
1    Red
2    NaN
3  Green
4    NaN
5    NaN
6  Brown

I want to fill the NaN values with values from another DataFrame, or array, so:

dfalt = pd.DataFrame({'Alt': ['Cyan', 'Pink']})

    Alt
0  Cyan
1  Pink

When there are more NaNs than fill values, some NaNs should remain. And when there are more fill values than NaNs, not all of them will be used. So we'll have to do some counting:

n_missing = len(dfcolors) - dfcolors.count().values[0]    
n_fill = min(n_missing, len(dfalt))

n_fill is the number of values that can be filled.
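
The same count can also be taken directly from the NaN mask. A minimal sketch, assuming a pandas version where Series.isna() is available (pd.isnull() gives the same mask on older versions):

```python
import numpy as np
import pandas as pd

dfcolors = pd.DataFrame({'Colors': ['Blue', 'Red', np.nan, 'Green', np.nan, np.nan, 'Brown']})
dfalt = pd.DataFrame({'Alt': ['Cyan', 'Pink']})

# Summing the boolean mask counts the missing entries in the column
n_missing = int(dfcolors['Colors'].isna().sum())

# Only as many values can be filled as there are NaNs and fill values
n_fill = min(n_missing, len(dfalt))

print(n_missing, n_fill)
```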

Selecting the NaN values which can/should be filled can be done with:

dfcolors.Colors[pd.isnull(dfcolors.Colors)][:n_fill]

2    NaN
4    NaN
Name: Colors, dtype: object

Selecting the fill values:

dfalt.Alt[:n_fill]

0    Cyan
1    Pink
Name: Alt, dtype: object

And then I'm stuck at something like:

dfcolors.Colors[pd.isnull(dfcolors.Colors)][:n_fill] = dfalt.Alt[:n_fill]

Which doesn't work... Any tips would be great.

This is the output that I want:

  Colors
0   Blue
1    Red
2   Cyan
3  Green
4   Pink
5    NaN
6  Brown

NaN values are filled from top to bottom, and the fill values are also taken from top to bottom, so if there are more fill values than NaNs only the first ones are used.

  • What is the output you want? Commented Jul 9, 2013 at 9:36
  • Good point, I edited the question a bit. Commented Jul 9, 2013 at 9:55
  • It's returning view vs copy (fancy indexing always returns a copy)... hmm Commented Jul 9, 2013 at 10:00
  • Yes, I think that's the main issue. I have tried all kinds of things, like adding .values or even wrapping it in a new DataFrame. No luck so far. Commented Jul 9, 2013 at 10:03

2 Answers

3

This is rather awful, but iterating over the index of the nulls works:

In [11]: nulls = dfcolors[pd.isnull(dfcolors['Colors'])]

In [12]: for i, ni in enumerate(nulls.index[:len(dfalt)]):
             dfcolors.loc[ni, 'Colors'] = dfalt['Alt'].iloc[i]

In [13]: dfcolors
Out[13]:
  Colors
0   Blue
1    Red
2   Cyan
3  Green
4   Pink
5    NaN
6  Brown
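
The loop can probably be avoided by assigning through a single .loc call on the labels of the NaN rows. This is only a sketch, assuming a pandas version that has Series.isna() and Series.to_numpy(); null_idx is just an illustrative name:

```python
import numpy as np
import pandas as pd

dfcolors = pd.DataFrame({'Colors': ['Blue', 'Red', np.nan, 'Green', np.nan, np.nan, 'Brown']})
dfalt = pd.DataFrame({'Alt': ['Cyan', 'Pink']})

# Index labels of the rows where 'Colors' is missing, in top-to-bottom order
null_idx = dfcolors.index[dfcolors['Colors'].isna()]
n_fill = min(len(null_idx), len(dfalt))

# One .loc assignment writes into dfcolors itself rather than into a copy.
# to_numpy() drops dfalt's index, so values are placed by position
# instead of being aligned on labels.
dfcolors.loc[null_idx[:n_fill], 'Colors'] = dfalt['Alt'].to_numpy()[:n_fill]

print(dfcolors)
```

Because both sides are cut to n_fill, surplus NaNs stay as they are and surplus fill values are ignored.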

3

You could use a generator. That way you could write something like this:

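import numpy as np
import pandas as pd

dfcolors = pd.DataFrame({'Colors': ['Blue', 'Red', np.nan, 'Green', np.nan, np.nan, 'Brown']})
dfalt = pd.DataFrame({'Alt': ['Cyan', 'Pink']})

# Generator that yields the fill values from top to bottom
gen_alt = (alt for alt in dfalt.Alt)

# The positions from enumerate() match the row labels here
# because dfcolors has the default 0..n-1 index
for i, color in enumerate(dfcolors.Colors):
    if not pd.isnull(color):
        continue
    try:
        # next(gen_alt) replaces the Python 2 only gen_alt.next();
        # .loc avoids assigning through a chained index
        dfcolors.loc[i, 'Colors'] = next(gen_alt)
    except StopIteration:
        # No fill values left, so the remaining NaNs stay
        break
print(dfcolors)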
#     Colors
# 0   Blue
# 1    Red
# 2   Cyan
# 3  Green
# 4   Pink
# 5    NaN
# 6  Brown
