Replace missing values in list from second list using python/pandas

Question

Suppose you have two lists (or columns in a pandas DataFrame), each containing some null values. You want a single list in which each null value in the first list is replaced by the corresponding value from the second list, if that value is non-null.

Example:

s1 = [1, NaN, NaN]
s2 = [NaN, NaN, 3]
## some function
result = [1, NaN, 3]

Assume that if both lists are non-null at some position then they match, so we need not worry about resolving conflicts. Given that, I know I can solve it with a list comprehension:

[x if not np.isnan(x) else y for (x, y) in zip(s1, s2)]

or if s1 and s2 are columns in a pandas DataFrame df, then we can use similar logic and the apply function:

df.apply(lambda x: x.s1 if not np.isnan(x.s1) else x.s2, axis=1)

but is there a cleaner way to do this, perhaps using some of the pandas functionality? What is this kind of operation even called? It is kind of like a union, but preserves ordering and null values when lacking an alternative.

evilpilotfish · Accepted Answer · 2017-02-09 16:36:36Z

You can use the pandas fillna method to fill missing values in one column with the values from another column.

import numpy as np
import pandas as pd
df = pd.DataFrame([[1, np.nan], [np.nan, np.nan], [np.nan, 3]], columns=['c1', 'c2'])
df['c1'].fillna(df['c2'])
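
As a small runnable sketch of the same idea applied to the lists from the question: fillna accepts another Series and fills by matching index labels, and combine_first performs the same kind of coalescing. The names s1 and s2 come from the question; filled, combined and result are illustrative names only.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, np.nan, np.nan])
s2 = pd.Series([np.nan, np.nan, 3])

# Fill each null in s1 with the value at the same index label in s2.
filled = s1.fillna(s2)

# combine_first does the same coalescing: keep s1 where present, else take s2.
combined = s1.combine_first(s2)

# Back to a plain list; positions that are null in both inputs stay NaN.
result = filled.tolist()
```

Both calls align on the index rather than on position, so if the two Series carry different indexes it may be worth resetting them first.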

cpsempek · Accepted Answer · 2016-02-24 19:48:40Z

I had to do this recently. You may have to adapt what I put below depending on the structure of your column values.

import pandas as pd

# example dataframe
df = pd.DataFrame({'col': ['a', 'b', None, 'd', 'e', None, None]})

# null positions and list of values to replace nulls with
nulls = df[pd.isnull(df.col)].index
goodies = ['c', 'f', 'g']

# replace nulls with empty strings
df['col'] = df['col'].fillna('')

# tag every value with its row index so each former null becomes a unique string we can track
SEP = '_'
df['col'] = df.col + pd.Series([SEP + str(i) for i in df.index], index=df.index)

# create map to turn bad values good and then perform replacement
salvation = {bad: good for bad, good in zip(df.loc[nulls, 'col'], goodies)}
df.replace(salvation, inplace=True)

# remove everything including and after SEP string
df['col'] = df.col.apply(lambda s: s.split(SEP)[0])

Note that in my example the column contained string values, so depending on your data types you may need to convert to strings using the astype() method and then convert back when you're done. Also, choose a SEP that does not occur in your values; otherwise the split in the last line will truncate them.
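
If the replacement values are already listed in the same order as the null rows, a shorter alternative may be to assign them directly through a boolean mask, which avoids the string round trip entirely. This is only a sketch of a different technique, not the approach above; it assumes there is exactly one replacement per null, and mask is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'col': ['a', 'b', None, 'd', 'e', None, None]})
goodies = ['c', 'f', 'g']  # one replacement per null, in row order

# True where the column is null
mask = df['col'].isna()

# Assumption: len(goodies) equals the number of nulls; a mismatch raises an error.
df.loc[mask, 'col'] = goodies
```

Because the values are written straight into the null positions, no separator is needed and the column keeps its original data type.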
