Suppose you have two lists (or columns in a pandas DataFrame), each containing some null values. You want a single list in which each null value in one list is replaced by the corresponding value from the other list, if that value is non-null.

Example:

s1 = [1, NaN, NaN]
s2 = [NaN, NaN, 3]
## some function
result = [1, NaN, 3]

Assume that if both lists are non-null at some position then they match, so we need not worry about resolving conflicts. If so, I know I can solve it with a list comprehension:

[x if not np.isnan(x) else y for x, y in zip(s1, s2)]

or if s1 and s2 are columns in a pandas DataFrame df, then we can use similar logic and the apply function:

df.apply(lambda row: row.s1 if not np.isnan(row.s1) else row.s2, axis=1)

but is there a cleaner way to do this, perhaps using some of the pandas functionality? What is this kind of operation even called? It is kind of like a union, but preserves ordering and null values when lacking an alternative.

2 Answers

1

You can use pandas fillna functionality to fill missing values from other columns.

import numpy as np
import pandas as pd
df = pd.DataFrame([[1, np.nan], [np.nan, np.nan], [np.nan, 3]], columns=['c1', 'c2'])
df['c1'].fillna(df['c2'])
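
On the naming question: this kind of operation is commonly called a "coalesce" (after SQL's COALESCE). As a rough sketch, the same result can be had from `Series.combine_first`, or from `np.where` when working with plain arrays. The variable names `filled`, `combined`, and `result` below are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"c1": [1, np.nan, np.nan], "c2": [np.nan, np.nan, 3]})

# fillna with a Series aligns on the index and fills only the nulls in c1
filled = df["c1"].fillna(df["c2"])

# combine_first coalesces the same way: keep c1, fall back to c2 where c1 is null
combined = df["c1"].combine_first(df["c2"])

# NumPy version for plain arrays (or lists converted to arrays)
s1 = np.array([1, np.nan, np.nan])
s2 = np.array([np.nan, np.nan, 3])
result = np.where(np.isnan(s1), s2, s1)
```

Positions where both inputs are null stay null in all three variants, which matches the behaviour asked for in the question.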

0

I had to do this recently. You may have to adapt what I put below depending on the structure of your column values.

import pandas as pd

# example dataframe
df = pd.DataFrame({'col': ['a', 'b', None, 'd', 'e', None, None]})

# null positions and list of values to replace nulls with
nulls = df[pd.isnull(df.col)].index
goodies = ['c', 'f', 'g']

# replace nulls with empty strings
df['col'] = df['col'].fillna('')

# augment empty strings to something we can keep track of
SEP = '_'
df['col'] = df.col + pd.Series([SEP + str(i) for i in df.index])

# create map to turn bad values good and then perform replacement
salvation = {bad: good for bad, good in zip(df.loc[nulls, 'col'], goodies)}
df.replace(salvation, inplace=True)

# remove everything including and after SEP string
df['col'] = df.col.apply(lambda s: s.split(SEP)[0])

Note that in my example the column contained string values, so depending on your data types you may need to convert to strings with the astype() method and then convert back when you're done. You may also need to change SEP so that the split in the last line does not cut into your actual values.
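
If the replacement values are already in the same order as the null positions, a shorter route that skips the string round-trip might be a boolean mask with `.loc`. This is a different technique from the one above, sketched under the assumption that `goodies` has exactly one entry per null; `mask` is a name introduced here for illustration.

```python
import pandas as pd

# same example dataframe and replacement values as above
df = pd.DataFrame({"col": ["a", "b", None, "d", "e", None, None]})
goodies = ["c", "f", "g"]

# True wherever the column is null
mask = df["col"].isna()

# assign the replacements to the null positions, in order
# (assumes len(goodies) equals the number of nulls)
df.loc[mask, "col"] = goodies
```

This works for any dtype, so no astype() conversion or separator string is needed.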
