
I have a pandas DataFrame with two columns. The first contains names (strings); the second contains an integer code for each name. The code resolves spelling variants, so different spellings of the same name share one code. The problem is that not all names are coded. I would like to add a third column that contains the code (as a string) when there is one, and the plain name when the second column is NaN.

Here is an example of the DataFrame:

import math
import pandas as pd
df = pd.DataFrame([['Meyer', 2], ['Mueller', 4], ['Radisch', math.nan], ['Meyer', 2], ['Pavlenko', math.nan]])

And here is how I would like it to look:

df = pd.DataFrame([['Meyer', 2, '2'], ['Mueller', 4, '4'], ['Radisch', math.nan, 'Radisch'], ['Meyer', 2, '2'], ['Pavlenko', math.nan, 'Pavlenko']])

Any suggestions on how I can do that? I tried a for loop, but it does not work:

for d in range(0, len(df)):
    if not (math.isnan(df['ref'][d])):
        df.ix[d]['name2'] = df.ix[d]['ref']

1 Answer

You can use the fillna() method, passing the name column as the fill values:

In [26]: df[2] = df[1].fillna(df[0])

In [27]: df
Out[27]:
          0    1         2
0     Meyer  2.0         2
1   Mueller  4.0         4
2   Radisch  NaN   Radisch
3     Meyer  2.0         2
4  Pavlenko  NaN  Pavlenko

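Or use the Series.combine_first() method, which takes values from the first Series and falls back to the second where the first is missing: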

In [28]: df[1].combine_first(df[0])
Out[28]:
0           2
1           4
2     Radisch
3           2
4    Pavlenko
Name: 1, dtype: object
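Both approaches above keep the codes as numbers in the new column. If the codes are needed as strings, as the question asks, one possible sketch is to convert the non-missing codes first and then fill the gaps with the names. The column names `name`, `ref` and `name2` are borrowed from the loop in the question and are an assumption here, since the example DataFrame uses the default labels 0 and 1.

```python
import math

import pandas as pd

# Example data from the question; column names are assumed for illustration.
df = pd.DataFrame(
    [["Meyer", 2], ["Mueller", 4], ["Radisch", math.nan],
     ["Meyer", 2], ["Pavlenko", math.nan]],
    columns=["name", "ref"],
)

# Turn each present code into a string such as "2"; NaN entries are skipped
# by na_action="ignore" and stay missing.
codes_as_text = df["ref"].map(lambda code: str(int(code)), na_action="ignore")

# Fill the missing codes with the name from the same row.
df["name2"] = codes_as_text.fillna(df["name"])

print(df)
```

The `int(code)` step is there because a column containing NaN is stored as float, so the code 2 would otherwise be rendered as "2.0".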

Another good resource for further reading is the pandas documentation section "Working with missing data".
