Concatenating noneType and string value columns (pandas dataframes) results in "NaN"

Question

I'm trying to concatenate two columns, and the second column has a few None (NoneType) values. When I concatenate the two columns, every row where the second column is None ends up as "NaN" in the resulting column.

I tried to look around to see if I could find questions on this behavior, but I wasn't able to.

Here's what the table looked like before concatenation:

Here's my code to join the two columns after my modifications:

new_table["name"] = new_table[0] + new_table[1]

Which results in this:

Why does concatenation result in "NaN", and how can I fix it?

Please share code and data as text in the post itself, not as images. See: meta.stackoverflow.com/q/303812/11301900. Have you read the Pandas docs? "and how can I fix it?" There's nothing to fix; NaN has a purpose. Speaking of which, why were you using None in the DataFrame?
– AMC, Commented Jan 6, 2020 at 1:37
Noted for the images. I have read the pandas docs here: pandas.pydata.org/pandas-docs/stable/getting_started/…, but I couldn't find an explanation for this behavior. NaN has a function, but it is unclear why string '+' NoneType results in NaN. Since the result is NaN, rather than just the data from column one as a summing operation would suggest, I wanted to know how to "fix" it.
– exlo, Commented Jan 6, 2020 at 3:45

ypnos · Accepted Answer · 2020-01-06 00:45:05Z

2

The simplest fix is to replace None with an empty string:

new_table["name"] = new_table[0] + new_table[1].fillna('')
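For anyone who wants to run this, here is a minimal self-contained sketch. The contents of new_table are an assumption (the question's table was only posted as an image), so the sample names below are stand-ins borrowed from the other answer.

```python
import pandas as pd

# Hypothetical stand-in for new_table: integer column labels 0 and 1,
# with a None in the second column.
new_table = pd.DataFrame([["K.", "Mbappe"], ["N.", None]])

# Plain + leaves a missing value wherever either operand is missing.
raw = new_table[0] + new_table[1]

# Replacing missing values with '' first keeps the left-hand string.
new_table["name"] = new_table[0] + new_table[1].fillna("")

print(raw)
print(new_table)
```

The fillna('') call only touches the second column, so a missing value in column 0 would still produce NaN unless it is filled as well.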


2 Comments

exlo Over a year ago

Thank you, this solves the problem. But is there a reason why concatenation between noneTypes and strings works this way?

ypnos Over a year ago

Python does not allow concatenation between str and any other type (including NoneType). Pandas catches the TypeError and falls back to NaN. The burden is on you to tell Pandas how to deal with missing values, to avoid silent failure.
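The difference described in this comment can be seen side by side in a short sketch. The Series names and sample values are made up for illustration, and it only shows the observable behavior, not how pandas implements it internally.

```python
import pandas as pd

# Plain Python refuses to add a str and None.
try:
    "N." + None
    raised = False
except TypeError:
    raised = True

# pandas treats None in a column of strings as a missing value, so the
# element-wise + yields NaN for that row instead of raising.
left = pd.Series(["K.", "N."])
right = pd.Series(["Mbappe", None])
combined = left + right

print(raised)
print(combined)
```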

Ponx · Accepted Answer · 2020-01-06 18:38:13Z

2

import numpy as np
import pandas as pd

df = pd.DataFrame([["K.", "Mbappe"], ["N.", np.nan]])
print(df)

Output:

    0       1  
0  K.  Mbappe  
1  N.     NaN  


df['Name'] = df[0].str.cat(df[1], na_rep='')
print(df)

Output:

    0       1      Name
0  K.  Mbappe  K.Mbappe
1  N.     NaN        N.

It is the same approach as ypnos proposed, using the Series.str.cat method instead.
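As a rough sketch of what str.cat adds over plain +, the example below uses its sep and na_rep parameters. The Series names are made up for illustration.

```python
import numpy as np
import pandas as pd

first = pd.Series(["K.", "N."])
last = pd.Series(["Mbappe", np.nan])

# Without na_rep, a missing value on either side still gives a missing result.
no_rep = first.str.cat(last)

# sep inserts a separator between the pieces; na_rep substitutes for missing
# values, so the separator is still added after "N." in the second row.
spaced = first.str.cat(last, sep=" ", na_rep="")

print(no_rep)
print(spaced)
```

Because the separator is always inserted, a follow-up .str.strip() may be wanted when na_rep is an empty string.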

edited Jan 6, 2020 at 18:38

answered Jan 6, 2020 at 1:05


2 Comments

ypnos Over a year ago

A helpful addition to my answer, as str.cat is more powerful than plain str concatenation.

exlo Over a year ago

Thank you @Ponx. Is there a reason str.cat is more powerful?
