
I have a pandas dataframe df with multiple columns. One of the columns is Col1 which contains float values or NaNs:

df
+----+------+-----+
| No | Col1 | ... |
+----+------+-----+
| 12 |   10 | ... |
| 23 |  NaN | ... |
| 34 |    5 | ... |
| 45 |  NaN | ... |
| 54 |   22 | ... |
+----+------+-----+
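
To make the example reproducible, here is a minimal sketch that builds this frame with only the two columns shown (the `...` columns are omitted). Representing the missing entries as `np.nan` is an assumption about how the data was loaded.

```python
import numpy as np
import pandas as pd

# Rebuild the example table; the "..." columns are left out
df = pd.DataFrame({
    "No": [12, 23, 34, 45, 54],
    "Col1": [10, np.nan, 5, np.nan, 22],  # NaN makes Col1 a float column
})

# Boolean mask of the rows where Col1 has a value
mask = df["Col1"].notnull()
print(mask.sum(), len(df))
```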

I apply a function to Col1, excluding the missing values (NaN), like this:

StandardScaler().fit_transform(df.loc[pd.notnull(df['Col1']), ['Col1']])

Imagine the result is a numpy.ndarray like this:

+-----+
| Ref |
+-----+
|   2 |
|   5 |
|   1 |
+-----+

Notice that this array does not have the same length as the original column Col1.
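
This mismatch is why a plain column assignment fails: pandas requires an array assigned to a whole column to have one value per row. A small sketch, using the question's illustrative Ref values (2, 5, 1) rather than real scaler output:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"No": [12, 23, 34, 45, 54],
                   "Col1": [10, np.nan, 5, np.nan, 22]})
ref = np.array([2, 5, 1])  # one value per non-NaN row of Col1

error = None
try:
    df["Ref"] = ref  # 3 values for a 5-row frame
except ValueError as exc:
    error = str(exc)
    print("ValueError:", error)
```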

I need a way to add the array Ref as a new column of df, with NaN in Ref for every row where Col1 is NaN. The desired output looks like this:

+----+------+-----+-----+
| No | Col1 | ... | Ref |
+----+------+-----+-----+
| 12 |   10 | ... |   2 |
| 23 |  NaN | ... | NaN |
| 34 |    5 | ... |   5 |
| 45 |  NaN | ... | NaN |
| 54 |   22 | ... |   1 |
+----+------+-----+-----+

1 Answer


You can assign to the new column using the same boolean mask:

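from sklearn.preprocessing import StandardScaler

# Rows where Col1 has a value
mask = df['Col1'].notnull()
# Scale only those rows and write the result back to the same rows;
# .ravel() flattens the (n, 1) output, and rows outside the mask get NaN
df.loc[mask, 'Ref'] = StandardScaler().fit_transform(df.loc[mask, ['Col1']]).ravel()
print(df)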
   No  Col1       Ref
0  12  10.0 -0.327089
1  23   NaN       NaN
2  34   5.0 -1.027992
3  45   NaN       NaN
4  54  22.0  1.355081
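
An alternative relies on pandas index alignment instead of .loc assignment: wrap the result in a Series indexed by the non-null rows, and assigning it as a new column fills every other row with NaN. This sketch uses the question's illustrative Ref array (2, 5, 1) as a stand-in for the scaler output; with StandardScaler you would flatten its 2-D result with .ravel() first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"No": [12, 23, 34, 45, 54],
                   "Col1": [10, np.nan, 5, np.nan, 22]})
mask = df["Col1"].notnull()
ref = np.array([2, 5, 1])  # stand-in for the scaled values

# Label each value with the index of the row it came from;
# column assignment aligns on the index, so unmatched rows get NaN
df["Ref"] = pd.Series(ref, index=df.index[mask])
print(df)
```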

Detail:

print(StandardScaler().fit_transform(df.loc[mask, ['Col1']]))
[[-0.32708852]
 [-1.02799249]
 [ 1.35508101]]
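
Note that fit_transform returns a 2-D array of shape (n_samples, 1), which is why the output above is nested. If scikit-learn is not otherwise needed, the same z-scores can be computed in pandas alone: StandardScaler uses the population standard deviation (equivalent to numpy.std(x, ddof=0)), and pandas' mean and std skip NaN by default, so the missing rows simply stay NaN. A sketch under those assumptions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"No": [12, 23, 34, 45, 54],
                   "Col1": [10, np.nan, 5, np.nan, 22]})

s = df["Col1"]
# ddof=0 matches StandardScaler; NaN is skipped in mean/std
# and propagates through the arithmetic, so those rows stay NaN
df["Ref"] = (s - s.mean()) / s.std(ddof=0)
print(df.round(6))
```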