I have a DataFrame in which one column contains NaN values.

I split the rows on whether that column is NaN:

mask = np.isnan(data[column])  # True where the value is missing
X_train = data[~mask].drop(column, axis=1)
y_train = data[~mask][column]
X_test = data[mask].drop(column, axis=1)
y_test = data[mask][column]

Then, with some complex algorithm, I predict the y_test values. Afterwards I want to merge these DataFrames back together in the original row order. For example:

X, y
1, 1
12, nan
2, 3
5, nan
7, 34

y_test will have 2 values. For example, after the algorithm finishes, y_test == [2, 43].

Then I want to create the following DataFrame:

X, y
1, 1
12, 2
2, 3
5, 43
7, 34

2 Answers

Just assign y_test to the missing values.

df.loc[df['y'].isnull(), 'y'] = y_test
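
As a minimal sketch of that one-liner on the question's example data: the frame `df`, the name `missing`, and the prediction values are assumptions taken from the question, with a plain NumPy array standing in for the model output. The masked rows are filled top to bottom, so the predictions must be in the same order as the rows of X_test.

```python
import numpy as np
import pandas as pd

# Example frame from the question; NaN marks the values to predict.
df = pd.DataFrame({"X": [1, 12, 2, 5, 7],
                   "y": [1.0, np.nan, 3.0, np.nan, 34.0]})

# Stand-in for the model's predictions (values taken from the question).
y_test = np.array([2, 43])

missing = df["y"].isnull()

# Guard: one prediction per missing row, otherwise the assignment fails.
assert missing.sum() == len(y_test)

# Write the predictions into the masked rows, in row order.
df.loc[missing, "y"] = y_test
```

The length check is optional, but it turns a shape mismatch into an early, readable failure.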

1 Comment

Thanks! Very short and helpful solution

You could use

mask = np.isnan(data[column])
data.loc[mask, column] = [2, 43]

to assign the values to the original DataFrame, data:

import numpy as np
import pandas as pd

nan = np.nan
data = pd.DataFrame({'X': [1, 12, 2, 5, 7], 'y': [1.0, nan, 3.0, nan, 34.0]})
column = 'y'
mask = np.isnan(data[column])
X_train = data[~mask].drop(column, axis=1)
y_train = data.loc[~mask, column]
X_test = data[mask].drop(column, axis=1)
y_test = data.loc[mask, column]

data.loc[mask, column] = [2, 43]
print(data)

yields

    X     y
0   1   1.0
1  12   2.0
2   2   3.0
3   5  43.0
4   7  34.0
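
If the predictions might not arrive in row order, or you would rather leave `data` untouched, one possible variant is to label the predictions with the index of X_test and let pandas align them. This is only a sketch: `predictions`, `predicted`, and `filled` are hypothetical names, and the values are the ones from the question.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"X": [1, 12, 2, 5, 7],
                     "y": [1.0, np.nan, 3.0, np.nan, 34.0]})
column = "y"
mask = data[column].isna()
X_test = data[mask].drop(columns=column)

# Hypothetical model output, one value per row of X_test.
predictions = np.array([2.0, 43.0])

# Attach the original row labels so each value knows where it belongs.
predicted = pd.Series(predictions, index=X_test.index, name=column)

# fillna with a Series aligns on the index; only the NaN rows are filled.
filled = data.copy()
filled[column] = filled[column].fillna(predicted)
```

Here `filled` holds the completed column, while `data` still contains the original NaN values.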
