I am doing some predictive modeling.

As usual, I split the data into x_train, x_test, y_train, y_test and then get the test predictions in y_pred.

Once I finish, I want to write the data to a CSV file, but when I try to join y_pred to y_test, it does not join as expected.

I get something like this:

    Class Data    TotalCnt  0
16  3     2209    5235      
98  3     2190    4871      
07  1     2183    1342      1690
09  1     2205    1540      1540
19  3     2191    4673      
01  1     2206    3117      1005
38  3     2200    4837      
44  3     2219    4965      
04  1     2195    1340      1690
10  1     2191    1980      2002
38  3     2184    4620      
15  3     2220    4781      
18  3     2223    4872      

Some records lose their prediction.

I think the cause of the problem is the following:

y_pred holds predictions for a random subset of the original dataframe, so it should look like this:

ID      Prediction
16      1005
98      2056
07      1690
54      1690
...

y_pred is an array, so in order to join it with x_test I convert it to a dataframe.

Once y_pred is converted to a dataframe, it loses the IDs, so the index becomes sequential: 0, 1, 2, 3, ...

ID      Prediction
0       1005
1       2056
2       1690
3       1690
...

Therefore, when I try to join it with x_test, it only matches the ID numbers that exist in both dataframes, x_test and y_pred.

How can I get the predictions as a dataframe instead of an array?

I am using this:

x_train, x_test, y_train, y_test = train_test_split(x,y)
rf = RandomForestRegressor(n_estimators=10000)
rf.fit(x_train, y_train) 
y_pred = rf.predict(x_test)

. . .
. . .

def Lead0(value):
        return "0" + str(value) if value < 10 else str(value)

dNow = datetime.datetime.now()
sNow = Lead0(dNow.year) + Lead0(dNow.month) + Lead0(dNow.day) + Lead0(dNow.hour) + Lead0(dNow.minute) + Lead0(dNow.second) 

y_pred = pd.DataFrame(y_pred)
y_out = x_test
y_out = y_out.join(y_test)
y_out = y_out.join(y_pred)

y_out.to_csv(sFolder + "dfPred__" + sNow +".csv")

How can I join an array to a dataframe without losing the ID order?

How can I convert an array to a dataframe without losing the ID order?

1 Answer


You wrote that y_pred holds predictions for a random subset of the original dataframe, and that y_pred is an array.

I understand you want to keep the index from the original dataframe.

To do this, I think you need to turn the old dataframe's index into a column, and then keep y_pred alongside it as a dict or dataframe rather than as a bare array.
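Applied to the variables from the question, that idea might look like the sketch below. The small x_test frame and the y_pred array are made-up stand-ins for the real train_test_split and rf.predict results, and the column name Prediction is only an illustrative choice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for x_test: a shuffled subset keeps its original row labels.
x_test = pd.DataFrame(
    {"Class": [3, 3, 1, 1], "Data": [2209, 2190, 2183, 2205]},
    index=[16, 98, 7, 9],
)
# Hypothetical stand-in for rf.predict(x_test): a plain array with no labels.
y_pred = np.array([1005.0, 2056.0, 1690.0, 1540.0])

# predict() returns one value per row of x_test, in the same order,
# so the original labels can be copied over position by position.
pred_df = pd.DataFrame({"index": x_test.index, "Prediction": y_pred})
print(pred_df)
```

Each prediction now carries the label of the row it was computed from, so it can be matched back to that row later.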

import pandas as pd
df = pd.DataFrame({'Record Type': ['100', '200', '300'],
           'Value': [(1,2,3,4,5), '0,10', 1]})

  Record Type            Value
0         100  (1, 2, 3, 4, 5)
1         200             0,10
2         300                1

Then move the index into a column:

df.reset_index(level=0, inplace=True)

   index Record Type            Value
0      0         100  (1, 2, 3, 4, 5)
1      1         200             0,10
2      2         300                1

Now you can keep both the index (which is a regular column now) and the y_pred values from the old dataframe, and merge them with your new dataframe.

To merge the new df with the old one, use merge:

import pandas as pd

df1 = pd.DataFrame({'Record Type': ['100', '200', '300'],
           'Value': [(1,2,3,4,5), '0,10', 1]})

df1.reset_index(level=0, inplace=True)

df2 = pd.DataFrame({'Record Type': ['100', '200', '300'],
           'Value': [(1,2,3,4,5), '0,10', 1]})

df2.reset_index(level=0, inplace=True)


# merge the dataframes on the 'index' column;
# indicator=True adds a '_merge' column showing whether each record
# was found in one dataframe or in both
df_all = df1.merge(df2, on='index', indicator=True)

df_all.columns  # show the column list
df_all = df_all[['index', 'Record Type_y', 'Value_y']]  # pick only the columns you want
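
For the case in the question, a shorter route may be to skip the reset and give the predictions the index of x_test directly, so that join can align on it. This is only a sketch: x_test, y_test and y_pred below are small made-up stand-ins for the real split and for rf.predict(x_test), and the name Prediction is an illustrative choice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's x_test, y_test and rf.predict(x_test).
x_test = pd.DataFrame(
    {"Class": [3, 1, 1], "Data": [2209, 2183, 2205]}, index=[16, 7, 9]
)
y_test = pd.Series([5235, 1342, 1540], index=x_test.index, name="TotalCnt")
y_pred = np.array([5100.0, 1690.0, 1540.0])

# Give the predictions the same row labels as x_test instead of 0, 1, 2, ...
pred = pd.Series(y_pred, index=x_test.index, name="Prediction")

# join() aligns on the index, so every row now finds its prediction.
y_out = x_test.join(y_test).join(pred)
print(y_out)
```

The resulting y_out can then be written out with to_csv as in the question.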

2 Comments

then how to join with the other dataset?
I edited original answer, try to use merge as described. Merge docs: pandas.pydata.org/pandas-docs/stable/reference/api/…
