Join array to dataframe in python

Question

I am doing some predictive modeling. As usual, I split the data into x_train, x_test, y_train and y_test, and then get the test predictions in y_pred.

Once I finish, I want to write the data to a CSV file, but when I try to join y_pred to y_test, it does not join as expected.

I get something like this:

    Class Data    TotalCnt  0
16  3     2209    5235      
98  3     2190    4871      
07  1     2183    1342      1690
09  1     2205    1540      1540
19  3     2191    4673      
01  1     2206    3117      1005
38  3     2200    4837      
44  3     2219    4965      
04  1     2195    1340      1690
10  1     2191    1980      2002
38  3     2184    4620      
15  3     2220    4781      
18  3     2223    4872

It drops the predictions for some records.

I think the cause of the problem is the following.

y_pred holds predictions for a random subset of the original dataframe, so it should look like this:

ID      Prediction
16      1005
98      2056
07      1690
54      1690
...

y_pred is an array, so in order to join it with x_test I convert it to a dataframe.

Once y_pred is converted to a dataframe, it loses the IDs and the index becomes sequential (1, 2, 3, 4, ...):

ID      Prediction
1       1005
2       2056
3       1690
4       1690
...

Therefore, when I try to join it with x_test, it only matches the ID numbers that exist in both dataframes, x_test and y_pred.
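A small sketch of this effect, using made-up index labels and values rather than the real data. It assumes only that `DataFrame.join` aligns rows on index labels; the array `y_pred` here is a stand-in for the output of `rf.predict(x_test)`.

```python
import numpy as np
import pandas as pd

# Hypothetical test rows whose index labels come from the original dataframe.
x_test = pd.DataFrame({"TotalCnt": [5235, 4871, 1342, 1540]},
                      index=[16, 98, 2, 0])

# Stand-in for rf.predict(x_test): a plain array with no index labels.
y_pred = np.array([1005.0, 2056.0, 1690.0, 1540.0])

# Wrapping the array gives it a fresh sequential index (0, 1, 2, 3).
pred_df = pd.DataFrame(y_pred)

# join is a left join on index labels, so only labels 2 and 0 find a match.
joined = x_test.join(pred_df)
print(joined)
```

Rows 16 and 98 end up with NaN in column 0. Note also that the row labelled 0 receives the first prediction (1005.0) even though it is the fourth row of x_test, so the matches that do occur are not necessarily the right ones.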

How can I get the predictions as a dataframe instead of an array?

This is the code I am using:

x_train, x_test, y_train, y_test = train_test_split(x,y)
rf = RandomForestRegressor(n_estimators=10000)
rf.fit(x_train, y_train) 
y_pred = rf.predict(x_test)

. . .
. . .

def Lead0(value):
        return "0" + str(value) if value < 10 else str(value)

dNow = datetime.datetime.now()
sNow = Lead0(dNow.year) + Lead0(dNow.month) + Lead0(dNow.day) + Lead0(dNow.hour) + Lead0(dNow.minute) + Lead0(dNow.second) 

y_pred = pd.DataFrame(y_pred)
y_out = x_test
y_out = y_out.join(y_test)
y_out = y_out.join(y_pred)

y_out.to_csv(sFolder + "dfPred__" + sNow +".csv")

How can I join the array to the dataframe without losing the ID order?

How can I convert the array to a dataframe without losing the ID order?

Marek · Accepted Answer · 2019-06-20 08:11:44Z


You wrote: "y_pred is predictions for random set from the original dataframe" and "y_pred is an array".

I understand you want to keep the index from the original dataframe.

To do this, I think you need to turn the old dataframe's index into a column, and then keep y_pred as a dict or dataframe tied to that index, not as a bare array.
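One way to keep the predictions tied to the original index, sketched here with made-up data, is to wrap the array in a `pd.Series` that reuses `x_test.index`. This relies on the predictions being returned in the same row order as `x_test`; the values, and the column names "Actual" and "Prediction", are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical test split; the index labels come from the original dataframe.
x_test = pd.DataFrame({"TotalCnt": [5235, 4871, 1342, 1540]},
                      index=[16, 98, 7, 9])
y_test = pd.Series([1100, 2000, 1700, 1500], index=x_test.index, name="Actual")

# Stand-in for rf.predict(x_test): one value per row of x_test, in order.
y_pred = np.array([1005.0, 2056.0, 1690.0, 1540.0])

# Reuse the index of x_test so each prediction keeps its original label.
pred = pd.Series(y_pred, index=x_test.index, name="Prediction")

# All three objects now share the same index, so join lines them up row by row.
y_out = x_test.join(y_test).join(pred)
print(y_out)
```

With the index preserved this way, no row is left without a prediction, and `y_out.to_csv(...)` writes the original labels as the first column.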

import pandas as pd
df = pd.DataFrame({'Record Type': ['100', '200', '300'],
           'Value': [(1,2,3,4,5), '0,10', 1]})

  Record Type            Value
0         100  (1, 2, 3, 4, 5)
1         200             0,10
2         300                1

Then move the index into a regular column:

df.reset_index(level=0, inplace=True)

   index Record Type            Value
0      0         100  (1, 2, 3, 4, 5)
1      1         200             0,10
2      2         300                1

Now you can keep both the index (which is a regular column now) and the y_pred values from the old dataframe, and merge them with your new dataframe.

To merge the new dataframe with the old one, use merge:

import pandas as pd

df1 = pd.DataFrame({'Record Type': ['100', '200', '300'],
           'Value': [(1,2,3,4,5), '0,10', 1]})

df1.reset_index(level=0, inplace=True)

df2 = pd.DataFrame({'Record Type': ['100', '200', '300'],
           'Value': [(1,2,3,4,5), '0,10', 1]})

df2.reset_index(level=0, inplace=True)


# Merge the dataframes on the 'index' column.
# indicator=True adds a '_merge' column showing whether each record
# was found in one dataframe or in both.
df_all = df1.merge(df2, on='index', indicator=True)

df_all.columns  # show the column list
df_all = df_all[['index', 'Record Type_y', 'Value_y']]  # pick only the columns you want
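Applied to the situation in the question, the same idea might look like the sketch below. The data is made up, `y_pred` stands in for the output of `rf.predict(x_test)`, and the column name "Prediction" is an assumption. The column name "index" is what `reset_index()` produces when the index has no name.

```python
import numpy as np
import pandas as pd

# Hypothetical test rows with index labels from the original dataframe.
x_test = pd.DataFrame({"Class": [3, 3, 1], "TotalCnt": [5235, 4871, 1342]},
                      index=[16, 98, 7])

# Stand-in for rf.predict(x_test).
y_pred = np.array([1005.0, 2056.0, 1690.0])

# Move the original index into a regular column named "index".
left = x_test.reset_index()

# Pair each prediction with the original index label, by position.
right = pd.DataFrame({"index": x_test.index, "Prediction": y_pred})

# Merge on the shared "index" column; _merge reports where each row was found.
merged = left.merge(right, on="index", how="left", indicator=True)
print(merged)
```

Every row should show "both" in the `_merge` column, which is a quick check that no prediction was dropped.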

edited Jun 20, 2019 at 8:11

answered Jun 20, 2019 at 6:52

Marek


2 Comments

asmgx Over a year ago

Then how do I join it with the other dataset?

Marek Over a year ago

I edited the original answer; try using merge as described. Merge docs: pandas.pydata.org/pandas-docs/stable/reference/api/…
