I have two dataframes, train and test. The train dataframe has columns A, Z, D, C, and test has columns C, D, Z. How do I give test the same four columns as train, in the same order? The newly added column A should have the value 0 for all rows.

Thanks.

3 Answers

2

You can use assign to create a new column A on test, then index with a list of column names to reorder the columns:

test.assign(A = 0)[['A', 'Z', 'D', 'C']]

Or: test.assign(A = 0)[train.columns]


import pandas as pd

test = pd.DataFrame({
    'C': [1,2,3],
    'D': [2,3,4],
    'Z': [3,4,5]
})

test.assign(A = 0)[['A','Z','D','C']]

#   A   Z   D   C
#0  0   3   2   1
#1  0   4   3   2
#2  0   5   4   3
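
The same idea can be generalized so the missing column names are not hardcoded. The following is only a sketch: the sample frames and the names missing and aligned are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample frames; only the column names matter here.
train = pd.DataFrame({"A": [9, 9], "Z": [1, 2], "D": [3, 4], "C": [5, 6]})
test = pd.DataFrame({"C": [1, 2, 3], "D": [2, 3, 4], "Z": [3, 4, 5]})

# Columns present in train but absent from test.
missing = [col for col in train.columns if col not in test.columns]

# Add each missing column filled with 0, then reorder to match train.
aligned = test.assign(**{col: 0 for col in missing})[train.columns]

print(aligned.columns.tolist())  # ['A', 'Z', 'D', 'C']
```

Note that assign returns a new dataframe, so test itself is unchanged unless the result is assigned back to it.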

1

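Or add the column and use reindex to match train's column order (reindex_axis was deprecated and then removed in pandas 1.0, so use reindex with columns= instead):

test['A'] = 0
test = test.reindex(columns=train.columns)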
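
As a possible one-step variant, reindex accepts a fill_value argument, so the missing column can be created and filled in the same call. This is a minimal sketch; the sample frames and the name aligned are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample frames; only the column names matter here.
train = pd.DataFrame({"A": [9, 9], "Z": [1, 2], "D": [3, 4], "C": [5, 6]})
test = pd.DataFrame({"C": [1, 2, 3], "D": [2, 3, 4], "Z": [3, 4, 5]})

# Columns missing from test are created and filled with fill_value;
# existing columns are kept and reordered to match train.
aligned = test.reindex(columns=train.columns, fill_value=0)

print(aligned.columns.tolist())  # ['A', 'Z', 'D', 'C']
```

Keep in mind that reindex also drops any column of test that does not appear in train, and that fill_value applies only to newly created columns, not to NaN values already present in test.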

1

It is also possible to use pd.concat as follows.

import pandas as pd
import numpy as np

df_train = pd.DataFrame({
    "A": np.random.randn(10),
    "Z": np.random.randn(10),
    "D": np.random.randn(10),
    "C": np.random.randn(10)
})

df_test = pd.DataFrame({
    "Z": np.random.randn(10),
    "D": np.random.randn(10),
    "C": np.random.randn(10)
})

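# Remember the split point before df_train is overwritten.
n_train = len(df_train)

# Stacking aligns columns by name, so A is NaN for the test rows.
df_all = pd.concat([df_train, df_test], axis=0)
df_all = df_all.fillna(0)

# Split back; slicing from n_train keeps the first test row.
df_train = df_all.iloc[:n_train]
df_test = df_all.iloc[n_train:]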

2 Comments

Or maybe change it to pd.concat([train, test], axis=0, keys=['train', 'test']).loc['test'].fillna(0). Still, a nice new approach for this type of question :~)
Thanks for the supplement. It was helpful too.
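
For reference, the keys-based variant from the comment above could look like the following sketch. The sample frames and the names stacked and aligned are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample frames; only the column names matter here.
train = pd.DataFrame({"A": [9, 9], "Z": [1, 2], "D": [3, 4], "C": [5, 6]})
test = pd.DataFrame({"C": [1, 2, 3], "D": [2, 3, 4], "Z": [3, 4, 5]})

# keys= labels each block in an outer index level, so the test rows
# can be selected by name instead of by position.
stacked = pd.concat([train, test], axis=0, keys=["train", "test"])

# Select the test block (A is NaN there) and fill the gaps with 0.
aligned = stacked.loc["test"].fillna(0)

print(aligned.columns.tolist())  # ['A', 'Z', 'D', 'C']
```

Two side effects are worth knowing: column A ends up as a float column because it held NaN before filling, and fillna(0) also replaces any NaN values that were already present in test's own columns.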
