6

I'm trying to write a function that removes features that are highly correlated with each other. However, I get the error "AttributeError: 'numpy.ndarray' object has no attribute 'columns'".

I just want to use pandas to read the number of columns. What should I do?

import pandas as pd
import numpy as np

def remove_features_identical(DataFrame,data_source):
    n=len(DataFrame.columns)
    print 'dealing with %d features of %s data......... \n' % (n,data_source)
    remove_ind = []
    R = np.corrcoef(DataFrame.T)
    for i in range(n-1):
        for j in range(i+1,n):
            if R[i,j]==1:
                remove_ind.append(j)    

    DataFrame.drop(remove_ind, axis=1, inplace=True)
    print 'deleting %d columns with correlation factor >0.99' % len(remove_ind)
    return DataFrame

if __name__ == "__main__":
    # load data and initialize y and x from train set and test set
    df_train = pd.read_csv('train.csv')
    df_test = pd.read_csv('test.csv')
    y_train=df_train['TARGET'].values
    X_train =df_train.drop(['ID','TARGET'], axis=1).values
    y_test=[]
    X_test = df_test.drop(['ID'], axis=1).values

    # delete identical features in raw data
    X_train = remove_features_identical(X_train,'train set')
    X_test = remove_features_identical(X_test,'test set')

3 Answers

8

Check the pandas documentation, but I think the problem is this line:

X_train = df_train.drop(['ID','TARGET'], axis=1).values

.values returns a NumPy array, not a pandas DataFrame, and an array does not have a columns attribute.

As for remove_features_identical: if you pass it an array, make sure the function uses only array features, not DataFrame features. Otherwise, make sure you pass it a DataFrame. Also, avoid variable names like DataFrame; they are easy to confuse with the pandas DataFrame class.
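A minimal sketch of the second option (passing a DataFrame), written for Python 3 with toy data standing in for train.csv. The helper name drop_perfectly_correlated and the column values are made up for illustration. The sketch drops columns by label and compares with np.isclose instead of == 1 to tolerate floating-point rounding; both are choices of this sketch, not part of the original code.

```python
import numpy as np
import pandas as pd

# Toy data standing in for train.csv (hypothetical values).
df_train = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with "a"
    "c": [1.0, 0.0, 1.0, 3.0],
    "TARGET": [0, 1, 0, 1],
})

features = df_train.drop(["ID", "TARGET"], axis=1)  # still a DataFrame
as_array = features.values                           # plain NumPy array

print(type(features).__name__, hasattr(features, "columns"))  # DataFrame True
print(type(as_array).__name__, hasattr(as_array, "columns"))  # ndarray False


def drop_perfectly_correlated(frame):
    """Return a copy of frame without columns that correlate perfectly
    (coefficient of 1) with an earlier column."""
    corr = np.corrcoef(frame.values.T)  # one row per feature
    n = frame.shape[1]
    to_drop = set()
    for i in range(n - 1):
        for j in range(i + 1, n):
            if np.isclose(corr[i, j], 1.0):
                to_drop.add(frame.columns[j])  # remember the label, not the position
    return frame.drop(columns=list(to_drop))


reduced = drop_perfectly_correlated(features)
print(list(reduced.columns))  # ['a', 'c']
```

Call .values only after the feature selection is done, at the point where a model actually needs an array.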


1

This may solve the problem. Try converting the arrays back to DataFrames, where X is the DataFrame the column names come from:

X_train = pd.DataFrame(X_train, columns = X.columns)

X_test = pd.DataFrame(X_test, columns=X.columns)
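As a small self-contained illustration of that conversion, assuming the column names are still available from the DataFrame the array was taken from (the names source, arr and restored are hypothetical):

```python
import pandas as pd

# Hypothetical DataFrame that the array was originally extracted from.
source = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

arr = source.values  # NumPy array: the column names are lost here

# Wrap the array again, reusing the labels of the original frame.
restored = pd.DataFrame(arr, columns=source.columns, index=source.index)

print(len(restored.columns))  # 2
```

This only works when the array still has the same number of columns, in the same order, as the frame supplying the names.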


0

Check what the column names are:

unscaled_inputs.columns.values
