I want to drop multiple columns (around 800) from a dataframe using Python. I have written the code below:
def corr_df(x, corr_val):
    # Creates Correlation Matrix and Instantiates
    corr_matrix = x.corr()
    iters = range(len(corr_matrix.columns) - 1)
    drop_cols = []
    df_drop=pd.DataFrame()
    cols=[]
    # Iterates through Correlation Matrix Table to find correlated columns
    for i in iters:
        for j in range(i):
            item = corr_matrix.iloc[j:(j+1), (i+1):(i+2)]
            col = item.columns
            row = item.index
            val = item.values
            if val >= corr_val:
                # Prints the correlated feature set and the corr val
                #print(col.values[0], "|", row.values[0], "|", round(val[0][0], 2))
                drop_cols.append(i)
    drops = sorted(set(drop_cols))[::-1]
    df_dropped=x.drop(drops,axis=1)
    # Drops the correlated columns
    # for i in drops:
    #     col=(x.iloc[:, (i+1):(i+2)].columns.values.tolist())
    #     print (col)
    #     df_dropped=df.drop(col, axis=1)
    #     cols.append()
    #print(df_dropped)
    return (df_dropped)
But this code prints a dataframe with only one column dropped. Any comments or suggestions on this?
Thanks in advance.
df = df.drop(lst, axis=1), where lst is your list of columns. Is there a reason why you need to perform this one at a time iteratively?
df.drop(drops, axis=1). Getting ValueError: labels [1069 1068 1067 ..., 3 2 1] not contained in axis.
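The ValueError in the last comment is consistent with drops holding integer positions while drop() looks up column labels. Below is a minimal sketch of one way around it, not necessarily what the original code intended. It assumes the goal is to drop every column whose correlation with an earlier column is at least corr_val, and that x contains only numeric columns. The function name drop_correlated is hypothetical.

```python
import pandas as pd


def drop_correlated(x, corr_val):
    """Return a copy of x without columns whose correlation with an
    earlier column is >= corr_val (assumes all columns are numeric)."""
    corr_matrix = x.corr()
    n_cols = len(corr_matrix.columns)

    # Collect integer positions of columns to drop.
    drop_positions = set()
    for i in range(1, n_cols):
        for j in range(i):
            # Scalar lookup by position: row j, column i.
            if corr_matrix.iloc[j, i] >= corr_val:
                drop_positions.add(i)
                break

    # drop() matches labels, not positions, so translate positions
    # into column labels before dropping everything in one call.
    drop_labels = [corr_matrix.columns[i] for i in sorted(drop_positions)]
    return x.drop(columns=drop_labels)
```

Called as df_dropped = drop_correlated(df, 0.9), this removes all flagged columns in a single drop() call instead of one per loop iteration, and leaves the input dataframe unchanged.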