I have a pandas DataFrame built from the Boston house price data in sklearn.datasets, and I am trying to convert it to a NumPy array while keeping the column names. Here is the code I tried:

from sklearn import datasets ## imports datasets from scikit-learn
import numpy as np
import pandas as pd

data = datasets.load_boston() ## loads Boston dataset from datasets library

df = pd.DataFrame(data.data, columns=data.feature_names)
X = df.to_numpy()
print(X.dtype.names)

However, this prints None, so the column names are not kept. Does anyone understand why?

Thanks

  • Why do you expect the column names to be retained when you access the underlying array instead of the DataFrame? You can store the column names in a dictionary or array if you want access to them later. Commented May 5, 2020 at 18:24
  • I assumed the code would create a structured array from the pandas DataFrame. I followed this answer to get there: stackoverflow.com/questions/7561017/… Commented May 5, 2020 at 18:25
  • @geds133 No, the corresponding method is to_records. to_numpy doesn't yield a structured array. Commented May 5, 2020 at 18:28
  • I see. There is a question on Stack Overflow that suggests this is the case; I shall comment there and ask for a correction. Many thanks. Commented May 5, 2020 at 18:30

1 Answer

Try this:

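# feature_names is a 1-D array of 13 names; make it a column, then transpose it into a header row
w = data.feature_names.reshape(13, 1)
# stack the header row on top of the data rows
X = np.vstack((w.T, data.data))
print(X)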
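
As one of the comments on the question points out, to_records (rather than to_numpy) is the method that yields a structured array. Stacking the names on top of the values, as above, would likely coerce the whole array to strings, so here is a minimal sketch of the to_records route instead. It uses a small stand-in DataFrame with made-up values and a few Boston-style column names, which is an assumption for illustration, so it does not depend on load_boston being available in your scikit-learn version.

```python
import numpy as np
import pandas as pd

# Stand-in data: made-up values, column names chosen only for illustration
df = pd.DataFrame({
    "CRIM": [0.1, 0.2, 0.3],
    "ZN": [18.0, 0.0, 0.0],
    "RM": [6.5, 6.4, 7.1],
})

# to_numpy() gives a plain homogeneous ndarray, which has no field names
plain = df.to_numpy()
print(plain.dtype.names)

# to_records() gives a record array whose dtype carries the column names;
# index=False leaves the DataFrame index out of the fields
records = df.to_records(index=False)
print(records.dtype.names)

# Columns can then be selected by name while staying numeric
rm = records["RM"]
print(rm)
```

With this, the values keep their numeric dtype and the names travel with the array, at the cost of working with a 1-D array of records rather than a 2-D matrix.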