I'm trying to concatenate two numpy arrays (one float, the other int) horizontally and put the result into a pandas DataFrame.
So I tried:
```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = pd.DataFrame(np.concatenate((load_iris().data, np.array([load_iris().target]).T), axis=1),
                    columns=[load_iris().feature_names + ['target']])
```
But this automatically converts the target column to float, from its original int. I tried to convert it back to int with
```python
iris.target = iris.target.astype(int)
```
But this throws a TypeError:

```
TypeError: only integer scalar arrays can be converted to a scalar index
```
So I have some questions.
(i) What is this error saying?
(ii) Is it even possible to change the type of a single column? (Incidentally, `iris = iris.astype(int)` works just fine, but that converts every column to int, which isn't what I want.)
(iii) What is the most memory-efficient way to do what I want? The code below produces what I'm trying to do:
```python
iris = pd.concat([pd.DataFrame(load_iris().data, columns=load_iris().feature_names),
                  pd.DataFrame(load_iris().target, columns=['target'])], axis=1)
```
But this goes to the trouble of creating multiple pandas DataFrames and concatenating them. Is there a better way to get the exact same output?
`np.concatenate` does make an array with a common dtype; numpy arrays don't have mixed dtypes unless they are structured arrays. A DataFrame, by contrast, can have a different dtype for each column, and such a frame can be thought of as a collection of pandas Series. Also, call `load_iris()` once and reuse the result rather than calling it four times: it returns a `Bunch` (a dict-like object), and its `.data` attribute or `['data']` key is the feature array.
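To illustrate those points, here is a minimal sketch (assuming a recent scikit-learn and pandas; the variable name `bunch` is just for illustration): the concatenated array is upcast to a common float dtype, while the DataFrame carries a dtype per column, so only the target column needs to be cast back to int.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

bunch = load_iris()  # call it once; a Bunch with .data, .target, .feature_names

# np.concatenate coerces everything to one common dtype (float64 here),
# which is why the int target comes out as float:
combined = np.concatenate((bunch.data, bunch.target[:, None]), axis=1)
print(combined.dtype)  # float64

# A DataFrame keeps a dtype per column, so cast just the target back to int.
# Note the columns argument here is a flat list, not a list wrapped in another list:
iris = pd.DataFrame(combined, columns=list(bunch.feature_names) + ['target'])
iris['target'] = iris['target'].astype(int)
print(iris.dtypes)  # float64 feature columns, int target column

# Or skip the numpy concatenation entirely and let each column keep its dtype:
iris2 = pd.DataFrame(bunch.data, columns=list(bunch.feature_names))
iris2['target'] = bunch.target  # assigned directly, stays int
```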