I am adding one-hot encoding to my machine learning project and use the following code to encode my 3 categorical features (job, marital status and education):
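from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(categories='auto')
# fit_transform returns a sparse matrix; toarray() converts it to a dense array
feature_array = encoder.fit_transform(df[['job', 'marital', 'education']]).toarray()
feature_labels = encoder.categories_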
feature_labels is then a list of 3 arrays, one per feature, each holding that feature's categories:
[array(['admin.', 'blue-collar', 'management', 'retired', 'self-employed',
        'services', 'student', 'technician', 'unemployed', 'unknown'],
       dtype=object),
 array(['divorced', 'married', 'single'], dtype=object),
 array(['primary', 'secondary', 'tertiary', 'unknown'], dtype=object)]
I understand that looping over this list gives me the labels for each of the 3 features, one array at a time:
for value in feature_labels:
    print(value)
['admin.' 'blue-collar' 'management' 'retired' 'self-employed' 'services'
'student' 'technician' 'unemployed' 'unknown']
['divorced' 'married' 'single']
['primary' 'secondary' 'tertiary' 'unknown']
That being said, is there a more elegant approach, ideally a one-liner, to build a single flat list containing all the categories for my 3 features? In the end, I'd love to have a single list that looks like the one below, so I can put all 3 encoded features into a single dataframe:
['admin.', 'blue-collar', 'management', 'retired', 'self-employed', 'services', 'student', 'technician', 'unemployed', 'unknown', 'divorced', 'married', 'single', 'primary', 'secondary', 'tertiary', 'unknown']
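To make the goal concrete, here is a minimal sketch of the kind of flattening I have in mind. It is only an illustration under assumptions: plain Python lists stand in for the arrays in encoder.categories_ so the snippet runs without a fitted encoder, and the names flat_labels, feature_names and prefixed_labels are hypothetical helpers, not anything from scikit-learn.

```python
from itertools import chain

# Stand-in for encoder.categories_ (assumption: plain lists instead of
# NumPy object arrays, so this runs without scikit-learn or a dataset).
feature_labels = [
    ['admin.', 'blue-collar', 'management', 'retired', 'self-employed',
     'services', 'student', 'technician', 'unemployed', 'unknown'],
    ['divorced', 'married', 'single'],
    ['primary', 'secondary', 'tertiary', 'unknown'],
]

# One-liner: concatenate the per-feature label sequences into one flat list.
flat_labels = list(chain.from_iterable(feature_labels))

# Hypothetical variant: prefix each label with its feature name, because
# 'unknown' appears under both job and education in the flat list.
feature_names = ['job', 'marital', 'education']
prefixed_labels = [
    f"{name}_{category}"
    for name, categories in zip(feature_names, feature_labels)
    for category in categories
]

print(flat_labels)
print(prefixed_labels)
```

The same chain.from_iterable call should also work directly on a list of arrays, since it only needs each element to be iterable. The prefixed variant is probably the safer choice for dataframe column names, because the plain flat list contains 'unknown' twice. If I read the documentation correctly, scikit-learn 1.0 and later also offers encoder.get_feature_names_out(), which returns prefixed names of a similar shape.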