I am adding one-hot encoding to my machine learning project and use the following code to encode my three categorical features (job, marital status, and education):

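from sklearn.preprocessing import OneHotEncoder

# Learn the categories of each column and one-hot encode them
encoder = OneHotEncoder(categories='auto')
feature_array = encoder.fit_transform(df[['job', 'marital', 'education']]).toarray()
feature_labels = encoder.categories_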

encoder.categories_ returns the categories as a list of three arrays, one per feature:

[array(['admin.', 'blue-collar', 'management', 'retired', 'self-employed',
        'services', 'student', 'technician', 'unemployed', 'unknown'],
       dtype=object),
 array(['divorced', 'married', 'single'], dtype=object),
 array(['primary', 'secondary', 'tertiary', 'unknown'], dtype=object)]

I understand that looping over this list gives me the labels of each feature separately:

for value in feature_labels:
    print(value)

['admin.' 'blue-collar' 'management' 'retired' 'self-employed' 'services'
 'student' 'technician' 'unemployed' 'unknown']
['divorced' 'married' 'single']
['primary' 'secondary' 'tertiary' 'unknown']

That being said, is there a more elegant approach, ideally a one-liner, that creates a single list containing all the categories of my three features? In the end, I'd love to have a single list that looks like the one below, so I can use it to label the columns when I put all three encoded features into a single dataframe:

['admin.', 'blue-collar', 'management', 'retired', 'self-employed', 'services', 'student', 'technician', 'unemployed', 'unknown', 'divorced', 'married', 'single', 'primary', 'secondary', 'tertiary', 'unknown']

3 Answers

1

You can use NumPy's concatenate (https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html) to join your three arrays:

import numpy as np

labels = np.concatenate(feature_labels)

# The result:
array(['admin.', 'blue-collar', 'management', 'retired', 'self-employed',
       'services', 'student', 'technician', 'unemployed', 'unknown',
       'divorced', 'married', 'single', 'primary', 'secondary',
       'tertiary', 'unknown'], dtype=object)
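
One thing to watch for when the flattened labels become dataframe column names: 'unknown' is a category of both job and education, so it appears twice. The sketch below is one possible way to check for such duplicates and to prefix each category with its feature name. The feature_labels list is a stand-in copied from the question's output, and feature_names, duplicates, and prefixed_labels are names introduced here for illustration. Recent scikit-learn versions (1.0 and later) also offer encoder.get_feature_names_out(), which produces similarly prefixed names.

```python
from collections import Counter

import numpy as np

# Stand-in for encoder.categories_, copied from the output in the question.
feature_labels = [
    np.array(['admin.', 'blue-collar', 'management', 'retired', 'self-employed',
              'services', 'student', 'technician', 'unemployed', 'unknown'],
             dtype=object),
    np.array(['divorced', 'married', 'single'], dtype=object),
    np.array(['primary', 'secondary', 'tertiary', 'unknown'], dtype=object),
]
# Assumed: column names in the order they were passed to fit_transform.
feature_names = ['job', 'marital', 'education']

# Plain concatenation gives 17 labels, but some may repeat across features.
labels = np.concatenate(feature_labels)
duplicates = [label for label, n in Counter(labels).items() if n > 1]

# Prefix each category with its feature name so every label is unique.
prefixed_labels = [
    f"{name}_{category}"
    for name, categories in zip(feature_names, feature_labels)
    for category in categories
]

print(duplicates)
print(prefixed_labels)
```

The prefixed list keeps the same order as the concatenated one, so it lines up with the columns of feature_array.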

0

If you have a nested list:

l = [['admin.', 'blue-collar', 'management', 'retired', 'self-employed', 'services', 'student', 'technician', 'unemployed', 'unknown'],
     ['divorced', 'married', 'single'],
     ['primary', 'secondary', 'tertiary', 'unknown']]

one way to flatten it is:

import itertools

flat_l = list(itertools.chain(*l))

Result:

['admin.',
 'blue-collar',
 'management',
 'retired',
 'self-employed',
 'services',
 'student',
 'technician',
 'unemployed',
 'unknown',
 'divorced',
 'married',
 'single',
 'primary',
 'secondary',
 'tertiary',
 'unknown']
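
As a small variation on the snippet above, itertools.chain.from_iterable takes the nested list directly instead of unpacking it with *, and a nested list comprehension does the same job without any import. This is only a sketch; the variable names (nested, flat_unpacked, flat_from_iterable, flat_comprehension) are chosen here for illustration.

```python
import itertools

# Same nested list as above.
nested = [
    ['admin.', 'blue-collar', 'management', 'retired', 'self-employed',
     'services', 'student', 'technician', 'unemployed', 'unknown'],
    ['divorced', 'married', 'single'],
    ['primary', 'secondary', 'tertiary', 'unknown'],
]

# Star-unpacking passes each inner list as a separate argument to chain().
flat_unpacked = list(itertools.chain(*nested))

# from_iterable consumes the outer list lazily, without unpacking it.
flat_from_iterable = list(itertools.chain.from_iterable(nested))

# Plain-Python equivalent: outer loop first, inner loop second.
flat_comprehension = [label for group in nested for label in group]

print(flat_unpacked == flat_from_iterable == flat_comprehension)
```

All three approaches also accept a list of NumPy arrays such as encoder.categories_, since they only need the inner items to be iterable.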


0

Since you have a list of NumPy arrays, you could also use:

import numpy as np

l = list(np.concatenate(feature_labels))
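
A side note on list(...) versus the array's own .tolist() method: for string categories stored with dtype=object, as in the question, both give a list of ordinary Python strings. The two differ only if the categories are numeric, where list(...) keeps NumPy scalar types and .tolist() converts to built-in Python types. The sketch below illustrates this with made-up arrays; the numeric categories are hypothetical and do not come from the question.

```python
import numpy as np

# String categories, as in the question (shortened for brevity).
text_labels = np.concatenate([
    np.array(['divorced', 'married', 'single'], dtype=object),
    np.array(['primary', 'secondary'], dtype=object),
])

# Hypothetical numeric categories, only to show the type difference.
numeric_labels = np.concatenate([np.array([0, 1]), np.array([2, 3, 4])])

# For object arrays of strings, both conversions give the same Python strings.
same_for_text = list(text_labels) == text_labels.tolist()

# For numeric arrays, list() keeps NumPy scalars; .tolist() yields Python ints.
as_list = list(numeric_labels)
as_tolist = numeric_labels.tolist()

print(same_for_text)
print(type(as_list[0]), type(as_tolist[0]))
```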
