
I'm trying to select the columns of a dataset that meet a certain condition. For example, I have these columns in the dataframe:

['string_category_4', 'string_category_24', 'range_category_6',
  'range_category_17', 'int_numeric_21', 'string_category_15',
  'float_numeric_8', 'int_bool_19', 'int_bool_2']

The trailing number in each column name is a unique identifier, regardless of the prefix (such as category or numeric). I would like to select all the columns whose trailing number is <= 10, so the result should be this list:

['string_category_4', 'range_category_6', 'float_numeric_8',
 'int_bool_2']

Is there a way to do this with string processing, or an even simpler way?

1 Answer

Solution:

columns = [
    'category_4', 'category_24', 'category_6',
    'category_17', 'numeric_21', 'category_15',
    'numeric_8', 'bool_19', 'bool_2'
]
# Split once at the last underscore and compare the trailing number with 10.
filtered_columns = [col for col in columns if int(col.rsplit('_', 1)[-1]) <= 10]
print(filtered_columns)

Output:

['category_4', 'category_6', 'numeric_8', 'bool_2']
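If some column names might not end in `_<number>` (for example an `id` or `label` column), `int()` in the list comprehension would raise `ValueError`. The following is a minimal sketch, using the question's own column names and a hypothetical helper `columns_with_id_at_most`, that matches the trailing identifier with a regular expression and skips names that have none:

```python
import re

# Column names exactly as listed in the question.
columns = [
    'string_category_4', 'string_category_24', 'range_category_6',
    'range_category_17', 'int_numeric_21', 'string_category_15',
    'float_numeric_8', 'int_bool_19', 'int_bool_2',
]

# Matches an underscore followed by one or more digits at the very end of the name.
TRAILING_ID = re.compile(r'_(\d+)$')


def columns_with_id_at_most(names, limit):
    """Return the names whose trailing numeric identifier is <= limit.

    Hypothetical helper: names without a trailing '_<digits>' suffix are
    skipped instead of raising ValueError.
    """
    selected = []
    for name in names:
        match = TRAILING_ID.search(name)
        if match and int(match.group(1)) <= limit:
            selected.append(name)
    return selected


print(columns_with_id_at_most(columns, 10))
# ['string_category_4', 'range_category_6', 'float_numeric_8', 'int_bool_2']
```

With a pandas DataFrame `df`, the returned list can be used directly as `df[selected]` to get just those columns.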

2 Comments

This logic is sound, but how would you handle column names such as string_category_11, which have two underscores?
That's what the [-1] is for: split the string, then select only the last item. By the way, you could use .split("_") as well. Even better is .rsplit("_", 1). "string_category_11".rsplit("_") yields ["string", "category", "11"], and [-1] selects the last item in that list.
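The comment's point can be checked directly on `string_category_11`. This short illustration shows that all three calls leave the identifier as the last element, with `rsplit("_", 1)` splitting only once, at the last underscore:

```python
name = "string_category_11"

# split() and rsplit() without maxsplit both break at every underscore.
print(name.split("_"))       # ['string', 'category', '11']
print(name.rsplit("_"))      # ['string', 'category', '11']

# rsplit() with maxsplit=1 splits only at the last underscore.
print(name.rsplit("_", 1))   # ['string_category', '11']

# In every case, [-1] picks out the trailing identifier.
print(int(name.rsplit("_", 1)[-1]))  # 11
```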
