Numpy select by string from pandas DataFrame

Question

I would like to create a new column in my pandas DataFrame based on matching strings. I have pathnames of images that contain either the string 'distorted' or 'original'. I would like to assign the string values 'd' and 'o' in the new column respectively. I have been using np.select but I got a shape-mismatch error.

This is my code:

type_cond = [(df[df['img_name'].str.contains(r'\bdistorted\b')]), (df[df['img_name'].str.contains(r'\boriginal\b')])]

type_values = ['d', 'o']

df['image_type'] = np.select(type_cond, type_values)

When I run the conditions separately, I get the expected output:

distorted = df[df['img_name'].str.contains(r'\bdistorted\b')]

output:

id	n	r	img_name	rid
...
2995	I	2	images/distorted/png/3MRNMEIQW56USS7S1XTZ20C8J...	E
2996	I	3	images/distorted/png/30MVJZJNHMDCUC6BMWCK0PGQO...	E
2997	I	2	images/distorted/png/3MYYFCXHJ37164AYXVVQM4DUA...	E
2998	I	3	images/distorted/png/39RP059MEHTLJDRTND387N3XG...	E
2999	I	1	images/distorted/png/3EKVH9QMEY4OR6LKRRBUN4DZD...	E

[2003 rows x 4 columns]

When filtering the strings that contain 'original' it selects: [997 rows x 4 columns]

The entire data frame is of size: [3000 rows x 4 columns]

I don't see why there is a shape mismatch because all the rows are covered by either condition.

jezrael · Accepted Answer · 2021-05-04 08:54:10Z


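The problem is that the entries in your conditions list are filtered DataFrames (2003 and 997 rows), not boolean masks. np.select expects one boolean array per condition, each the same length as the DataFrame (3000 rows).

So remove the boolean indexing (the outer df[...]) and pass the boolean Series from str.contains directly: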

type_cond = [df['img_name'].str.contains(r'\bdistorted\b'),
             df['img_name'].str.contains(r'\boriginal\b')]
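
For reference, here is a minimal end-to-end sketch of the corrected approach. The three-row DataFrame and its paths are made up for illustration, and the default="unknown" fallback is my own addition (it is not in the original code); it labels rows that match neither pattern and keeps the fallback the same type as the string choices.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({
    "img_name": [
        "images/distorted/png/a.png",
        "images/original/png/b.png",
        "images/other/png/c.png",
    ]
})

# Each condition is a boolean Series with one entry per row of df,
# not a filtered DataFrame.
type_cond = [
    df["img_name"].str.contains(r"\bdistorted\b"),
    df["img_name"].str.contains(r"\boriginal\b"),
]
type_values = ["d", "o"]

# Assumption: a string default is used for rows matching neither pattern.
df["image_type"] = np.select(type_cond, type_values, default="unknown")

print(df["image_type"].tolist())  # ['d', 'o', 'unknown']
```

Because every mask has the same length as df, np.select can pick one value per row, which is what the filtered DataFrames of 2003 and 997 rows could not provide.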
