I would like to create a new column in my pandas DataFrame based on string matching. The `img_name` column holds image paths that contain either the string 'distorted' or 'original', and I would like the new column to hold 'd' or 'o', respectively. I have been using `np.select`, but I get a shape-mismatch error.
This is my code:

```python
import numpy as np

type_cond = [
    df[df['img_name'].str.contains(r'\bdistorted\b')],
    df[df['img_name'].str.contains(r'\boriginal\b')],
]
type_values = ['d', 'o']
df['image_type'] = np.select(type_cond, type_values)
```
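A minimal, self-contained reproduction, assuming a small hypothetical DataFrame (the five paths below are made up, not taken from the real data), shows what `type_cond` actually contains when built this way: each element is a filtered DataFrame, and the two have different numbers of rows.

```python
import pandas as pd

# Hypothetical toy data: 3 'distorted' paths and 2 'original' paths.
df = pd.DataFrame({
    "img_name": [
        "images/distorted/png/a.png",
        "images/distorted/png/b.png",
        "images/distorted/png/c.png",
        "images/original/png/d.png",
        "images/original/png/e.png",
    ]
})

# Same construction as in the question: each condition is a filtered DataFrame.
type_cond = [
    df[df["img_name"].str.contains(r"\bdistorted\b")],
    df[df["img_name"].str.contains(r"\boriginal\b")],
]

print(len(df))                              # 5
print([cond.shape for cond in type_cond])   # [(3, 1), (2, 1)]
```

`np.select` broadcasts all conditions to one common shape, and shapes `(3, 1)` and `(2, 1)` cannot be broadcast together; neither matches the one-entry-per-row shape `(len(df),)`.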
When I run the conditions separately, I get the expected output:

```python
distorted = df[df['img_name'].str.contains(r'\bdistorted\b')]
```

Output:
| id | n | r | img_name | rid |
|---|---|---|---|---|
| ... | | | | |
| 2995 | I | 2 | images/distorted/png/3MRNMEIQW56USS7S1XTZ20C8J... | E |
| 2996 | I | 3 | images/distorted/png/30MVJZJNHMDCUC6BMWCK0PGQO... | E |
| 2997 | I | 2 | images/distorted/png/3MYYFCXHJ37164AYXVVQM4DUA... | E |
| 2998 | I | 3 | images/distorted/png/39RP059MEHTLJDRTND387N3XG... | E |
| 2999 | I | 1 | images/distorted/png/3EKVH9QMEY4OR6LKRRBUN4DZD... | E |
[2003 rows x 4 columns]
Filtering for paths that contain 'original' selects [997 rows x 4 columns].
The entire DataFrame has [3000 rows x 4 columns].
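The row counts above can also be checked on the boolean masks directly, without filtering the DataFrame. This is a minimal sketch on hypothetical toy data (five made-up paths); the names `is_distorted` and `is_original` are illustrative, not from the original code.

```python
import pandas as pd

# Hypothetical toy data standing in for the real 3000-row DataFrame.
df = pd.DataFrame({
    "img_name": [
        "images/distorted/png/a.png",
        "images/distorted/png/b.png",
        "images/distorted/png/c.png",
        "images/original/png/d.png",
        "images/original/png/e.png",
    ]
})

# Boolean Series: one True/False entry per row of df.
is_distorted = df["img_name"].str.contains(r"\bdistorted\b")
is_original = df["img_name"].str.contains(r"\boriginal\b")

print(is_distorted.sum(), is_original.sum(), len(df))  # 3 2 5
print((is_distorted | is_original).all())   # True: every row matches at least one
print((is_distorted & is_original).any())   # False: no row matches both
print(is_distorted.shape, is_original.shape)  # (5,) (5,)
```

Unlike the filtered DataFrames, both masks have the same length as `df`.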
I don't see why there is a shape mismatch, since every row is covered by one of the two conditions (2003 + 997 = 3000).
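For comparison, here is a sketch of the same `np.select` call with the boolean masks themselves as conditions (no `df[...]` wrapped around them), again on hypothetical toy data. Passing `default=""` is an assumption of this sketch, not part of the original code: it keeps every choice a string, since recent NumPy versions may refuse to combine string choices with the default integer `0`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real DataFrame.
df = pd.DataFrame({
    "img_name": [
        "images/distorted/png/a.png",
        "images/distorted/png/b.png",
        "images/distorted/png/c.png",
        "images/original/png/d.png",
        "images/original/png/e.png",
    ]
})

# Conditions are full-length boolean Series, not filtered DataFrames.
type_cond = [
    df["img_name"].str.contains(r"\bdistorted\b"),
    df["img_name"].str.contains(r"\boriginal\b"),
]
type_values = ["d", "o"]

# Rows matching neither condition would get "" (the assumed default).
df["image_type"] = np.select(type_cond, type_values, default="")
print(df["image_type"].tolist())  # ['d', 'd', 'd', 'o', 'o']
```

Because each mask has one entry per row, all conditions share the shape `(len(df),)` and the result lines up with the DataFrame's rows.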