I would like to create a new column in my pandas DataFrame based on matching strings. I have pathnames of images that contain either the string 'distorted' or 'original'. I would like to assign the string values 'd' and 'o' in the new column respectively. I have been using np.select but I got a shape-mismatch error.

This is my code:

type_cond = [(df[df['img_name'].str.contains(r'\bdistorted\b')]), (df[df['img_name'].str.contains(r'\boriginal\b')])]

type_values = ['d', 'o']

df['image_type'] = np.select(type_cond, type_values)

When I run the conditions separately, I get the expected output:

distorted = df[df['img_name'].str.contains(r'\bdistorted\b')]

output:

id n r img_name rid
...
2995 I 2 images/distorted/png/3MRNMEIQW56USS7S1XTZ20C8J... E
2996 I 3 images/distorted/png/30MVJZJNHMDCUC6BMWCK0PGQO... E
2997 I 2 images/distorted/png/3MYYFCXHJ37164AYXVVQM4DUA... E
2998 I 3 images/distorted/png/39RP059MEHTLJDRTND387N3XG... E
2999 I 1 images/distorted/png/3EKVH9QMEY4OR6LKRRBUN4DZD... E

[2003 rows x 4 columns]

When filtering the strings that contain 'original' it selects: [997 rows x 4 columns]

The entire data frame is of size: [3000 rows x 4 columns]

I don't see why there is a shape mismatch because all the rows are covered by either condition.

1 Answer

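The problem is that the conditions list contains filtered DataFrames instead of boolean masks. np.select needs each condition to be a boolean array with one entry per row of df.

So remove the boolean indexing, i.e. the outer df[...]: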

type_cond = [df['img_name'].str.contains(r'\bdistorted\b'),
             df['img_name'].str.contains(r'\boriginal\b')]
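
For reference, here is a minimal end-to-end sketch of the corrected approach. The three file names are hypothetical stand-ins for the real (truncated) paths in the question, and the default="unknown" fallback is my own assumption for rows that match neither pattern. Using a string default keeps it the same type as the 'd'/'o' choices.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real paths in the question are truncated.
df = pd.DataFrame({
    "img_name": [
        "images/distorted/png/a.png",
        "images/original/png/b.png",
        "images/distorted/png/c.png",
    ]
})

# Each condition is a boolean Series with one entry per row of df,
# not a filtered DataFrame.
type_cond = [
    df["img_name"].str.contains(r"\bdistorted\b"),
    df["img_name"].str.contains(r"\boriginal\b"),
]
type_values = ["d", "o"]

# default is an assumed fallback label for rows matching neither pattern.
df["image_type"] = np.select(type_cond, type_values, default="unknown")
print(df)
```

Because every condition has the same length as df, the result of np.select can be assigned directly to the new column.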