count occurrences matching partial string by column in pandas python

Question

new_data is a pandas DataFrame with 4 columns.

If I want to count the occurrences of an exact match in each column, I do this:

new_data[new_data == 'blank'].count()

Output:

A          0
B          0
C          0
D          2654

What if I want a partial match on the string 'bla'? I would expect something like this:

new_data[new_data in 'bla'].count()

But of course that does not work. What is the right way to do it?

new_data.str.contains('bla')?

– yatu, Commented Feb 25, 2020 at 14:03

jezrael · Accepted Answer · 2020-02-25 14:15:14Z

2

Use DataFrame.apply with Series.str.contains, then sum to count the True values:

import numpy as np
import pandas as pd

np.random.seed(1234)

new_data = pd.DataFrame(np.random.choice(['a blas', 's'], size=(2,4)), columns=list('ABCD'))
print(new_data)
        A       B       C  D
0       s       s  a blas  s
1  a blas  a blas  a blas  s

print(new_data.apply(lambda x: x.str.contains('bla')).sum())
A    1
B    1
C    2
D    0
dtype: int64

Your approach, using the same mask for boolean indexing:

print(new_data[new_data.apply(lambda x: x.str.contains('bla'))].count())
A    1
B    1
C    2
D    0
dtype: int64
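
One caveat worth noting: Series.str.contains treats the pattern as a regular expression by default, and missing values are not matches. The sketch below is only an illustration on made-up data (the frame df and the pattern 'a.b' are assumptions, not part of the question). It compares the default behaviour with regex=False, and passes na=False so missing cells count as non-matches:

```python
import pandas as pd

# Hypothetical data: one missing value and strings where '.' matters.
df = pd.DataFrame({
    "A": ["a.b", "axb", None],
    "B": ["a.b", "a.b", "zzz"],
})

# Default: the pattern is a regular expression, so '.' matches any character.
as_regex = df.apply(lambda col: col.str.contains("a.b", na=False)).sum()

# regex=False matches the literal substring only;
# na=False treats missing cells as non-matches.
literal = df.apply(lambda col: col.str.contains("a.b", regex=False, na=False)).sum()

print(as_regex)
print(literal)
```

For a plain substring such as 'bla' both calls give the same counts; the difference only shows up when the search text contains regex metacharacters.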

edited Feb 25, 2020 at 14:15

answered Feb 25, 2020 at 14:03


5 Comments

Miguel Santos Over a year ago

Remember that new_data is a pandas DataFrame, so it does not have the 'str' attribute.

jezrael Over a year ago

@MiguelSantos - I added col to check a single column by name.

Miguel Santos Over a year ago

I edited the question because it was not clear: I want this for every column.

Miguel Santos Over a year ago

Do you know of a cleaner, more general solution? This will not work if some columns have other types, for example integers, whereas the first example with a full-match comparison works for everything.

jezrael Over a year ago

@MiguelSantos - Unfortunately, the only option is to convert to strings first: new_data.astype(str).apply(lambda x: x.str.contains('bla')).sum()
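
To make the string-conversion suggestion from the last comment concrete, here is a minimal sketch on a made-up frame with an integer column (the name mixed and its values are assumptions for illustration only). Casting with astype(str) lets .str.contains run on every column:

```python
import pandas as pd

# Hypothetical frame mixing string and integer columns.
mixed = pd.DataFrame({
    "A": ["blank", "x", "a blas"],
    "B": [1, 2, 3],
    "C": ["bla", "s", "s"],
})

# Cast every cell to str so the .str accessor is available on all columns,
# then sum the boolean mask column by column.
counts = mixed.astype(str).apply(lambda col: col.str.contains("bla")).sum()
print(counts)
```

Without the astype(str) call, the integer column B would raise an AttributeError, because the .str accessor is only available for string values.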
