new_data is a pandas DataFrame with 4 columns.

To count exact matches per column, I do this:

new_data[new_data == 'blank'].count()

Output:

A          0
B          0
C          0
D          2654

What if I want a partial match on the string 'bla'? I imagined something like this:

new_data[new_data in 'bla'].count()

But of course that does not work. What is the right way to do it?

Comment: new_data.str.contains('bla')? (Commented Feb 25, 2020 at 14:03)

1 Answer

Use DataFrame.apply with Series.str.contains, then sum to count the True values:

import numpy as np
import pandas as pd

np.random.seed(1234)

new_data = pd.DataFrame(np.random.choice(['a blas', 's'], size=(2,4)), columns=list('ABCD'))
print (new_data)
        A       B       C  D
0       s       s  a blas  s
1  a blas  a blas  a blas  s

print (new_data.apply(lambda x: x.str.contains('bla')).sum())
A    1
B    1
C    2
D    0
dtype: int64

Or, following the indexing approach from your question:

print (new_data[new_data.apply(lambda x: x.str.contains('bla'))].count())
A    1
B    1
C    2
D    0
dtype: int64
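
The two forms above give the same counts because indexing with a boolean mask turns non-matching cells into NaN, and count() skips NaN. A minimal sketch of that equivalence follows, assuming a small fixed frame (hypothetical data, not the random one above) so the counts can be checked by hand. Passing regex=False is an optional choice here to treat 'bla' as a literal substring.

```python
import pandas as pd

# Hypothetical data, fixed so the counts are easy to verify by hand.
new_data = pd.DataFrame({
    "A": ["s", "a blas"],
    "B": ["s", "a blas"],
    "C": ["a blas", "a blas"],
    "D": ["s", "s"],
})

# Boolean mask: True wherever the cell contains the substring 'bla'.
# regex=False treats the pattern as a literal string, not a regular expression.
mask = new_data.apply(lambda col: col.str.contains("bla", regex=False))

# Summing the mask counts the True values in each column.
counts_from_sum = mask.sum()

# Indexing with the mask replaces non-matching cells with NaN;
# count() then counts the remaining non-null cells in each column.
counts_from_count = new_data[mask].count()

print(counts_from_sum)
print(counts_from_count)
```

The sum form is usually the more direct of the two, since it avoids building the intermediate masked frame.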

5 Comments

Remember that new_data is a pandas DataFrame, so it does not have the 'str' attribute.
@MiguelSantos - Added col to check a column by name.
I edited the question because it was not clear: I want it for every column.
Do you know of a cleaner, more general solution? This will not work if the columns have different types, for example integers, whereas the first example with a full-match comparison works for everything.
@MiguelSantos - Unfortunately, the only option is to convert to strings first: new_data.astype(str).apply(lambda x: x.str.contains('bla')).sum()
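
The string conversion from the last comment can be sketched as follows. The .str accessor raises an AttributeError on non-string columns such as integers, so every column is cast to text first. The frame `mixed` and the helper `count_partial` are hypothetical names introduced only for this illustration.

```python
import pandas as pd

# Hypothetical mixed-type frame: a string column, an integer column,
# and a column holding both strings and a number.
mixed = pd.DataFrame({
    "A": ["blank", "x", "black"],
    "B": [1, 2, 3],
    "C": ["bla", 7, "y"],
})


def count_partial(df, needle):
    """Count, per column, the cells whose text form contains `needle`."""
    # Cast everything to strings so .str.contains works on every column.
    as_text = df.astype(str)
    # regex=False matches `needle` as a literal substring.
    return as_text.apply(lambda col: col.str.contains(needle, regex=False)).sum()


partial_counts = count_partial(mixed, "bla")
print(partial_counts)
```

One caveat of this approach is that numbers are matched by their text form, so searching for '1' would also match the integer 1 and any value whose digits include a 1.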
