import pandas as pd
  
# list of paragraphs from judicial opinions
# rows are opinions
# columns are paragraphs from the opinion
opinion1 = ['sentenced to life','sentenced to death. The sentence ...','', 'sentencing Appellant for a term of life imprisonment']
opinion2 = ['Justice Smith','This concerns a sentencing hearing.', 'The third sentence read ...', 'Defendant rested.']
opinion3 = ['sentence sentencing sentenced','New matters ...', 'The clear weight of the evidence', 'A death sentence']
data = [opinion1, opinion2, opinion3]
df = pd.DataFrame(data, columns=['p1', 'p2', 'p3', 'p4'])

# This works for one column. I have 300+ in the real data set.
df['p2'].str.contains('sentenc')

How do I determine, for every cell in columns 'p1' through 'p4', whether it contains 'sentenc'?

Desired output would be something like:

True True False True
False True True False
True False False True

How do I count the number of times 'sentenc' appears in each cell?

The desired output is a per-cell count:

1 2 0 1
0 1 1 0
3 0 0 1

Thank you!

1 Answer


Use pd.Series.str.count, applied to each column with DataFrame.apply:

counts = df.apply(lambda col: col.str.count('sentenc'))

Output:

>>> counts
   p1  p2  p3  p4
0   1   2   0   1
1   0   1   1   0
2   3   0   0   1
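To check this end to end, here is a self-contained version that rebuilds the sample DataFrame from the question and computes the counts. Note that .str.count treats 'sentenc' as a regular expression and counts non-overlapping matches; that is harmless here because the pattern has no special characters. DataFrame.apply makes one .str.count call per column, which should stay manageable for the 300+ columns mentioned in the question.

```python
import pandas as pd

# Sample data from the question: rows are opinions, columns are paragraphs
data = [
    ['sentenced to life', 'sentenced to death. The sentence ...', '',
     'sentencing Appellant for a term of life imprisonment'],
    ['Justice Smith', 'This concerns a sentencing hearing.',
     'The third sentence read ...', 'Defendant rested.'],
    ['sentence sentencing sentenced', 'New matters ...',
     'The clear weight of the evidence', 'A death sentence'],
]
df = pd.DataFrame(data, columns=['p1', 'p2', 'p3', 'p4'])

# apply() passes each column to the lambda as a Series;
# .str.count counts regex matches of 'sentenc' in every cell of that column
counts = df.apply(lambda col: col.str.count('sentenc'))
print(counts)
```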

To get booleans instead, use .str.contains, or convert the counts above with .astype(bool):

bools = df.apply(lambda col: col.str.contains('sentenc'))

or

bools = df.apply(lambda col: col.str.count('sentenc')).astype(bool)

Both produce the same boolean DataFrame for this data.
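One caveat, assuming the real 300+-column data stores missing paragraphs as None/NaN rather than '' (the sample uses ''): .str.count returns NaN for a missing cell, and .astype(bool) turns NaN into True, so the two boolean approaches can disagree. The sketch below is a hypothetical variant of the sample with one None cell; it treats missing cells as having no matches by using the na parameter of .str.contains and fillna on the counts.

```python
import pandas as pd

# Question's data, but with a missing paragraph (None) instead of ''
# in opinion 1, column p3 (hypothetical, for illustration)
data = [
    ['sentenced to life', 'sentenced to death. The sentence ...', None,
     'sentencing Appellant for a term of life imprisonment'],
    ['Justice Smith', 'This concerns a sentencing hearing.',
     'The third sentence read ...', 'Defendant rested.'],
    ['sentence sentencing sentenced', 'New matters ...',
     'The clear weight of the evidence', 'A death sentence'],
]
df = pd.DataFrame(data, columns=['p1', 'p2', 'p3', 'p4'])

# Missing cells come back as NaN from .str.count; treat them as 0 matches
counts = df.apply(lambda col: col.str.count('sentenc')).fillna(0).astype(int)

# na=False makes .str.contains report False for missing cells
bools = df.apply(lambda col: col.str.contains('sentenc', na=False))

# Pitfall: without fillna, astype(bool) maps the NaN cell to True
naive = df.apply(lambda col: col.str.count('sentenc')).astype(bool)

print(counts)
print(bools)
print(naive.loc[0, 'p3'])  # True, even though the paragraph is missing
```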
