Python count string (word) in column of a dataframe

Question

I have the following DataFrame (df_hvl) with the column name "FzListe" and the following data:

FzListe
7MA1, 7OS1
7MA1, 7ZJB
7MA2, 7MA3, 7OS1
76G1, 7MA1, 7OS1
7MA1, 7OS1
71E5, 71E6, 7MA1, FSS1
71E4, 7MA1, 7MB1, 7OS1
71E6, 7MA1, 7OS1
7MA1
7MA1, 7MB1, 7OS1
7MA1
7MA1, 7MA2, 7OS1
04, 7MA1
76G1, 7MA1, 7OS1
76G1, 7MA1, 7OS1
7MA1, 7OS1
7MA1
76G1, 7MA1, 7OS1
76G1, 7MA1, 7OS1
71E6, 7MA1
7MA1, 7MA2, 7OS1
7MA1
7MA1
7MA1
7MA1, 7OS1
76G1, 7MA1

I want to search for the string "7MA" only and count how often it appears in the column. (The real data is much longer than this snippet.) I do not want to search only for 7MA1, because a row may also contain 7MA2 and/or 7MA3, and so on.

The DataFrame is called df_hvl. I searched for a solution but did not find one.

I need a count of how often 7MA appears in the column (including 7MA1, 7MA2, 7MA3 and so on).
– Damian, commented Feb 28, 2017 at 10:11

jezrael · Accepted Answer · 2017-02-28 10:15:59Z

12

I think you need str.count with sum:

print (df_hvl.FzListe.str.count('7MA'))
0     1
1     1
2     2
3     1
4     1
5     1
6     1
7     1
8     1
9     1
10    1
11    2
12    1
13    1
14    1
15    1
16    1
17    1
18    1
19    1
20    2
21    1
22    1
23    1
24    1
25    1
Name: FzListe, dtype: int64

substr = '7MA'
print (df_hvl.FzListe.str.count(substr).sum())
29
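For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It rebuilds the DataFrame from only the first four rows of the question's sample (an assumption made to keep it short), so the total is smaller than the 29 shown above. The names per_row and total are illustrative only.

```python
import pandas as pd

# Assumption: only the first four rows of the question's sample are used here.
df_hvl = pd.DataFrame({
    "FzListe": [
        "7MA1, 7OS1",
        "7MA1, 7ZJB",
        "7MA2, 7MA3, 7OS1",
        "76G1, 7MA1, 7OS1",
    ]
})

substr = "7MA"

# Number of non-overlapping matches of the pattern in each row
per_row = df_hvl["FzListe"].str.count(substr)

# Total number of matches over the whole column
total = int(per_row.sum())

print(per_row.tolist())
print(total)
```

The third row contributes 2 because it holds both 7MA2 and 7MA3, which is the case the question asks about.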


1 Comment

Mayeul sgc Over a year ago

Nice, it is really elegant.

Nouh ABA · Answer · 2020-05-19 01:10:39Z

2

This will probably work too:

df_hvl.FzListe.map(lambda d: "7MA" in d).sum()
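One thing worth noting: the `in` test yields True or False per row, so the sum is the number of rows that contain "7MA" at least once, not the number of occurrences. A row such as "7MA2, 7MA3, 7OS1" counts once here but twice with str.count. The sketch below compares the two on three rows taken from the question; the variable names are illustrative.

```python
import pandas as pd

# Three rows from the question's sample, one of which holds "7MA" twice
s = pd.Series(["7MA1, 7OS1", "7MA2, 7MA3, 7OS1", "04, 7MA1"])

# Rows containing the substring at least once (True counts as 1)
rows_with_match = int(s.map(lambda d: "7MA" in d).sum())

# Vectorized equivalent of the row test, with regex matching disabled
rows_with_match_vec = int(s.str.contains("7MA", regex=False).sum())

# Total occurrences across all rows
occurrences = int(s.str.count("7MA").sum())

print(rows_with_match, rows_with_match_vec, occurrences)
```

On the 26-row sample in the question every row contains "7MA", so the row-based count would be 26, compared with 29 occurrences from str.count.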


1 Comment

Dan Taninecz Miller Over a year ago

This worked for me, and also seems to work with a series column (strings inside brackets) which is cool.

Mayeul sgc · Answer · 2017-02-28 10:17:45Z

0

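I think I would try something like this:

b = 0
for index in df_hvl.index:
    # Split the row's comma-separated string into individual codes
    A = df_hvl.loc[index, 'FzListe'].split(',')
    for element in A:
        # Count every code that contains the substring
        if '7MA' in element:
            b += 1
print(b)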


Wiktor Stribiżew · Answer · 2021-12-15 12:00:40Z

Use Series.str.count, which accepts a regex pattern as its first argument and, optionally, regex flags as its second argument to modify the matching behavior:

import re
substr = '7MA'  # the substring to search for, as in the question
df_hvl['FzListe'].str.count(re.escape(substr))
# Enable case-insensitive matching:
df_hvl['FzListe'].str.count(re.escape(substr), flags=re.I)

re.escape is needed because Series.str.count treats its pattern as a regex: if substr contains special regex metacharacters, the call may raise an error or match something other than the literal text.
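To illustrate the difference, here is a small sketch with hypothetical data (not from the question) in which the search term contains a dot, which in a regex matches any character.

```python
import re
import pandas as pd

# Hypothetical data: the search term contains ".", a regex metacharacter
s = pd.Series(["a.b, axb", "a.b"])
substr = "a.b"

# Unescaped: "." matches any character, so "axb" is counted as well
unescaped = int(s.str.count(substr).sum())

# Escaped: only the literal text "a.b" is counted
escaped = int(s.str.count(re.escape(substr)).sum())

print(unescaped, escaped)
```

A term like "7MA" has no metacharacters, so escaping changes nothing there, but it is a cheap safeguard when the term comes from user input or another column.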

Related posts:

Escaping strings for use in regex: Escaping regex string

In case you need to match a whole word...

Adaptive dynamic word boundaries: Word boundary with words starting or ending with special characters gives unexpected results
Dynamic word boundaries: Match a whole word in a string using dynamic regex
Handling thousands of words to search for as whole words: Performance of using regex matched groups in pandas dataframe
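For the simple case where the search term starts and ends with a letter or digit, a whole-word match can be sketched with plain \b word boundaries, as below. The data is hypothetical and chosen to show what is excluded. Terms that start or end with special characters need the approaches from the linked posts instead.

```python
import re
import pandas as pd

# Hypothetical data: "7MA10" and "X7MA1" contain "7MA1" but not as a whole word
s = pd.Series(["7MA1, 7OS1", "7MA10, 7MA1", "X7MA1"])
word = "7MA1"

# \b requires a word boundary on both sides of the escaped term
pattern = rf"\b{re.escape(word)}\b"
whole_word = int(s.str.count(pattern).sum())

# Plain substring count for comparison
substring = int(s.str.count(re.escape(word)).sum())

print(whole_word, substring)
```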
