Regex Pattern Matching - a substring in words in CSV File

Question

'Neighborhood,eattend10,eattend11,eattend12,eattend13,mattend10,mattend11,mattend12,mattend13,
hsattend10,hsattend11,hsattend12,hsattend13,eenrol11,eenrol12,eenrol13,menrol11,menrol12,
menrol13,hsenrol11,hsenrol12,hsenrol13,aastud10,aastud11,aastud12,aastud13,wstud10,wstud11,
wstud12,wstud13,hstud10,hstud11,hstud12,hstud13,abse10,abse11,abse12,abse13,absmd10,absmd11,
absmd12,absmd13,abshs10,abshs11,abshs12,abshs13,susp10,susp11,susp12,susp13,farms10,farms11,
farms12,farms13,sped10,sped11,sped12,sped13,ready11,ready12,ready13,math310,math311,math312,
math313,read310,read311,read312,read313,math510,math511,math512,math513,read510,read511,read512,
read513,math810,math811,math812,math813,read810,read811,read812,read813,hsaeng10,hsaeng11,
hsaeng12,hsaeng13,hsabio10,hsabio11,hsabio12,hsabio13,hsagov10,hsagov11,hsagov13,hsaalg10,
hsaalg11,hsaalg12,hsaalg13,drop10,drop11,drop12,drop13,compl10,compl11,compl12,compl13,
sclsw11,sclsw12,sclsw13,sclemp13\

I have this data set. I need to count how many words contain "drop" and print them.

Similarly, I would like to do the same for any other word, such as mattend, and print those.

I tried using findall, but I think that's not correct.

I assume we can use re.search or re.match. How can I do it with regex?

avarice · Accepted Answer · 2020-11-08 02:33:10Z

1

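You can call len() on the list returned by re.findall() to count the matches:

import re

# Read the whole file into a single string
with open('example.csv') as f:
    data = f.read().strip()

# findall returns a list of all non-overlapping matches; len() counts them
print(len(re.findall('drop', data)))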
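To print the matching words as well as count them, one option is to let the pattern also consume the digits that follow the prefix. The following is a minimal end-to-end sketch, not the answerer's code: it writes a shortened sample of the header to a temporary file so that it runs on its own, and the names sample, path and words are illustrative.

```python
import os
import re
import tempfile

# Shortened sample of the header row from the question (assumption: in
# practice the real CSV file would be opened instead of a temporary one).
sample = "Neighborhood,susp13,drop10,drop11,drop12,drop13,compl10\n"
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# Read the file back as one string, as in the answer above
with open(path) as f:
    data = f.read().strip()
os.remove(path)  # clean up the temporary file

# r'drop\d*' matches "drop" followed by zero or more digits
words = re.findall(r'drop\d*', data)

print(words)       # the matched words
print(len(words))  # how many there are
```

Because findall already returns the matched strings, the same list serves both for printing and for counting.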


2 Comments

kirti purohit Over a year ago

Thanks, but it displays only the matched substring. I want to print the full strings as well, such as drop13, drop12, drop23. Here's how I tried to do it based on the above answer: re.findall("drop\d*", str), but I'm getting the error "expected string or bytes-like object". Would you mind correcting it?

avarice Over a year ago

@kirtipurohit Then use a raw string, re.findall(r'drop\d*', str), and please avoid str as a variable name.

Arzybek · Accepted Answer · 2020-11-08 02:41:11Z

1

I think re.findall should be correct. From the Python re module documentation:

Search:

Scan through string looking for the first location where this regular expression produces a match, and return a corresponding match object.

Match:

If zero or more characters at the beginning of string match this regular expression, return a corresponding match object.

Findall:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.
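The difference between the three functions quoted above can be seen on a shortened sample of the header. This is only an illustrative sketch; the variable names text and first are not from the question.

```python
import re

# Shortened sample of the header from the question (illustrative)
text = "susp13,drop10,drop11,compl10"

# search: the first match anywhere in the string (match object or None)
first = re.search(r'drop\d*', text)
print(first.group())

# match: succeeds only if the pattern matches at the very start of the
# string, so this is None because the string starts with "susp13"
print(re.match(r'drop\d*', text))

# findall: every non-overlapping match, as a list of strings
print(re.findall(r'drop\d*', text))
```

Only findall returns every occurrence in one call, which is why it suits counting and printing all the matching words.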

I tried it on your example (with str being a variable that holds the text) and it worked for me: re.findall("drop", str)

If you want to see the digits after it, you can try something like: re.findall(r"drop\d*", str)

If you want to count the words, you can use: len(re.findall(r"drop\d*", str))
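For "any word like mattend", the same idea can be wrapped in a small helper. The function find_columns below is hypothetical, not part of the re module, and it assumes every column name of interest is a prefix followed by at least one digit. It uses \d+ and word boundaries rather than \d*, because in this data set a pattern such as read\d* would also pick up the "read" inside "ready11".

```python
import re

def find_columns(prefix, text):
    """Return the names made of `prefix` followed by one or more digits."""
    # re.escape guards against regex metacharacters in the prefix;
    # \b keeps the match aligned to whole comma-separated names.
    pattern = rf'\b{re.escape(prefix)}\d+\b'
    return re.findall(pattern, text)

# Shortened sample of the header from the question (illustrative)
header = "mattend10,mattend11,ready11,read310,read311,drop10,drop13"

for prefix in ("drop", "mattend", "read"):
    names = find_columns(prefix, header)
    print(prefix, len(names), names)
```

Whether \d+ or \d* is the better choice depends on whether a bare prefix with no digits should count as a match.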


2 Comments

kirti purohit Over a year ago

I am getting an error: expected string or bytes-like object

kirti purohit Over a year ago

with open('exg.csv', 'r') as file:
    data = file.read().split(',')

I converted all the CSV values to strings, but the error still persists.
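Judging from the snippet in the comment above, a likely cause is that .split(',') returns a list rather than a string, and re.findall raises a TypeError when it is handed a list. Here is a sketch of two ways around that, with a shortened sample standing in for the file contents; the names raw, fields, from_string and from_list are illustrative.

```python
import re

# Stand-in for file.read() (assumption: shortened sample of the header)
raw = "susp13,drop10,drop11,drop12,drop13,compl10"
fields = raw.split(',')  # a list of strings, not a single string

# Passing the list itself to re.findall raises TypeError
try:
    re.findall(r'drop\d*', fields)
except TypeError as exc:
    print("TypeError:", exc)

# Option 1: search the unsplit string
from_string = re.findall(r'drop\d*', raw)

# Option 2: keep the list and test each field on its own
from_list = [name for name in fields if re.fullmatch(r'drop\d*', name)]

print(from_string)
print(from_list)
```

Both options yield the same list for this sample; the second is convenient when the split fields are needed for something else anyway.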
