I have a folder containing multiple files. I want to count how many of those files contain matching text, say "Pathology", or a pattern, say "ORC|||||xxxxxxxx||||||". I have tried the following script:

import re, os
import glob

list_of_files = glob.glob('./*.hl7')

for fileName in list_of_files:
    fin = open( fileName, "r" )
    count = 0

for line in fin:
    if re.match("Pathology", line):
            count +=1
fin.close()

print count

This prints 0. I am using Python 2.6.6 and have no option of upgrading. Please suggest a way to do this.

5 Comments
  • Why don't you just use grep -l "Pathology\|ORC" *.hl7? Commented Jul 31, 2014 at 11:25
  • Why does this question have a Perl tag? Commented Jul 31, 2014 at 11:30
  • What if you use if 'Pathology' in line: count += 1? Also, why do you reset count = 0 for each file? See stackoverflow.com/questions/11162711/… Commented Jul 31, 2014 at 11:32
  • Please look at the question and make sure the indentation of the code is correct. Commented Jul 31, 2014 at 11:54
  • Where is the Perl code? Commented Jul 31, 2014 at 11:57

3 Answers

1

If you will accept a Perl solution then this fits the bill.

As it stands it prints the names of all the matching files. If you really want just the count then remove the line print $ARGV, "\n"

use strict;
use warnings;

local @ARGV = glob './*.hl7';

my $count = 0;  # start at zero so the summary prints cleanly when nothing matches

while (<>) {
  next unless /Pathology/i;
  ++$count;
  print $ARGV, "\n";
  close ARGV;
}

print "\n\n$count files found\n";

2 Comments

Can I check whether a line begins with ORC and is at least 10 characters long, and count the file only in that case?
@Debarshi: So you want to count the number of files that contain at least one record beginning with ORC that has at least ten characters?
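
One reading of the rule discussed in these comments, sketched in Python since that is what the question uses: a file counts when at least one of its lines starts with ORC and is at least ten characters long once the line ending is removed. The helper name has_long_orc_line is hypothetical, and both the handling of the line ending and the assumption of one segment per line are guesses about the data, not something the thread confirms.

```python
import glob


def has_long_orc_line(file_name, min_length=10):
    """True if some line starts with 'ORC' and has at least min_length chars."""
    fin = open(file_name, 'r')
    try:
        for line in fin:
            text = line.rstrip('\r\n')  # do not count the line ending
            if text.startswith('ORC') and len(text) >= min_length:
                return True
        return False
    finally:
        fin.close()


# Number of .hl7 files in the current folder that satisfy the rule.
orc_file_count = sum(1 for name in glob.glob('./*.hl7')
                     if has_long_orc_line(name))
```

The function returns at the first qualifying line, so large files are not read to the end once they are known to count.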
1

You can do this with grep and wc:

grep Pathology *.hl7 | wc -l

gives you the total number of matching lines across all files.

grep -c Pathology *.hl7

lists every file followed by its number of matching lines (0 for files without a match). To count files rather than lines, pipe grep -l Pathology *.hl7 into wc -l.

0

The easiest way is grep --files-with-matches StringOrPattern *.hl7, or equivalently grep -l StringOrPattern *.hl7. If you need to do it in Python, fix your indentation: as posted, your code only reports the number of matches in the last file.

import re
import glob

list_of_files = glob.glob('./*.hl7')
files_with_matches = 0

for fileName in list_of_files:
    fin = open(fileName, "r")
    count = 0  # matching lines in this file

    for line in fin:
        # re.search finds the text anywhere in the line;
        # re.match would only test the start of the line.
        if re.search("Pathology", line):
            count += 1
    fin.close()

    if count > 0:
        files_with_matches += 1
        print fileName, count

print "Done", files_with_matches, "Matches"
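
A variant of the loop above, as a minimal sketch rather than a definitive implementation: it wraps the logic in a hypothetical helper, files_matching, uses re.search so the text can appear anywhere in a line, and stops reading a file at its first hit. re.escape is applied to the literal ORC||||| prefix because an unescaped | means alternation in a regular expression. The helper name and the sample prefix are assumptions for illustration, and the code avoids print so that it should run unchanged on Python 2.6 and on newer versions.

```python
import glob
import re


def files_matching(pattern, file_glob='./*.hl7'):
    """Return the names of files with at least one line matching pattern."""
    regex = re.compile(pattern)
    matching = []
    for file_name in glob.glob(file_glob):
        fin = open(file_name, 'r')
        try:
            for line in fin:
                # search() scans the whole line; match() only tests its start.
                if regex.search(line):
                    matching.append(file_name)
                    break  # one hit is enough to count this file
        finally:
            fin.close()
    return matching


# Plain text, or a literal containing regex metacharacters such as "|".
pathology_files = files_matching('Pathology')
orc_files = files_matching(re.escape('ORC|||||'))
```

len(pathology_files) is then the number of files containing the text, and the list itself plays the role of grep -l.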
