
I have a string like 'apples'. I want to find this string, and I know that it exists in one out of hundreds of files. e.g.

file1
file2
file3
file4
file5
file6
...
file200

All of these files are in the same directory. What is the best way to find which file contains this string using Python, knowing that exactly one file contains it?

I have come up with this:

import os

for fname in os.listdir(directory):
    f = open(os.path.join(directory, fname))   # listdir returns bare names, so join with the directory
    for line in f:
        if 'apples' in line:                   # check each line, not the file object
            print "FOUND"
    f.close()

and this:

import glob, subprocess

# Popen does not go through a shell, so the wildcard has to be expanded with glob first
grep = subprocess.Popen(['grep', '-m1', 'apples'] + glob.glob(directory + '/file*'), stdout=subprocess.PIPE)
found = grep.communicate()[0]
print found
  • Are all of these files in the same directory? Commented Jun 22, 2012 at 19:16

5 Answers

11

Given that the files are all in the same directory, we just get a current directory listing.

import os

for fname in os.listdir('.'):    # change directory as needed
    if os.path.isfile(fname):    # make sure it's a file, not a directory entry
        with open(fname) as f:   # open file
            for line in f:       # process line by line
                if 'apples' in line:    # search for string
                    print 'found string in file %s' %fname
                    break

This automatically gets the current directory listing, and checks to make sure that any given entry is a file (not a directory).

It then opens each file and reads it line by line (rather than all at once, to avoid memory problems with large files), looking for the target string in each line.

When it finds the target string it prints the name of the file.

Also, since the files are opened using with, they are automatically closed when we are done (or if an exception occurs).
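
Since the question says that exactly one file contains the string, one possible refinement (just a sketch along the same lines, not part of the original answer, and find_file is an invented name) is to wrap the loop in a function and return as soon as a match is found, so the remaining files are never opened:

import os

def find_file(target, directory='.'):
    # Return the name of the first file in `directory` containing `target`, or None.
    for fname in os.listdir(directory):
        path = os.path.join(directory, fname)
        if os.path.isfile(path):             # skip directory entries
            with open(path) as f:
                for line in f:               # line by line, as above
                    if target in line:
                        return fname
    return None

print find_file('apples')                    # prints the matching file name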


7 Comments

os.listdir('.') returns both files and folders.
My solution is very similar, except that I close the file manually. And I am absolutely positive that nothing other than those files is ever going to be in that folder, since they are generated by another program. Are you saying this is the fastest way?
@AshwiniChaudhary Yes, that's true
@Dan ... so you won't ever have directories to worry about? In terms of writing this in Python only, I can't think of a faster way. Whether creating a subprocess to spawn grep will be faster will probably depend on how many files you are searching; the overhead will diminish in relation to the number of files. To know for sure, you would have to time it (see the timing sketch after these comments).
@Dan One difference is that your code won't close files if an exception occurs. And as you aren't worried about directories, the isfile check is not significant in your case, but it would ordinarily be a good idea to have.
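
As the last comments suggest, the only way to settle the grep-versus-pure-Python question is to time both approaches on the actual files. A minimal sketch of how one might do that (timed is an invented helper; find_file is the hypothetical function from the sketch above, and a grep-based version would be wrapped the same way):

import time

def timed(fn, *args):
    # Run fn(*args) once and report rough wall-clock time.
    start = time.time()
    result = fn(*args)
    print '%s took %.3f seconds' % (fn.__name__, time.time() - start)
    return result

timed(find_file, 'apples')   # compare against a grep-based equivalent timed the same way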
2

For simplicity, this assumes your files are in the current directory:

import os

def whichFile(query):
    for root, dirs, files in os.walk('.'):
        for name in files:
            path = os.path.join(root, name)   # os.walk yields bare names, so build the full path
            with open(path) as f:
                if query in f.read():
                    return path
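
A usage sketch, assuming the directory layout from the question; note that os.walk also descends into any subdirectories, unlike the os.listdir approaches above:

print whichFile('apples')   # e.g. './file73' (hypothetical name)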


2

import os

for name in os.listdir(path):
    with open(os.path.join(path, name)) as f:
        if 'apples' in f.read():
            # your work
            break


0

A lazy-evaluation, itertools-based approach:

import os
from itertools import repeat, izip, chain

gen = (file for file in os.listdir("."))                                            # every entry in the current directory
gen = (file for file in gen if os.path.isfile(file) and os.access(file, os.R_OK))   # keep readable regular files only
gen = (izip(repeat(file), open(file)) for file in gen)                              # pair each line with its file name
gen = chain.from_iterable(gen)                                                      # flatten into one (file, line) stream
gen = (file for file, line in gen if "apple" in line)                               # names of files with a matching line
gen = set(gen)                                                                      # de-duplicate
for file in gen:
    print file
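
For anyone on Python 3, a sketch of the same pipeline: itertools.izip is gone there and the built-in zip is already lazy, so the only changes needed are zip in place of izip and the print() function (the 'apple' target is carried over from the answer above):

import os
from itertools import repeat, chain

gen = (name for name in os.listdir("."))
gen = (name for name in gen if os.path.isfile(name) and os.access(name, os.R_OK))
gen = (zip(repeat(name), open(name)) for name in gen)   # zip is lazy in Python 3
gen = chain.from_iterable(gen)
gen = (name for name, line in gen if "apple" in line)
for name in set(gen):
    print(name)                                         # files are assumed to be plain text and, as in the original, left to the garbage collector to close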


0

Open your terminal and write this:

  • Case-insensitive search:
    grep -i 'apple' /path/to/files/*
  • Recursive search (through all subfolders):
    grep -r 'apple' /path/to/files

