How efficient is my Python search code

Question

Below are the nuts and bolts of my Python file search app. I'm still a noob in Python, and so far I have cared more about getting working code than about efficiency and performance. I'd like to ask the Python veterans (or veterans of any other language): is there anything I can do to make my code more efficient, and therefore faster? I've read somewhere about profiling a script, but I'm not really familiar with the concept and not sure whether it applies here. Currently, my code takes about 4-5 minutes to search through 100 files (the largest file being ~5000 KB). That's pretty slow.

Code:

    userstring = raw_input("Enter a search string!")
    ...
    ...
    ...
    if userstring:
        # Match the raw string, its hex encoding, or its decimal ASCII codes.
        userStrHEX = userstring.encode('hex')
        userStrASCII = ''.join(str(ord(char)) for char in userstring)
        regex = re.compile(r"(%s|%s|%s)" % (re.escape(userstring), re.escape(userStrHEX), re.escape(userStrASCII)))
    else:
        sys.exit('You Must Enter A String!!!')

    count = 0   # number of files with a match
    count2 = 0  # number of files examined (drives the progress bar)
    for afile in filelist:
        (head, filename) = os.path.split(afile)
        if afile.endswith(".log") or afile.endswith(".txt"):
            count2 += 1
            self.progress_bar.Show()
            self.progress_bar.SetRange(numFiles)
            wx.CallAfter(self.progress_bar.SetValue, count2)
            f=ftp.open(afile, 'r')
            for i, line in enumerate(f.readlines()):
                result = regex.search(line)
                if self.shouldAbort:
                    return self.shouldAbort

                if result:
                    count += 1
                    ln = str(i)
                    pathname = os.path.join(afile)
                    template = "\n\nLine: {0}\nFile: {1}\nString Type: {2}\n\n"
                    output = template.format(ln, pathname, result.group())
                    ftp.get(afile, 'c:\\Extracted\\' + filename)
                    temp.write(output)
                    break  # stop at the first match in this file
            else:
                # for/else: runs only when the loop finished without a break
                temp.write("\nNo Match in: " + os.path.join(afile))

Raymond Hettinger · Accepted Answer · 2011-11-16 20:13:03Z


This is a very reasonable solution.

It is possible to make it go faster with more regex magic, but you would lose some clarity.
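The answer does not spell out what "more regex magic" would look like, so the following is only one possible reading, sketched under assumptions: run the pattern once over a file's whole contents instead of once per line, and recover the line number by counting the newlines before the match. The names `build_pattern` and `first_match` are made up for illustration, and the sketch is written for Python 3 (where `bytes.hex()` replaces the question's `.encode('hex')`).

```python
import re

def build_pattern(userstring):
    """Compile a pattern for the raw string, its hex form, or its decimal ASCII codes."""
    hex_form = userstring.encode("utf-8").hex()  # Python 3 stand-in for .encode('hex')
    ascii_form = "".join(str(ord(ch)) for ch in userstring)
    alternatives = (userstring, hex_form, ascii_form)
    return re.compile("|".join(re.escape(a) for a in alternatives))

def first_match(regex, text):
    """Return (zero_based_line_index, matched_text) for the first match, or None."""
    m = regex.search(text)  # a single search over the whole buffer
    if m is None:
        return None
    # The line index is the number of newlines that precede the match.
    line_index = text.count("\n", 0, m.start())
    return line_index, m.group()
```

This trades the straightforward per-line loop for an offset-to-line conversion, which is the kind of clarity loss the answer warns about. It also keeps the whole file in memory, which the question's `readlines()` call already does.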

Keep in mind that the running time is likely dominated by the FTP file retrieval and not the search itself. So, additional optimization of the search in an I/O-bound process would likely be wasted effort. See Amdahl's Law.
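To make the Amdahl's Law point concrete, here is a minimal sketch, assuming Python 3. Amdahl's Law gives the overall speedup as 1 / ((1 - p) + p / s) when a fraction p of the run time is made s times faster. The helper `time_phases` and its `fetch` and `search` callables are hypothetical stand-ins for the FTP read and the regex scan, and the 10% figure is an illustrative assumption, not a measurement of the question's code.

```python
import time

def amdahl_speedup(fraction_improved, factor):
    """Overall speedup when `fraction_improved` of the run time gets `factor` times faster."""
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / factor)

def time_phases(items, fetch, search):
    """Return (seconds_fetching, seconds_searching) summed over all items.

    `fetch` and `search` are hypothetical callables standing in for the
    FTP read and the regex scan.
    """
    fetch_time = search_time = 0.0
    for item in items:
        t0 = time.perf_counter()
        data = fetch(item)
        t1 = time.perf_counter()
        search(data)
        t2 = time.perf_counter()
        fetch_time += t1 - t0
        search_time += t2 - t1
    return fetch_time, search_time

# Illustrative numbers only: if searching were 10% of the run time,
# a 10x faster search would speed up the whole run by only about 1.1x.
print(round(amdahl_speedup(0.10, 10), 3))  # prints 1.099
```

Measuring the two phases first shows which fraction is worth attacking. If fetching dominates, the remaining gains are on the transfer side rather than in the regex.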

