
Below are the nuts and bolts of my Python file search app. I'm still new to Python and have been happier getting working code than thinking about efficiency and performance. To the Python (or any other language) veterans: is there anything I can do to make my code more efficient, and therefore faster? I've read about profiling a script, but I'm not really familiar with the concept and am not sure whether it applies here. Currently, my code takes about 4-5 minutes to search through 100 files (the largest being ~5000 KB). That's pretty slow.

Code:

    userstring = raw_input("Enter a search string!")
    ...
    ...
    ...
    if userstring:
        # Search for the string itself, its hex encoding, and its decimal character codes
        userStrHEX = userstring.encode('hex')
        userStrASCII = ''.join(str(ord(char)) for char in userstring)
        regex = re.compile(r"(%s|%s|%s)" % (re.escape(userstring), re.escape(userStrHEX), re.escape(userStrASCII)))
    else:
        sys.exit('You Must Enter A String!!!')

    count = 0
    count2 = 0
    for afile in filelist:
        (head, filename) = os.path.split(afile)
        if afile.endswith(".log") or afile.endswith(".txt"):
            count2 += 1
            self.progress_bar.Show()
            self.progress_bar.SetRange(numFiles)
            wx.CallAfter(self.progress_bar.SetValue, count2)
            f=ftp.open(afile, 'r')
            for i, line in enumerate(f.readlines()):
                result = regex.search(line)
                if self.shouldAbort:
                    return self.shouldAbort

                if result:
                    count += 1
                    ln = str(i)
                    pathname = os.path.join(afile)
                    template = "\n\nLine: {0}\nFile: {1}\nString Type: {2}\n\n"
                    output = template.format(ln, pathname, result.group())
                    ftp.get(afile, 'c:\\Extracted\\' + filename)
                    temp.write(output)
                    break
            else:
                temp.write("\nNo Match in: " + os.path.join(afile))

1 Answer


This is a very reasonable solution.

It is possible to make it go faster with more regex magic, but you would lose some clarity.
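I won't spell out every trick, but as one illustration of the trade-off, here is a minimal sketch that searches a whole file's text in a single call instead of line by line, then recovers the line number by counting newlines before the match. It is written for Python 3 (the question uses Python 2), and `build_pattern` and `first_match` are hypothetical helper names, not part of the original code. It also assumes the file fits comfortably in memory, which seems plausible at ~5000 KB.

```python
import re


def build_pattern(userstring):
    """Build the question's three-way alternation: raw text, hex, decimal codes."""
    as_hex = userstring.encode("utf-8").hex()               # e.g. "abc" -> "616263"
    as_codes = "".join(str(ord(ch)) for ch in userstring)   # e.g. "abc" -> "979899"
    variants = (userstring, as_hex, as_codes)
    return re.compile("|".join(re.escape(v) for v in variants))


def first_match(pattern, text):
    """Return (zero_based_line_index, matched_text) for the first match, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    # The line index is the number of newlines that precede the match.
    line_index = text.count("\n", 0, match.start())
    return line_index, match.group()
```

With this, the inner loop would become something like `first_match(regex, f.read())`. It is shorter, but the line-number bookkeeping is less obvious than `enumerate`, which is the loss of clarity I mean.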

Keep in mind that the running time is likely dominated by the FTP file retrieval, not by the search itself, so further optimization of the search in an I/O-bound process would likely be wasted effort. See Amdahl's Law.
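To check that guess rather than take my word for it, you could time the two phases separately. The following is a rough sketch for Python 3, not a full profiler: `Stopwatch` and `amdahl_speedup` are hypothetical names introduced here, and the phase labels are whatever you choose to pass in.

```python
import time


def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the runtime becomes `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


class Stopwatch(object):
    """Accumulate wall-clock seconds per named phase."""

    def __init__(self):
        self.totals = {}

    def measure(self, name, func, *args):
        """Call func(*args), add the elapsed time to `name`, and return the result."""
        start = time.perf_counter()
        try:
            return func(*args)
        finally:
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed
```

In the loop you might wrap the read as `sw.measure("io", f.read)` and the search as `sw.measure("search", regex.search, data)`, then compare `sw.totals` at the end. If the search turns out to be, say, 5% of the total, then even making it 10 times faster gives `amdahl_speedup(0.05, 10)`, which is only about 1.05 overall.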
