1

I want to know the most efficient way to parse a text file. For example, let's say I have the following text file:

Number of connections server is: 1

Server status is: ACTIVE

Number of connections to server is: 4

Server status is: ACTIVE

Server is not responding: 13:25:03

Server connection is established: 13:27:05

What I want to do is to go through the file and gather information. For example, number of connections to the server, or the times the server went down. I want to save these values in maybe lists, so that I can view or plot them later.

So what is the best way to perform this, assuming I have my keywords in a list as follows:

referenceLines = ['connections server', 'Server status', 'not responding']

Note that the list holds only a part of each sentence, not the complete sentence. I want to go through the file line by line and check whether the line read contains any entry in the referenceLines list; if so, I want to get the index of that entry and call the corresponding function.

What would be the most efficient way (in time and memory) to do this, given that a typical text file will be about 50MB in size?

Thank you.

  • more efficient than for line in open('filename.txt', 'r'): --do whatever?? Commented May 22, 2012 at 13:04
  • Just as a side note, it's a bit more pythonic to do things as simply and as obviously as possible, only worrying about efficiency after you've proven (by experience or measurement) that it's inefficient. Commented May 22, 2012 at 13:09
  • Is it possible to have the entire phrase before the colon in referenceLines? Comparing strings for an exact match should be faster than a substring search. Commented May 22, 2012 at 13:25
  • @Christopher. Thank you. At this point I'm basically trying to figure out what the best way to do this is. Especially since I have only a substring in referenceList, I want to know how I can compare a complete line with a list that has only a substring, not the other way around. Commented May 22, 2012 at 13:52
  • @JanneKarila The lines are usually very long and they are not in the order of message:value. So I have to compare a substring. Thank you. Commented May 22, 2012 at 13:53

4 Answers

4

If every line uses ": " as the separator between message and value, you can split the string.

message, value = line.split(': ', 1)
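A slightly more defensive variant, as a rough sketch: unpacking the result of split raises ValueError when a line has no ": " in it, whereas str.partition always returns three parts, so such lines can simply be skipped. The helper name split_message is hypothetical and only for illustration.

```python
def split_message(line):
    """Return (message, value), or None if the line has no ': ' separator."""
    # partition splits at the first ': ' only, so times like 13:25:03 stay intact
    message, sep, value = line.rstrip('\n').partition(': ')
    if not sep:
        return None
    return message, value

print(split_message('Server is not responding: 13:25:03\n'))
print(split_message('no separator here\n'))
```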

2 Comments

line.split(': ', 1) would be more appropriate.
Thank you mikerobi and eumiro. I have given only an example, but in the real file, the lines are very long and the message:value is usually not separated by ':'.
1

As a practical approach, I suggest that you implement this in a series of steps while measuring the performance at each step to gauge the cost of the approach you are using with your test data.

For example:

  • How long does it take to simply read the file line by line?
  • How long if you split() each line?
  • How long if you run re.match() on each line?

The optimal solution will depend on your data (for example, how many reference lines you are using), but it should only take a few seconds on a modern machine.
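The measurement steps listed above could be sketched roughly as follows. The sample list is a synthetic stand-in for the real file, and time_pass is a hypothetical helper; swap in lines read from your own 50MB file to get meaningful numbers.

```python
import re
import time

def time_pass(lines, work):
    """Apply work to every line and return the elapsed time in seconds."""
    start = time.perf_counter()
    for line in lines:
        work(line)
    return time.perf_counter() - start

# Assumption: synthetic data standing in for the real log file.
sample = ['Number of connections to server is: 4\n',
          'Server status is: ACTIVE\n'] * 50000

pattern = re.compile(r'Server status')

timings = {
    'read only': time_pass(sample, lambda line: None),
    'split': time_pass(sample, lambda line: line.split(': ', 1)),
    're.match': time_pass(sample, pattern.match),
}
for step, seconds in timings.items():
    print(f'{step}: {seconds:.3f} s')
```

Comparing the three figures shows how much each extra step costs on top of plain iteration.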

2 Comments

Thanks rupello. Right now I'm concerned with the best way to detect whether the line read from the file (the complete line) contains any of the substrings in the referenceList.
Take a look at this question: stackoverflow.com/questions/3260962/…
1

If the text file you want to parse always contains the same fields in the same order, then mikerobi's solution is good. Otherwise, you need to iterate through the lines and try detecting referenceLines...
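One plain way to do that detection, as a minimal sketch: test each keyword with the in operator and return the index of the first hit. The function name find_reference is hypothetical, and the sample lines are copied from the question.

```python
referenceLines = ['connections server', 'Server status', 'not responding']

def find_reference(line, references=referenceLines):
    """Return the index of the first reference substring found in line, or -1."""
    for index, reference in enumerate(references):
        if reference in line:
            return index
    return -1

sample_lines = [
    'Number of connections server is: 1',
    'Server status is: ACTIVE',
    'Server is not responding: 13:25:03',
    'Server connection is established: 13:27:05',
]
matches = [find_reference(line) for line in sample_lines]
print(matches)  # [0, 1, 2, -1]
```

The returned index can then be used to pick the function to call. Note that the keyword 'connections server' would not match the question's line "Number of connections to server is: 4", because substring matching is exact.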

1

Here's one possible approach. It uses a regular expression pattern of the form 'keyword1|keyword2' to search for multiple keywords at once.

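import re

def func1(line):
    pass  # do something with a 'connections server' line

def func2(line):
    pass  # do something with a 'Server status' line

# Map each keyword to the function that handles it.
actions = {'connections server': func1,
           'Server status': func2}

# Build a single alternation pattern from the escaped keywords.
regex = re.compile('|'.join(re.escape(key) for key in actions))

with open('filename.txt') as file:
    for line in file:
        for matchobj in regex.finditer(line):
            actions[matchobj.group()](line)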
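To save the values in lists for later viewing or plotting, the handler functions could append to module-level lists. This is only a sketch under assumptions: the handler names are hypothetical, the keyword 'connections to server' is chosen to match the sample line, and the value is assumed to be the last whitespace-separated token on the line, which may not hold for the real file.

```python
import re

connections = []  # connection counts, in file order
down_times = []   # times the server stopped responding, as strings

def record_connections(line):
    # Assumption: the count is the last token on the line.
    connections.append(int(line.split()[-1]))

def record_down_time(line):
    # Assumption: the timestamp is the last token on the line.
    down_times.append(line.split()[-1])

actions = {'connections to server': record_connections,
           'not responding': record_down_time}
regex = re.compile('|'.join(re.escape(key) for key in actions))

sample = [
    'Number of connections to server is: 4',
    'Server status is: ACTIVE',
    'Server is not responding: 13:25:03',
]
for line in sample:
    match = regex.search(line)  # first keyword found, if any
    if match:
        actions[match.group()](line)

print(connections, down_times)
```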

1 Comment

Thank you Janne. Regex is what I wanted to try out too, but I didn't know exactly how to implement it. I will try to implement it this way.
