I recently started learning Python, and at work we wanted an easier way to find specific keywords in our log files, so that we can tell which IPs to add to our block list.

I decided to write a Python script that takes in a log file and a file with a list of key terms, looks for those key terms in the log file, and then writes every line whose session ID matches one where a key term was found to a new file.

import sys
import time
import linecache
from datetime import datetime

def timeStamped(fname, fmt='%Y-%m-%d-%H-%M-%S_{fname}'):
    return datetime.now().strftime(fmt).format(fname=fname)

importFile = open('rawLog.txt', 'r') #pulling in log file
importFile2 = open('keyWords.txt', 'r') #pulling in keywords
exportFile = open(timeStamped('ParsedLog.txt'), 'w') #writing the parsed log

FILE = importFile.readlines()
keyFILE = importFile2.readlines()

logLine = 1  #for debugging purposes when testing
parseString = '' 
holderString = ''
sessionID = []
keyWords= []
j = 0

for line in keyFILE: #go through each line in the keyFile 
        keyWords = line.split(',') #add each word to the array

print(keyWords)#for debugging purposes when testing, this DOES give all the correct results


for line in FILE:
        if keyWords[j] in line:
                parseString = line[29:35] #pulling in session ID
                sessionID.append(parseString) #saving session IDs to a list
        elif importFile == '' and j < len(keyWords):  #if importFile is at end of file and we are not at the end of the array
                importFile.seek(0) #goes back to the start of the file
                j+=1        #advance the keyWords array

        logLine +=1 #for debugging purposes when testing
importFile2.close()              
print(sessionID) #for debugging purposes when testing



importFile.seek(0) #goes back to the start of the file


i = 0
for line in FILE:
        if sessionID[i] in line[29:35]: #checking if the sessionID matches (doing it this way since I ran into issues where some sessionIDs matched parts of the log file that were not sessionIDs
                holderString = line #pulling the line of log file
                exportFile.write(holderString)#writing the log file line to a new text file
                print(holderString) #for debugging purposes when testing
                if i < len(sessionID):
                    i+=1

importFile.close()
exportFile.close()

It is not iterating across my keyWords list. I probably made a rookie mistake, but I am not experienced enough to see what I got wrong. When I check the output, it only searches rawLog.txt for the first item in the keyWords list.

The third loop does return the lines that match the session IDs collected by the second loop, and it does attempt to iterate (this raises an out-of-bounds exception because sessionID only has one value, so i runs past the end of the list).

The program does write and name the new log file successfully, with a date and time followed by ParsedLog.txt.

3 Comments
  • The j+=1 #advance the keyWords array line is inside the elif branch. It won't be executed while if keyWords[j] in line: is true. Maybe this is the reason? Commented Mar 3, 2015 at 23:19
  • What's the intended behavior here? For each line, look for at least one occurrence of a keyWord, and save the sessionID? Or for each line and for each keyWord, if the keyWord is found in the line, then save the sessionID (i.e. the same line can be saved multiple times)? Commented Mar 3, 2015 at 23:33
  • Your first assertion, "For each line, look for at least one occurrence of a keyWord, and save the sessionID?" is correct. Basically, I am looking for odd login behavior and then saving the sessionID where that odd login behavior occurred. Commented Mar 4, 2015 at 0:02

2 Answers

It looks to me like your second loop needs an inner loop instead of an inner if statement. E.g.

for line in FILE:
    for word in keyWords:
        if word in line:
            parseString = line[29:35] #pulling in session ID
            sessionID.append(parseString) #saving session IDs to a list
            break # Assuming there will only be one keyword per line, else remove this
    logLine +=1 #for debugging purposes when testing
importFile2.close()
print(sessionID) #for debugging purposes when testing

Assuming I have understood correctly, that is.
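
To see what the break changes, here is a small self-contained sketch. The sample log lines, the keyword list, and the helper name collect_session_ids are made up for illustration; only the [29:35] slice comes from the question.

```python
# Hypothetical sample data: the session ID sits at columns 29-34, as in the question.
PREFIX = "2015-03-03 23:19:00 INFO sid="  # exactly 29 characters long

log_lines = [
    PREFIX + "AAA111 failed login, account locked\n",
    PREFIX + "BBB222 page viewed\n",
    PREFIX + "CCC333 failed login\n",
]
key_words = ["failed login", "account locked"]


def collect_session_ids(lines, words, stop_at_first_match=True):
    """Return the session ID of every line that contains a keyword."""
    session_ids = []
    for line in lines:
        for word in words:
            if word in line:
                session_ids.append(line[29:35])
                if stop_at_first_match:
                    break  # one entry per line, even if several keywords match
    return session_ids


with_break = collect_session_ids(log_lines, key_words)
without_break = collect_session_ids(log_lines, key_words, stop_at_first_match=False)
print(with_break)
print(without_break)
```

Without the break, a line that contains two keywords contributes its session ID twice, which is harmless for the later filtering step but makes the debug output noisier.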

2 Comments

I think you mean if word in line not if keyWords[j] in line
Thank you very much, I see that I was forming my loops incorrectly now.
2

If the elif is never True, you never increase j, so you either need to increment it on every pass or check that the elif condition ever actually evaluates to True:

for line in FILE:
    if keyWords[j] in line:
        parseString = line[29:35] #pulling in session ID
        sessionID.append(parseString) #saving session IDs to a list
    elif importFile == '' and j < len(keyWords):  #if importFile is at end of file and we are not at the end of the array
        importFile.seek(0) #goes back to the start of the file
    j+=1     # always increase

Looking at the above loop: you create the file object with importFile = open('rawLog.txt', 'r') earlier in your code, so the comparison importFile == '' in the elif will never be True, because importFile is a file object, not a string.
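
A quick way to convince yourself of this is the sketch below. It uses io.StringIO as a stand-in for the object returned by open() (an assumption made so the example runs without a real rawLog.txt); the comparison behaves the same way for both.

```python
import io

# io.StringIO stands in for the file object that open('rawLog.txt') would return.
handle = io.StringIO("first line\nsecond line\n")

is_empty_string = (handle == '')   # a file object is never equal to a string
print(is_empty_string)

lines = handle.readlines()         # consumes everything up to the end of the file
at_end = handle.readline()         # reading at end of file returns an empty string
print(repr(at_end))
```

So the thing that can be compared against '' is the result of a read call at the end of the file, not the file object itself.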

You assign FILE = importFile.readlines(), which exhausts the file iterator while building the FILE list. You later call importFile.seek(0), but you never actually read from the file object again.

So basically you loop over FILE once, j never increases, and your code then moves on to the next block.

What you actually need are nested loops: use any to check whether any word from keyWords is in each line, and forget about your elif:
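
As a rough illustration of why the seek(0) calls have no effect on the loops, the sketch below again uses io.StringIO in place of the real file (an assumption for the sake of a runnable example).

```python
import io

handle = io.StringIO("alpha\nbeta\n")  # stand-in for open('rawLog.txt')

log_lines = handle.readlines()    # the file position is now at the end
second_read = handle.readlines()  # nothing is left to read
handle.seek(0)                    # rewinds the file object only
third_read = handle.readlines()   # the content is readable again

# The list is independent of the file position and can be looped over repeatedly.
first_pass = [line.strip() for line in log_lines]
second_pass = [line.strip() for line in log_lines]
print(second_read, first_pass, second_pass)
```

Once the lines are in a list, every for loop over that list starts from the beginning on its own, so no rewinding is needed.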

for line in FILE:
    if any(word in line for word in keyWords):
        parseString = line[29:35] #pulling in session ID
        sessionID.append(parseString) #saving session IDs to a list
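
Here is a minimal, self-contained sketch of that loop together with the keyword loading. The helper names load_keywords and matching_session_ids and the sample data are hypothetical. One thing worth noting: line.split(',') on a line read from a file leaves the trailing newline on the last keyword, so the sketch strips each word, assuming keywords are not meant to contain leading or trailing whitespace.

```python
def load_keywords(key_lines):
    """Split comma-separated keyword lines, dropping whitespace and blanks."""
    words = []
    for line in key_lines:
        # extend (rather than assign) so keywords from every line are kept
        words.extend(w.strip() for w in line.split(',') if w.strip())
    return words


def matching_session_ids(log_lines, key_words):
    """Session IDs (columns 29-34) of the lines that contain any keyword."""
    return [line[29:35] for line in log_lines
            if any(word in line for word in key_words)]


PREFIX = "2015-03-03 23:19:00 INFO sid="  # hypothetical 29-character prefix
sample_log = [
    PREFIX + "AAA111 failed login from 10.0.0.5\n",
    PREFIX + "BBB222 page viewed\n",
    PREFIX + "CCC333 password reset requested\n",
]
key_words = load_keywords(["failed login, password reset\n"])
found_ids = matching_session_ids(sample_log, key_words)
print(key_words)
print(found_ids)
```

Without the strip, the last keyword would be 'password reset\n' and would only match a line that ends right after those words.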

The same logic applies to your next loop:

for line in FILE:
    if any(sess in line[29:35] for sess in sessionID): #checking if the sessionID matches (doing it this way since I ran into issues where some sessionIDs matched parts of the log file that were not sessionIDs)
        exportFile.write(line)#writing the log file line to a new text file
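
If the session ID always occupies exactly those six characters (an assumption based on the slice in the question), a possible variation is to keep the IDs in a set and test for exact membership instead of a substring match. The function name filter_by_session and the sample lines are made up for illustration.

```python
def filter_by_session(log_lines, session_ids):
    """Keep the lines whose session ID (columns 29-34) is one of session_ids."""
    wanted = set(session_ids)  # set lookup avoids rescanning the list per line
    return [line for line in log_lines if line[29:35] in wanted]


PREFIX = "2015-03-03 23:19:00 INFO sid="  # hypothetical 29-character prefix
sample_log = [
    PREFIX + "AAA111 failed login\n",
    PREFIX + "BBB222 page viewed\n",
    PREFIX + "AAA111 logout\n",
]
kept = filter_by_session(sample_log, ["AAA111", "AAA111"])
print(kept)
```

Duplicate IDs in the input list do not matter here, because the set collapses them.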

holderString = line does nothing except bind another name to the same object as line, so you can simply call exportFile.write(line) and drop the assignment.

On a side note, use lowercase with underscores for variable names (holderString -> holder_string). Using with to open your files would also be best, as it closes them for you:

with open('rawLog.txt') as import_file:
    log_lines = import_file.readlines()

I also changed FILE to log_lines; more descriptive names make your code easier to follow.
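
Putting the pieces together, the whole script could look roughly like the sketch below. It is one possible arrangement, not the only one: the function name parse_log is hypothetical, the timestamped output name from the question is left out for brevity, and the demo writes made-up data to a temporary directory so it can run anywhere.

```python
import os
import tempfile


def parse_log(log_path, keywords_path, out_path):
    """Copy every log line whose session ID appeared on a keyword line."""
    with open(keywords_path) as key_file:
        key_words = [w.strip() for line in key_file
                     for w in line.split(',') if w.strip()]
    with open(log_path) as log_file:
        log_lines = log_file.readlines()

    # Session IDs (columns 29-34) of the lines that mention any keyword.
    session_ids = {line[29:35] for line in log_lines
                   if any(word in line for word in key_words)}

    with open(out_path, 'w') as export_file:
        for line in log_lines:
            if line[29:35] in session_ids:
                export_file.write(line)
    return len(session_ids)


# Demo with made-up data in a temporary directory.
PREFIX = "2015-03-03 23:19:00 INFO sid="  # hypothetical 29-character prefix
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, 'rawLog.txt')
    keys_path = os.path.join(tmp, 'keyWords.txt')
    out_path = os.path.join(tmp, 'ParsedLog.txt')
    with open(log_path, 'w') as f:
        f.write(PREFIX + "AAA111 failed login\n")
        f.write(PREFIX + "BBB222 page viewed\n")
        f.write(PREFIX + "AAA111 logout\n")
    with open(keys_path, 'w') as f:
        f.write("failed login,password reset\n")
    found = parse_log(log_path, keys_path, out_path)
    with open(out_path) as f:
        parsed = f.read().splitlines()
print(found, parsed)
```

Every file is closed as soon as its with block ends, so none of the close() or seek(0) calls from the original script are needed.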

2 Comments

Thank you for the explanation of what I messed up!
@AddisonWilson, no worries. Using any is the Pythonic way to do what you want; it does what your accepted answer does, just in a nicer way.
