I know there are a lot of questions here about "regex python variable", but none of them seems to work for me. I have been looking for two hours and have not found an answer to this specific question.

Here is my problem: I would like to search for lines containing [ERROR] and [WARNING]. As you may know, /var/log/mysql/error.log has a standard format in which every line starts with a timestamp of the form year-month-day hour:minute:second.

Example:

2016-01-03 13:19:40 1242 [Warning] Buffered warning: Changed limits: table_open_cache: 431 (requested 2000)

2016-01-03 13:19:40 1242 [Warning] Using unique option prefix myisam-recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
2016-01-03 13:19:40 1242 [Note] Plugin 'FEDERATED' is disabled.

I have this script, which tries to do the job:

#!/usr/bin/python

import re
import time
import datetime
from datetime import datetime

i = datetime.now()
dia = i.day
mes_abreviado = i.strftime('%b')
hora = i.strftime('%H')
minuto = i.strftime('%M')
ano = i.strftime('%Y')
mes_ano_num = i.strftime('%m')
dia_00 = i.strftime('%d')

# Date/time format used by "syslog"
date = '%s  %d %s:%s'% (mes_abreviado, dia, hora, minuto)

# Date/time format used by the "error.log" file
mysql_time = '%s-%s-%s %s:%s'% (ano, mes_ano_num, dia_00, hora, minuto)

print mysql_time
words = '\b\[ERROR\]\b|\b\[WARNING\]\b'
print words
file = open("/var/log/mysql/error.log", "rb")

for line in file:
        if re.findall(r'{0}'.format(words), line):
#       if re.findall(r'{0}'.format(mysql_time), line):
#               print "aqui"
                print line
file.close()

I have to get the current year, month, day, hour and minute and search for them with re.findall. The problem is that I need to put them in a variable and use that variable in the regex, but it doesn't seem to work.

Here's the output:

2016-01-03 14:21
\[ERROR\\[WARNING\]

As you can see, printing words does not show the \b sequences, and that is breaking the regex. I have tried words = re.compile(words), words = re.compile(r'\b\[ERROR\]\b|\b\[WARNING\]\b') and re.findall(r'{0}'.format(words), line). As far as I can tell, the regex itself is fine.

There are a lot of comments in the code; those mark problems I will solve later on. If anything is missing, let me know so I can edit this question. Thank you in advance.

  • It's not very clear what output/result you actually expect given your example file. Can you please elaborate? Commented Jan 3, 2016 at 14:30
  • I didn't read your whole code, but wild guess: try changing words = '\b\[ERROR\]\b|\b\[WARNING\]\b' to words = r'\b\[ERROR\]\b|\b\[WARNING\]\b' Commented Jan 3, 2016 at 14:35
  • As @Kevin says, use a raw string literal. In a normal string literal each \b is interpreted as the backspace character, so the regex engine never sees the word-boundary escape. Commented Jan 3, 2016 at 14:36
  • @timgeb I was expecting \b\[ERROR\]\b|\b\[WARNING\]\b as the output, because that is what you have to pass to the regex: re.findall(r'\b\[ERROR\]\b|\b\[WARNING\]\b'), right? I used Kevin's trick and the output is now as expected, but it is still not printing the line. Is anything wrong with the regex? Commented Jan 3, 2016 at 14:39
  • This is really weird. I have changed re.findall(words, line) to literally re.findall(r'\b\[ERROR\]\b|\b\[WARNING\]\b', line) and it prints only one line containing [Warning]. Why is it not printing the other ones? Commented Jan 3, 2016 at 14:46
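The raw-string point from the comments above can be checked in isolation. This is a minimal sketch in Python 3 (the question's script is Python 2, but string escapes behave the same way here); the names plain, raw and line are made up for the illustration, and the brackets are left out to keep the focus on \b.

```python
import re

plain = '\bERROR\b'   # '\b' is turned into the backspace character (0x08)
raw = r'\bERROR\b'    # backslash and 'b' are kept, so re sees a word boundary

print(len(plain), len(raw))   # the raw string is two characters longer
print(plain[0] == '\x08')     # the first character of plain is a backspace

line = 'an ERROR occurred'
print(re.search(plain, line))  # no backspace characters in the line
print(re.search(raw, line))    # word boundaries on both sides of ERROR
```

Printing plain to a terminal sends real backspace characters, which is why the output in the question looks as if characters were eaten.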

2 Answers

I don't know why you are using \b in your regexp - it doesn't make sense when the word you are looking for is already delimited by [ and ]. According to the docs, \b matches the empty string at the edge of a run of word characters (a-zA-Z0-9_), so your pattern only matches when the brackets touch a word character on both sides, as in 'a[WARNING]b', and never when they are surrounded by spaces. Also, I couldn't get [WARNING] in the regexp to match [Warning] in the logfile (as in the sample data you provided) without making the regex case-insensitive by adding (?i) to it.

Change the regex to: words = r'(?i)\[ERROR\]|\[WARNING\]' and it should start working.
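To see the effect of (?i), here is a small sketch in Python 3 that runs the suggested pattern over two of the sample lines from the question. The variable names sample, hits and strict_hits are only for illustration.

```python
import re

words = r'(?i)\[ERROR\]|\[WARNING\]'        # case-insensitive, no \b
strict = r'\[ERROR\]|\[WARNING\]'           # same pattern, case-sensitive

sample = [
    "2016-01-03 13:19:40 1242 [Warning] Buffered warning: Changed limits: "
    "table_open_cache: 431 (requested 2000)",
    "2016-01-03 13:19:40 1242 [Note] Plugin 'FEDERATED' is disabled.",
]

# Keep the lines where the pattern is found anywhere in the line.
hits = [line for line in sample if re.search(words, line)]
strict_hits = [line for line in sample if re.search(strict, line)]

print(len(hits))         # only the [Warning] line is kept
print(len(strict_hits))  # '[WARNING]' does not equal '[Warning]' without (?i)
```

Passing re.IGNORECASE as the flags argument of re.search should be equivalent to the inline (?i) here.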

Once you have the Error/Warning matching working, you can add the date string matching into your regexp quite easily.
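One way the date could be combined with the level is sketched below in Python 3. It assumes the layout seen in the sample lines (timestamp, seconds, a numeric id, then the bracketed level); build_pattern is a hypothetical helper, and the third log line is invented for the example. re.escape keeps any special characters in the timestamp from being read as regex syntax.

```python
import re
from datetime import datetime


def build_pattern(minute_stamp):
    """Match '<stamp>:SS <id> [Error|Warning]' at the start of a line."""
    return re.compile(
        re.escape(minute_stamp) + r':\d{2}\s+\d+\s+\[(?:ERROR|WARNING)\]',
        re.IGNORECASE,
    )


# Fixed time so the example is reproducible; the script would use datetime.now().
mysql_time = datetime(2016, 1, 3, 13, 19).strftime('%Y-%m-%d %H:%M')
pattern = build_pattern(mysql_time)

lines = [
    "2016-01-03 13:19:40 1242 [Warning] Buffered warning: Changed limits: "
    "table_open_cache: 431 (requested 2000)",
    "2016-01-03 13:19:40 1242 [Note] Plugin 'FEDERATED' is disabled.",
    # Hypothetical line from the next minute, not taken from the question.
    "2016-01-03 13:20:05 1242 [Warning] hypothetical line from the next minute",
]

matches = [line for line in lines if pattern.match(line)]
for line in matches:
    print(line)
```

Only the first line is printed: the second has the wrong level and the third falls outside the requested minute.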

1 Comment

Thank you. It worked. Yesterday I read the regex documentation and totally misunderstood the \b character. Thank you for correcting me.

You don't need a regex to do that; you only need to know the position of the field you want to check (the 4th field in your example):

lookfor = ('[Warning]', '[Error]')

with open('/var/log/mysql/error.log') as fh:
    for line in fh:
        # Split on whitespace; the level is the 4th field (index 3).
        parts = line.split(None, 5)
        if len(parts) > 3 and parts[3] in lookfor:
            print(line.rstrip())

About your code:

There is no word boundary between a space and a square bracket, since both characters belong to the same character class \W. (A word boundary lies between a word character (\w) and a non-word character (\W), or between a word character and the start or end of the string.)
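The boundary rule is easy to verify with a quick Python 3 sketch. The test strings are made up: one imitates the spacing of a log line, the other puts word characters directly against the brackets.

```python
import re

with_boundaries = r'\b\[WARNING\]\b'
without_boundaries = r'\[WARNING\]'

log_like = '1242 [WARNING] some text'   # space on both sides of the brackets
glued = 'a[WARNING]b'                   # word characters touch the brackets

print(re.search(with_boundaries, log_like))     # no boundary between ' ' and '['
print(re.search(with_boundaries, glued))        # boundary between 'a' and '['
print(re.search(without_boundaries, log_like))  # matches once \b is dropped
```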

You don't need re.findall when you are looking for a single occurrence in a string. re.search is a better fit for this task.
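As a rough comparison in Python 3: re.search returns a match object (or None), while re.findall builds a list of every match. The pattern below is an assumption for the example; it adds a capture group so the level can be read back, which the original pattern does not have.

```python
import re

level = re.compile(r'\[(error|warning)\]', re.IGNORECASE)
line = "2016-01-03 13:19:40 1242 [Warning] Plugin check"

# search: stops at the first hit, convenient for a yes/no test per line
m = level.search(line)
if m:
    print(m.group(1).lower())

# findall: returns a list of the captured groups for every hit
found = level.findall(line)
print(found)
```

When reading the log line by line, the truthiness of the match object is all the if statement needs.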

2 Comments

Thank you for your answer, but I would like to send every warning to an email once I get a hit on "Warning" or "Error". I think findall is better suited for this, although you have a very good point. I may change the code a bit to incorporate your answer.
@JoaoTorres: No, findall is useless because you work line by line (and working line by line is the way to go).
