regex python variable

Question

I know there are a lot of questions on here about "regex python variable", but none of the answers seems to work for me. I have been looking for two hours and have not found an answer to this specific question.

Here is my problem: I would like to search for the words [ERROR] and [WARNING]. As you may know, /var/log/mysql/error.log has a standard format, in which each line basically starts with year-month-day hour:minute.

Example:

2016-01-03 13:19:40 1242 [Warning] Buffered warning: Changed limits: table_open_cache: 431 (requested 2000)

2016-01-03 13:19:40 1242 [Warning] Using unique option prefix myisam-recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
2016-01-03 13:19:40 1242 [Note] Plugin 'FEDERATED' is disabled.

I have this script, which tries to do the job:

#!/usr/bin/python

import re
import time
import datetime
from datetime import datetime

i = datetime.now()
dia = i.day
mes_abreviado = i.strftime('%b')
hora = i.strftime('%H')
minuto = i.strftime('%M')
ano = i.strftime('%Y')
mes_ano_num = i.strftime('%m')
dia_00 = i.strftime('%d')

#Data/Hora especifica "syslog"
date = '%s  %d %s:%s'% (mes_abreviado, dia, hora, minuto)

#Data/Hora especifica do ficheiro "error.log" 
mysql_time = '%s-%s-%s %s:%s'% (ano, mes_ano_num, dia_00, hora, minuto)

print mysql_time
words = '\b\[ERROR\]\b|\b\[WARNING\]\b'
print words
file = open("/var/log/mysql/error.log", "rb")

for line in file:
        if re.findall(r'{0}'.format(words), line):
#       if re.findall(r'{0}'.format(mysql_time), line):
#               print "aqui"
                print line
file.close()

I have to get the current year, month, day, hour and minute and search for them with the re.findall function. The problem is that I need to place them in a variable and use that variable in the regex, but it doesn't seem to work.

Here's the output:

2016-01-03 14:21
\[ERROR\\[WARNING\]

As you can see, words is not printing the \b, and that is messing up the regex. I have tried using words = re.compile(words), words = re.compile(r'\b\[ERROR\]\b|\b\[WARNING\]\b') and re.findall(r'{0}'.format(words), line). From what I can tell, the regex itself is perfectly fine.

There are a lot of comments in the code, which are problems I will solve later on. If there is something missing, let me know so I can edit this question. Thank you in advance.

It's not very clear what output/result you actually expect given your example file. Can you please elaborate?
– timgeb, Commented Jan 3, 2016 at 14:30
I didn't read your whole code, but wild guess: try changing words = '\b\[ERROR\]\b|\b\[WARNING\]\b' to words = r'\b\[ERROR\]\b|\b\[WARNING\]\b'
– Kevin, Commented Jan 3, 2016 at 14:35
As @Kevin says, use a raw string literal: each \b is currently interpreted as the backspace character rather than the regex word-boundary escape.
– Jon Clements, Commented Jan 3, 2016 at 14:36
@timgeb I was expecting \b\[ERROR\]\b|\b\[WARNING\]\b as the output, because this is what you have to place in the regex: re.findall(r'\b\[ERROR\]\b|\b\[WARNING\]\b'), right? I used Kevin's trick and the output is as expected, but it is still not printing out the line. Is anything wrong in the regex?
– Bruno Francisco, Commented Jan 3, 2016 at 14:39
This is really weird. I have changed re.findall(words, line) to literally re.findall(r'\b\[ERROR\]\b|\b\[WARNING\]\b', line) and it prints out only one line containing [Warning]. Why is it not printing out the other ones?
– Bruno Francisco, Commented Jan 3, 2016 at 14:46

DisappointedByUnaccountableMod · Accepted Answer · 2016-01-03 16:12:34Z

1

I don't know why you are using \b in your regexp. It doesn't make sense when the word you are looking for is already delimited by [ and ]. According to the docs, \b matches the empty string at the boundary between a word character (a letter, digit or underscore) and a non-word character, so your pattern can only match text such as 'a[WARNING]b'. Also, I couldn't get [WARNING] in the regexp to match [Warning] in the logfile (as in the sample data you provided) without making the regex case-insensitive by adding (?i) to it.

Change the regex to: words = r'(?i)\[ERROR\]|\[WARNING\]' and it should start working.
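To make the raw-string point from the comments concrete, here is a minimal sketch (Python 3 syntax). The two sample lines are copied from the question; the names sample, backspace, boundary and hits are only for illustration.

```python
import re

# Two lines taken from the error.log excerpt in the question.
sample = [
    "2016-01-03 13:19:40 1242 [Warning] Buffered warning: Changed limits: table_open_cache: 431 (requested 2000)",
    "2016-01-03 13:19:40 1242 [Note] Plugin 'FEDERATED' is disabled.",
]

# In a normal string literal, '\b' is a single backspace character.
backspace = '\b'
# In a raw string literal, r'\b' stays as backslash + 'b', which re reads as a word boundary.
boundary = r'\b'
print(len(backspace), len(boundary))

# The suggested pattern: raw string, no \b, case-insensitive via (?i).
words = re.compile(r'(?i)\[ERROR\]|\[WARNING\]')
hits = [line for line in sample if words.search(line)]
print(hits)
```

Only the [Warning] line should end up in hits; the [Note] line is skipped.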

Once you have the Error/Warning matching working, you can add the date string matching into your regexp quite easily.
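One possible way to do that is sketched below. It assumes every line looks like the sample (timestamp, a numeric id, then the level in brackets). build_pattern is a hypothetical helper name, not something from the question, and the fixed datetime stands in for datetime.now() so the result is reproducible.

```python
import re
from datetime import datetime


def build_pattern(now):
    """Hypothetical helper: match Error/Warning lines logged during the minute of `now`."""
    stamp = now.strftime('%Y-%m-%d %H:%M')
    # re.escape keeps the timestamp literal, whatever characters it contains.
    return re.compile(
        r'^' + re.escape(stamp) + r':\d{2}\s+\d+\s+\[(?:error|warning)\]',
        re.IGNORECASE,
    )


lines = [
    "2016-01-03 13:19:40 1242 [Warning] Buffered warning: Changed limits: table_open_cache: 431 (requested 2000)",
    "2016-01-03 13:19:40 1242 [Note] Plugin 'FEDERATED' is disabled.",
]

# Fixed time for the example; in the real script this would be datetime.now().
pattern = build_pattern(datetime(2016, 1, 3, 13, 19))
for line in lines:
    if pattern.search(line):
        print(line)
```

Building the pattern once, outside the loop, avoids re-formatting the string for every line.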

edited Jan 3, 2016 at 16:12

answered Jan 3, 2016 at 15:28

DisappointedByUnaccountableMod


1 Comment

Bruno Francisco Over a year ago

Thank you. It worked. Yesterday I read the regex documentation and I totally misunderstood the \b character. Thank you for correcting me.

Casimir et Hippolyte · Accepted Answer · 2016-01-03 15:33:14Z

1

You don't need a regex to do that; you only need to know the position of the field you want to check (the 4th field in your example):

lookfor = ('[Warning]', '[Error]')

with open('/var/log/mysql/error.log') as fh:
    for line in fh:
        parts = line.split(None, 5)
        if len(parts) > 3 and parts[3] in lookfor:
            print(line.rstrip())

About your code:

There is no word boundary between a space and a square bracket, since these two characters belong to the same character class, \W. (A word boundary lies between a word character (\w) and a non-word character (\W), or between a word character and the start or end of the string.)
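A quick way to check this yourself (a small sketch; the two test strings and the variable names are made up for the illustration):

```python
import re

pattern = re.compile(r'\b\[Warning\]\b')

# ' ' and '[' are both non-word characters, so there is no boundary before '['.
in_log_line = pattern.search('1242 [Warning] Buffered warning')
print(in_log_line)

# A boundary only exists when word characters touch the brackets.
glued = pattern.search('a[Warning]b')
print(glued)
```

The first search finds nothing, which is why the original pattern skipped the log lines.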

You don't need re.findall when you are looking for only one occurrence in a string. re.search is better suited to this task.
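For comparison, here is a rough sketch of what the two calls return on a single line, and how matching lines could be collected while still reading line by line. matching_lines is a hypothetical helper, and the sample lines come from the question.

```python
import re

level = re.compile(r'\[(?:error|warning)\]', re.IGNORECASE)

line = "2016-01-03 13:19:40 1242 [Warning] Buffered warning: Changed limits: table_open_cache: 431 (requested 2000)"

# findall scans the whole line and builds a list of every match.
found = level.findall(line)
# search stops at the first match and returns a match object (or None).
first = level.search(line)
print(found, first.group())


def matching_lines(lines):
    """Hypothetical helper: yield the lines that contain [Error] or [Warning]."""
    for text in lines:
        if level.search(text):
            yield text.rstrip()


sample = [line, "2016-01-03 13:19:40 1242 [Note] Plugin 'FEDERATED' is disabled."]
collected = list(matching_lines(sample))
print(collected)
```

The collected list could then be passed to whatever reports the hits, without needing findall.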

edited Jan 3, 2016 at 15:33

answered Jan 3, 2016 at 15:27

Casimir et Hippolyte


2 Comments

Bruno Francisco Over a year ago

Thank you for your answer, but I would like to send every "Warning" to an email once I get a hit on "Warning" or "Error". I think findall is better suited for this. You do have a very good point, though. I may change a bit of the code to incorporate your answer.

Casimir et Hippolyte Over a year ago

@JoaoTorres: No, findall is useless because you work line by line (and working line by line is the way to go).
