I am quite new to Python and I'm working on a task where I'm supposed to keep building on a regex, and I have hit a dead end.

For some reason, when I add the later parts, the regex breaks down and stops matching a few strings that were previously matched.

I am supposed to run the regex on lines that look like this:

Sep 15 04:34:02 li146-252 sshd[12130]: Failed password for invalid user ronda from 212.58.111.170

The code:

#!/usr/bin/python
import re

with open('livehack.txt', 'r') as file:
    for line in file:
        dateString = re.findall('^(?:[A-z][a-z]{2}[ ][0-9]{1,2}[ ][\d]{2}[:][\d]{2}[:][\d]{2}) | li146-252 | ?:[0-9]{5} | Failed password for invalid', line)
        print dateString

The result of the code:

['Sep 17 06:40:28 ', ' Failed password for invalid']

As you can see, a few things that should be caught are missing, and I have no idea why.

Thanks in advance.

1
  • Does it catch what you expect if you put your regex into regex101.com? Commented Feb 24, 2015 at 15:35

3 Answers 3

1

Regular expressions are always difficult to read. Try an online regex tester: it will probably tell you more about what is wrong, and you can try different inputs and expressions. These are my favorites:

In your case I think you have added some extra space characters to the regex that should not be there. Space also counts as a character that needs to match.

I would also add parentheses around the expressions that are separated with |. Without them it is sometimes hard to see which parts of the pattern belong to each alternative.

Like this:

'(?:^(?:[A-z][a-z]{2}[ ][0-9]{1,2}[ ][\d]{2}[:][\d]{2}[:][\d]{2}))|(?:li146-252)|(?:[0-9]{5})|(?:Failed password for invalid)'
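To check this against the sample line from the question, here is a minimal sketch. It uses Python 3 syntax, and the variable names (line, pattern, matches) are my own, not from the question:

```python
import re

# Sample line copied from the question.
line = ("Sep 15 04:34:02 li146-252 sshd[12130]: "
        "Failed password for invalid user ronda from 212.58.111.170")

# The parenthesized pattern from this answer, split over several
# raw strings for readability. No spaces surround the | characters.
pattern = re.compile(
    r"(?:^(?:[A-z][a-z]{2}[ ][0-9]{1,2}[ ][\d]{2}[:][\d]{2}[:][\d]{2}))"
    r"|(?:li146-252)"
    r"|(?:[0-9]{5})"
    r"|(?:Failed password for invalid)"
)

# All groups are non-capturing, so findall returns the full matches.
matches = pattern.findall(line)
print(matches)
# ['Sep 15 04:34:02', 'li146-252', '12130', 'Failed password for invalid']
```

As a side note, [A-z] also accepts the punctuation characters that sit between Z and a in ASCII, so [A-Z] is probably what was intended.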

1 Comment

That seemed to work perfectly, such a noob mistake on my part. Thank you!
0

I think you don't want to use alternation ("|") for the parts of your regex. Instead, you should define capture groups () for all the parts you want to extract from the string. What do you want to extract exactly? Other than that, avoid stray spaces in the pattern and write whitespace as "\s" ([ ] is also valid, but it matches only a literal space, while \s matches any whitespace character).

Here is a quick example of what you could capture (I don't know what you really need, and it is not optimized):

([\D]{2,3}\s\d{2}\s\d{2}:\d{2}:\d{2})\s(li146-252)\s(sshd\[\d+\]):\s[\D\s]+((\d{1,3}\.){3}\d{1,3})
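A rough sketch of how this pattern could be used from Python, assuming the sample line from the question. The variable names (line, pattern, m, and the four unpacked fields) are illustrative, and the syntax is Python 3:

```python
import re

# Sample line copied from the question.
line = ("Sep 15 04:34:02 li146-252 sshd[12130]: "
        "Failed password for invalid user ronda from 212.58.111.170")

# The capture-group pattern from this answer, unchanged.
pattern = re.compile(
    r"([\D]{2,3}\s\d{2}\s\d{2}:\d{2}:\d{2})\s(li146-252)\s(sshd\[\d+\]):"
    r"\s[\D\s]+((\d{1,3}\.){3}\d{1,3})"
)

# search returns one match object whose groups hold the extracted parts.
m = pattern.search(line)
if m is not None:
    # Group 5 is the inner repeated octet group, so only 1-4 are useful.
    timestamp, host, process, ip = m.group(1, 2, 3, 4)
    print(timestamp)  # Sep 15 04:34:02
    print(host)       # li146-252
    print(process)    # sshd[12130]
    print(ip)         # 212.58.111.170
```

Note that \d{2} requires a two-digit day, so a line with a single-digit day would need \d{1,2} (and possibly \s+ before it) to match.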


0

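Your problem comes from the extra spaces around each |. Those spaces are part of the pattern, so they have to match too. With this syntax, 12130 from sshd[12130] will not be matched, since it is surrounded by brackets, not spaces. (Outside a group, ?: is not special either: " ?:" reads as an optional space followed by a literal colon.) And li146-252 is not captured because its leading space has already been consumed by the match for Sep 17 06:40:28.

So a regex with the spaces stripped should do what you want: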

^(?:[A-z][a-z]{2} [0-9]{1,2} \d{2}:\d{2}:\d{2})|li146-252|[0-9]{5}|Failed password for invalid

Note: I also removed your extra brackets around single characters. Brackets are used to specify a set of characters (like [\d:] for any digit or a colon, or [a-z] for any character between a and z), or to exclude characters (like [^ ] for any character except a space).
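The difference is easy to reproduce on the sample line from the question. A small sketch in Python 3 syntax, with variable names (line, spaced, stripped) chosen here for illustration:

```python
import re

# Sample line copied from the question.
line = ("Sep 15 04:34:02 li146-252 sshd[12130]: "
        "Failed password for invalid user ronda from 212.58.111.170")

# Original pattern from the question, with spaces around each |.
spaced = (r"^(?:[A-z][a-z]{2}[ ][0-9]{1,2}[ ][\d]{2}[:][\d]{2}[:][\d]{2})"
          r" | li146-252 | ?:[0-9]{5} | Failed password for invalid")

# Space-stripped pattern from this answer.
stripped = (r"^(?:[A-z][a-z]{2} [0-9]{1,2} \d{2}:\d{2}:\d{2})"
            r"|li146-252|[0-9]{5}|Failed password for invalid")

# The first alternative swallows the space after the timestamp, so
# " li146-252 " can no longer find its required leading space.
print(re.findall(spaced, line))
# ['Sep 15 04:34:02 ', ' Failed password for invalid']

print(re.findall(stripped, line))
# ['Sep 15 04:34:02', 'li146-252', '12130', 'Failed password for invalid']
```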
