2

I am using Python to search through a text log file line by line, and I want to save a certain part of each line as a variable. I am using regex, but I don't think I am using it correctly, as I always get None for my variable string_I_want. I looked at other regex questions on here and saw people adding .group() to the end of their re.search call, but that gives me an error. I am not very familiar with regex and can't figure out where I am going wrong.

Sample log file:

2016-03-08 11:23:25  test_data:0317: m=string_I_want max_count: 17655, avg_size: 320, avg_rate: 165

My script:

def get_data(log_file):

    #Read file line by line
    with open(log_file) as f:
        f = f.readlines()

        for line in f:
            date = line[0:10]
            time = line[11:19]

            string_I_want=re.search(r'/m=\w*/g',line)

            print date, time, string_I_want
  • The regex is wrong: you are using the JavaScript regex-literal format. Commented May 16, 2016 at 10:20
  • 1
    Don't just guess what those re functions and methods do --- read the "Regular Expression HOWTO" for a thorough introduction to using regular expressions in Python 2, and refer to the re reference docs when you need to look up specifics. It will save you time in the long run. Commented May 16, 2016 at 11:29

3 Answers

2

You need to remove the /.../ delimiters and the g (global) flag, which are JavaScript regex-literal syntax that Python treats as literal characters to match, and use a capturing group:

mObj = re.search(r'm=(\w+)',line)
if mObj:
    string_I_want = mObj.group(1)
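
To illustrate why the original pattern returned None, here is a small sketch. The shortened sample line and the test strings are made up for the demonstration; re.findall is shown only as a rough counterpart of the g flag, not as something the question requires.

```python
import re

# Shortened, hypothetical version of the log line from the question
line = "test_data:0317: m=string_I_want max_count: 17655"

# In Python the slashes and the trailing g are ordinary characters,
# so the pattern only matches text that literally contains them
literal_miss = re.search(r'/m=\w*/g', line)
literal_hit = re.search(r'/m=\w*/g', '/m=abc/g')
print(literal_miss)         # None
print(literal_hit.group())  # /m=abc/g

# Rough counterpart of JavaScript's g flag: collect every match
all_values = re.findall(r'm=(\w+)', 'm=first x m=second')
print(all_values)           # ['first', 'second']
```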

See this regex demo and the Python demo:

import re
p = r'm=(\w+)'              # Init the regex with a raw string literal (so, no need to use \\w, just \w is enough)
s = "2016-03-08 11:23:25  test_data:0317: m=string_I_want max_count: 17655, avg_size: 320, avg_rate: 165"
mObj = re.search(p, s)      # Execute a regex-based search
if mObj:                    # Check if we got a match
    print(mObj.group(1))    # DEMO: Print the Group 1 value

Pattern details:

  • m= - matches m= literal character sequence (add a space before or \b if a whole word must be matched)
  • (\w+) - Group 1 capturing 1+ alphanumeric or underscore characters. We can reference this value with the .group(1) method.
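
As a minimal sketch of how this pattern could slot into the per-line loop from the question: parse_line and M_VALUE are hypothetical names introduced here, and the slice positions for date and time are taken from the question's script.

```python
import re

# Compiled once; same pattern as above (m= followed by 1+ word characters)
M_VALUE = re.compile(r'm=(\w+)')

def parse_line(line):
    """Return (date, time, value), or None when the line has no m= field."""
    match = M_VALUE.search(line)
    if match is None:
        return None
    # Same fixed-width slices the question uses for date and time
    return line[0:10], line[11:19], match.group(1)

sample = ("2016-03-08 11:23:25  test_data:0317: m=string_I_want "
          "max_count: 17655, avg_size: 320, avg_rate: 165")
print(parse_line(sample))
print(parse_line("2016-03-08 11:23:26  no value here"))
```

Returning None for lines without the field lets the caller decide whether to skip or report them, instead of failing on .group().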

0

Do:

(?<=\sm=)\S+

Example:

In [135]: s = '2016-03-08 11:23:25  test_data:0317: m=string_I_want max_count: 17655, avg_size: 320, avg_rate: 165'

In [136]: re.search(r'(?<=\sm=)\S+', s).group()
Out[136]: 'string_I_want'
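
A short sketch of how the lookbehind version behaves differently from a \w+ capture group, and of guarding against a missing match. The log line with m=foo-bar,baz is invented for illustration and does not come from the question.

```python
import re

# Hypothetical line whose value contains non-word characters
line = "2016-03-08 11:23:25  test_data:0317: m=foo-bar,baz max_count: 17655"

# \S+ runs up to the next whitespace; \w+ stops at the first non-word character
lookbehind = re.search(r'(?<=\sm=)\S+', line)
capture = re.search(r'm=(\w+)', line)
print(lookbehind.group())  # foo-bar,baz
print(capture.group(1))    # foo

# re.search returns None when nothing matches, so check before calling .group()
no_match = re.search(r'(?<=\sm=)\S+', "a line without the field")
value = no_match.group() if no_match else None
print(value)               # None
```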

0

Here is what you need:

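import re
def get_data(logfile):
    # The with block closes the file even if an error occurs
    with open(logfile, "r") as f:
        for line in f:
            # re.search returns None for lines without " m=", so test the
            # match object before calling .group() on it
            m = re.search(r'(?<=\sm=)\S+', line)
            if m:
                print(m.group())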
