
I have a long log file like this:

2012-02-03 18:35:34 SampleClass6 [INFO] everything normal for id 174025851
2012-02-03 18:35:34 SampleClass4 [FATAL] system problem at id 1991740254
2012-02-03 18:35:34 SampleClass3 [DEBUG] detail for id 1304807656
2012-02-03 18:35:34 SampleClass3 [WARN] missing id 1740

I want to find the lines containing exactly id 1740 and print them, but id 174025851 is also counted as a match. How can I find exactly the string id 1740 in a line and print that line?

for line in f: 
    if str(id) in line: 
        print(line)

This also prints the first and second lines, but I want only the 4th line, the one with exactly id 1740.

  • If you mean exactly, then line.strip().endswith("id 1740") or something similar? (Not at a machine to test this atm.) Commented Jun 23, 2020 at 17:27
  • if you're using the re module then wouldn't re.search(r'\b1740\b',line) suffice? Commented Jun 23, 2020 at 17:28
  • Zaman Azam: I updated my answer to reflect what you wrote under pciunkiewicz's answer regarding the fact that the "id" could occur in the middle of the string as well as the end. This detail should have been stated up front in the question, because not knowing it invalidates a number of answers. Please can you edit your question now to reflect the requirements. Commented Jun 23, 2020 at 18:16

7 Answers

At risk of adding yet another answer to a question which has many already, here is how I think that a regular expression parser is best used here:

import re

the_id = 1740

with open("test.txt") as f:
    for line in f:
        match = re.search(r"id\s+(\d+)\s*$", line)
        if match and the_id == int(match.group(1)):
            print(line, end='')

This gives:

2012-02-03 18:35:34 SampleClass3 [WARN] missing id 1740

What you are doing here is using the parser to look for lines which end with the following: "id", followed by whitespace, followed by one or more digits (which you capture in a group), optionally followed by any amount of whitespace.

The captured group is then converted to int and compared with the id.

Incidentally, the id is stored in a variable called the_id, because id is the name of a builtin function and so is not a good choice of variable name (it interferes with use of the builtin).


UPDATE

The asker has now clarified that the ID can appear in the middle of the line, not necessarily at the end.

This can easily be handled by a simple tweak to the regular expression. Changing the relevant line in the above code to:

        match = re.search(r"id\s+(\d+)", line)

now removes any check on what should come after the digits.

Because the + meaning "one or more" is also greedy (that is, it matches the part of the pattern to which it relates as many times as possible), the whole of the ID is matched by the bracketed group, without need to specify anything about what follows it.

Given the input file

2012-02-03 18:35:34 SampleClass6 [INFO] everything normal for id 174025851
2012-02-03 18:35:34 SampleClass4 [FATAL] system problem at id 1991740254
2012-02-03 18:35:34 SampleClass3 [DEBUG] detail for id 1304807656
2012-02-03 18:35:34 SampleClass3 [WARN] missing id 1740
2012-02-03 19:11:02 id 1740 SampleClass5 [TRACE] verbose detail

this will now output:

2012-02-03 18:35:34 SampleClass3 [WARN] missing id 1740
2012-02-03 19:11:02 id 1740 SampleClass5 [TRACE] verbose detail
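
For log lines that carry the number without an "id" prefix (such as the syslog-style lines raised in the comments), one option is to match the number on its own and reject it when it is embedded in a longer number, in the spirit of the \b1740\b suggestion under the question. The following is only a sketch, assuming the ID is a plain run of digits; the helper name line_has_id and the sample_lines list are made up for illustration:

```python
import re

def line_has_id(line, the_id):
    """Return True if the_id occurs in line as a complete run of digits."""
    # The lookarounds reject a match that has a digit directly before or after it,
    # so 1740 is not found inside 174025851 or 1991740254.
    pattern = r"(?<!\d)" + re.escape(str(the_id)) + r"(?!\d)"
    return re.search(pattern, line) is not None

sample_lines = [
    "2012-02-03 18:35:34 SampleClass6 [INFO] everything normal for id 174025851",
    "2012-02-03 18:35:34 SampleClass3 [WARN] missing id 1740",
    "Feb 23 07:38:19 router1 snmpd[359]: SNMPD_TRAP_COLD_START: trap_generate_cold: SNMP 1740 trap: cold start",
]

for line in sample_lines:
    if line_has_id(line, 1740):
        print(line)
```

Note that this treats every number on the line as a candidate, including the fields of a timestamp, so a short ID such as 18 would also match 18:35:34. Keeping the "id" prefix in the pattern avoids that where the prefix is available.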

8 Comments

Does this also work on this line?
Feb 23 07:38:19 router1 snmpd[359]: SNMPD_TRAP_COLD_START: trap_generate_cold: SNMP 1740 trap: cold start
@ZamanAzam That does not even contain the string id. How is the code supposed to know that that is an id?
So I have different log files, which I select by their .log extension. Then I read the first file. Every line in the file contains different number strings. The number (ID) is given by the user. If exactly that number occurs in the line, then the line should be printed.
snmpd[359]: SNMPD_TRAP_WARM_START:SNMP trap: warm start SNMPD_THROTTLE_QUEUE_DRAINED:() cleared all throttled traps SNMPD_TRAP_WARM_START:SNMP trap:(1740;543;544) warm start SNMPD_TRAP_COLD_START:SNMP trap:(1740737)cold start

You can use a regex, something like id followed by a space and then the number. Or, if the id is always at the end of the line, check whether line.endswith('id ' + id) is true (with id as a string) and then do your logic.
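
Both suggestions could be sketched roughly as follows. This assumes the ID arrives as a string of digits; the function names matches_at_end and matches_anywhere are invented for this example, and the_id is used instead of id to avoid shadowing the builtin:

```python
import re

def matches_at_end(line, the_id):
    # Drop the trailing newline/whitespace before comparing the end of the line
    return line.rstrip().endswith("id " + the_id)

def matches_anywhere(line, the_id):
    # "id" as a whole word, whitespace, the exact digits, then whitespace or end of line
    pattern = r"\bid\s+" + re.escape(the_id) + r"(?:\s|$)"
    return re.search(pattern, line) is not None

the_id = "1740"  # assumed to be supplied by the user
for line in [
    "2012-02-03 18:35:34 SampleClass6 [INFO] everything normal for id 174025851\n",
    "2012-02-03 18:35:34 SampleClass3 [WARN] missing id 1740\n",
    "2012-02-03 19:11:02 id 1740 SampleClass5 [TRACE] verbose detail\n",
]:
    if matches_anywhere(line, the_id):
        print(line, end="")
```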


You can use regular expressions

import re
text = """
2012-02-03 18:35:34 SampleClass6 [INFO] everything normal for id 174025851
2012-02-03 18:35:34 SampleClass4 [FATAL] system problem at id 1991740254
2012-02-03 18:35:34 SampleClass3 [DEBUG] detail for id 1304807656
2012-02-03 18:35:34 SampleClass3 [WARN] missing id 1740
"""

# the \s means the character after the final 0 must be whitespace (space, tab or newline), so not another digit
p = re.compile(r'.*id 1740\s')
ls = p.findall(text)
print(ls)


Here is another possibility:

import pandas as pd

data=pd.read_csv('/path/to/data.txt', header=None)

for i in range(0,len(data)):
    if data.iloc[i,0].split(' ')[-1]=='1740':
        print(data.iloc[i,0])

I use read_csv even though the file is not comma-separated, so that each line is kept as a single string. Then I check each line within a loop.


You could do something like this and split the line and take the last value:

for line in f: 
    if '1740' in line:
        a = line.split()[-1]  # split() with no argument also drops the trailing newline
        if a == '1740': 
            print(line)

1 Comment

I don't think you want to hard-code the value '1740' (especially not having the same constant hard-coded in two places), but you might want to take the opportunity to suggest a better variable name for it than the id used in the question (which is the name of a builtin).

Based on the structure of your logs, if the id is always at the very end of the line you can always change it to look for lines ending with your exact query:

for line in f: 
    if line.endswith(f"id {id}"): 
        print(line)

Edit:

As mentioned by @mrblewog, if the line has trailing whitespace, we can pre-process it using rstrip or strip:

for line in f:
    line = line.rstrip()
    ### Rest of the logic ###
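
Put together, the two snippets above might look like the sketch below. It assumes the ID is always the last thing on the line; the function name lines_ending_with_id is hypothetical, io.StringIO stands in for the opened log file, and the_id replaces id so the builtin is not shadowed:

```python
import io

def lines_ending_with_id(lines, the_id):
    """Yield each line (without trailing whitespace) that ends with 'id <the_id>'."""
    suffix = f"id {the_id}"
    for line in lines:
        stripped = line.rstrip()  # remove the newline and any trailing spaces
        if stripped.endswith(suffix):
            yield stripped

# Stand-in for an open file object
sample = io.StringIO(
    "2012-02-03 18:35:34 SampleClass6 [INFO] everything normal for id 174025851\n"
    "2012-02-03 18:35:34 SampleClass4 [FATAL] system problem at id 1991740254\n"
    "2012-02-03 18:35:34 SampleClass3 [WARN] missing id 1740  \n"
)

result = list(lines_ending_with_id(sample, 1740))
for line in result:
    print(line)
```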

6 Comments

Potentially need to strip whitespace off the line before endswith
what if the line id to search for is 1740 but the line ends with 1991740? This wouldn't work.
@DavidErickson Ahh yeah you're right! I have updated it to be more precise, starting all the way back from id XXXXXXXX.
The id can also appear in the middle of a line; it's a long log file:
2012-02-03 19:11:02 id 1740 SampleClass5 [TRACE] verbose detail

You could do it with a regex:

import re
file = ['2012-02-03 18:35:34 SampleClass6 [INFO] everything normal for id 174025851',
'2012-02-03 18:35:34 SampleClass4 [FATAL] system problem at id 1991740254',
'2012-02-03 18:35:34 SampleClass3 [DEBUG] detail for id 1304807656',
'2012-02-03 18:35:34 SampleClass3 [WARN] missing id 1740']
for line in file :
    num = re.findall(r'\d+', line)[-1]
    if(num == '1740'):
        print(line)

Output :

2012-02-03 18:35:34 SampleClass3 [WARN] missing id 1740

With this code, you will find the last number that occurs on each line even if the string is not ending with a number.


The following will check if 1740 occurs anywhere in the line.

import re
file = ['2012-02-03 18:35:34 SampleClass6 [INFO] everything normal for id 174025851',
'2012-02-03 18:35:34 SampleClass4 [FATAL] system problem at id 1991740254',
'2012-02-03 18:35:34 SampleClass3 [DEBUG] detail for id 1304807656',
'2012-02-03 18:35:34 SampleClass3 [WARN] missing id 1740',
'2012-02-03 19:11:02 id 1740 SampleClass5 [TRACE] verbose detail ']
for line in file :
    num = re.findall(r'\d+', line)
    if('1740' in num):
        print(line)

Or, if you are sure that each line ends with a number, you can simply split the string and compare against the last element of the split:

file = ['2012-02-03 18:35:34 SampleClass6 [INFO] everything normal for id 174025851',
'2012-02-03 18:35:34 SampleClass4 [FATAL] system problem at id 1991740254',
'2012-02-03 18:35:34 SampleClass3 [DEBUG] detail for id 1304807656',
'2012-02-03 18:35:34 SampleClass3 [WARN] missing id 1740']

for line in file :
    num = line.split()[-1]
    if(num == '1740'):
        print(line)

Output :

2012-02-03 18:35:34 SampleClass3 [WARN] missing id 1740

5 Comments

The id can be anywhere in the file, for example in the middle of a line.
Then the regex code would work for you (provided it's the last number that occurs on that line).
For example, the id can appear like this in the file:
2012-02-03 19:11:02 id 1740 SampleClass5 [TRACE] verbose detail
@ZamanAzam See if the second one that I added now helps. It works for the case you provided
