Extract specific information from a log file using a script (Linux or Windows or Python)

Question

I am interested in learning how to extract information

count occurrences of keywords,
get the timestamp for specific occurrences of keywords (note that the timestamps will always for the same day; usually, within a couple of hours of the same day),
get the elapsed time of specific log entries

from a text log file (log.txt) via a script (Linux bash or Windows batch or Python). All this information should be written in another text file (results.txt) or be printed on the terminal.

Basically, all the other log entries (i.e. with the blah blah are ignored).

For example for the following text log file, where each line starts with a timestamp followed by an empty space, a dash line(-) and one or more empty space(s) followed by keywords:

11:59:35.875 - action - WRITE(34) start

11:59:35.875 - blah blah

11:59:35.875 - blah blah

11:59:35.877 - blah blah

11:59:35.897 - KEYWORD_1

11:59:35.975 - action - WRITE(34) end

11:59:36.992 - KEYWORD_1

11:59:36.999 - KEYWORD_1

11:59:37.535 - blah blah

11:59:37.545 - ACTION_A - STATE: type 2

11:59:37.575 - blah blah

11:59:37.577 - blah blah

11:59:37.845 - KEYWORD_2

11:59:37.945 - ACTION_B result

11:59:37.950 - blah blah

11:59:38.075 - action - WRITE(22) start

11:59:38.075 - blah blah

11:59:38.085 - blah blah

11:59:38.097 - KEYWORD_2

11:59:39.975 - action - WRITE(22) end

Firstly, I would like to count the occurrences of each of the KEYWORD_1 and the KEYWORD_2 (e.g. 2 and 2, respectively).

Secondly, I want to be able to print the timestamps of each KEYWORD occurrence, e.g. 11:59:35.897 for the first occurrence of the KEYWORD_1.

Finally, find the elapsed time between two log entries:

those who start with - action - WRITE(#) start and end with - action - WRITE(#) end where # is any integer number, e.g. 11:59:35.975 - 11:59:35.875 = 1ms for the first WRITE(34)
those who start with - ACTION_A ... and end with ACTION_B ... e.g. 11:59:37.545 - 11:59:37.945 = 4ms for the first ACTION_A .. ACTION_B.

I have tried find /c "KEYWORD_1" log.txt >results.txt (Windows batch) to count the occurrences but I cannot extract the respective timestamp. For the other requirements I have no idea how to start as I have no experience with such actions before. Tried adapting the answers from this question for my needs with no success.

Any code fragment example or link with related resources will be much appreciated.

Interesting question, you should consider searching for parsing text files in a batch script. — user6167676
– user6167676, Commented Nov 5, 2016 at 19:03
Also, using Linux bash should be easier (?), although I am not skilled enough to give a solution. — user6167676
– user6167676, Commented Nov 5, 2016 at 19:09
Instead of spending time trying to fulfill your complex requirements with batch I'd try learning powershell or maybe LogParser. Aside from this with only timestamps it might be difficult to get elapsed tme when overlapping a day. — user6811411
– user6811411, Commented Nov 5, 2016 at 19:30
I understand the challenge, so I edited the question to include more possible solution scripts; windows batch, linux bash or Python. — Yannis
– Yannis, Commented Nov 5, 2016 at 19:59
@LotPings I should clarify that the timestamps will always for the same day (usually, within a couple of hours of the same day). — Yannis
– Yannis, Commented Nov 5, 2016 at 20:09

Magoo · Accepted Answer · 2016-11-06 01:48:16Z

@ECHO OFF
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "filename1=%sourcedir%\q40441783.txt"
:: occurrence count for KEYWORD_1,KEYWORD_2,KEYWORD_3
FOR %%k IN (KEYWORD_1 KEYWORD_2 KEYWORD_3) DO (
 FOR /f "delims=" %%c IN ('type "%filename1%"^|FIND /c "%%k"') DO ECHO %%k : %%c times
)
:: Timestamp display for KEYWORD_1,KEYWORD_2,KEYWORD_3
FOR %%k IN (KEYWORD_1 KEYWORD_2 KEYWORD_3) DO (
 type "%filename1%"|FIND "%%k"
)
:: remove variables starting $ or #
For %%b IN ($ #) DO FOR  /F "delims==" %%a In ('set %%b 2^>Nul') DO SET "%%a="
:: action - WRITE(#)
FOR /f "usebackqtokens=1*delims=- " %%a IN ("%filename1%") DO (
 REM interested in "pattern - WRITE(#) start/end"
 FOR /f "tokens=1-3*delims=-()" %%A IN ("%%b") DO (
  REM %%A is action, %%B "WRITE" %%C # %%D " start"/" end"
  IF "%%B"==" WRITE" IF "%%D"==" start" SET "$%%C %%A$=%%a"
  IF "%%B"==" WRITE" IF "%%D"==" end" SET "#%%C %%A#=%%a"
  CALL :formatch&CALL :report "WRITE(%%C)"
 )
)
SET $ 2>nul
SET # 2>nul
:: remove variables starting $ or #
For %%b IN ($ #) DO FOR  /F "delims==" %%a In ('set %%b 2^>Nul') DO SET "%%a="

FOR /f "usebackqtokens=1*delims=- " %%a IN ("%filename1%") DO (
 REM interested in "ACTION_A/ACTION_B elapsed time"
 FOR /f "tokens=1*delims=- " %%A IN ("%%b") DO (
  REM %%A is action, %%B remainder of line
  IF "%%A"=="ACTION_A" SET "$1$=%%a"&SET "_actiona=%%B"
  IF "%%A"=="ACTION_B" SET "#1#=%%a"
  CALL :formatch
  CALL :report "%%_actiona%% %%B"
 )
)
SET $ 2>nul
SET # 2>nul

GOTO :EOF


:: See whether a $something$ AND a #something# exist and report if so
:formatch
SET "elapsed="
FOR /f "tokens=1,2delims=$=" %%m IN ('set $ 2^>nul') DO (
 IF DEFINED #%%m# (
  CALL :elapsed %%n %%#%%m#%%
  SET "#%%m#="
  SET "$%%m$="
 )
)
GOTO :eof

:report
IF DEFINED elapsed ECHO %elapsed% %~1
GOTO :eof

:: get %2 - %1 both in HH:mm.ss.ttt format
:elapsed
FOR /f "tokens=1-4delims=:." %%w IN ("%2") DO (SET /a hh=2%%w&SET /a mm=2%%x&SET /a ss=2%%y&SET /a ttt=2%%z)
FOR /f "tokens=1-4delims=:." %%w IN ("%1") DO (SET /a hh-=1%%w&SET /a mm-=1%%x&SET /a ss-=1%%y&SET /a ttt-=1%%z)
:: compensate for "negatives"
IF %ttt% lss 1000 set/a ttt+=1000&set/a ss-=1
IF %ss% lss 100 set/a ss+=60&set/a mm-=1
IF %mm% lss 100 set/a mm+=60&set/a hh-=1
IF %hh% lss 100 set/a hh+=24
SET "elapsed=%hh:~-2%:%mm:~-2%:%ss:~-2%.%ttt:~-3%"
GOTO :eof

GOTO :eof

You would need to change the setting of sourcedir to suit your circumstances.
I used a file named q40441783.txt containing your data for my testing.

Interesting exercise.

The first two steps are fairly obvious. I included KEYWORD_3 to ensure a correct report was produced for "not found". Note that you state 2 occurrences for KEYWORD_1. Actually, in your posted data there are 3.

The next step required some explanation. First thing I do is to ensure that there are no variables starting # or $.

Next- analyse each line, splitting first on the first - or Space and then processing the part beyond the first delimiter-sequence tokenising on -() so that the tokens are as described in the REM statement. We then set the variable $...$ or #...# to the time in %%a. ... here is the unique part of the log entry - the number and action. Then check whether there is both a $...$ and #...# for the same ... and if so, clear the $#...$# variables, calculate the elapsed time, reconstruct the line and report.

The elapsed-time calculation prepends 2 to the start of each of the variables to ensure that they don't start 0 and potentially get treated as octal. Then pull the same trick with subtracting the start time, using a prepended 1 to produce a result that should be 3 digits (4 for ms). If fewer digits are detected, then we need to add an appropriate number and deduct one from the next-higher time element.

The processing for the action_a/_b timing is essentially the same, but records the start/end times in $!$/#1# since there is no indication of the nature of the strings action_a and action_b and we're forced to assume that the appropriate events don't overlap.

score 1 · Accepted Answer · 2016-11-05 21:31:20Z

Well, as it looks it isn't such a difficult task, I took it as a challenge. I use internal for /f and findstr.exe for parsing. (Done:The sub :Procline needs some more work for time calculation.)

Keywords.txt:

action
KEYWORD_1
ACTION_A
KEYWORD_2
ACTION_B

Batchfile

@echo off & setlocal EnableExtensions EnableDelayedExpansion
Set Log=LogFile.txt
Set Key=Keywords.txt
For /f "tokens=1-5* delims=-() " %%A in (
  'findstr /i /G:%Key% %Log% ^|Sort /+16'
) Do call :ProcLine "%%A" %%B %%C %%D %%E
Echo Count
set _
Pause 
Goto :Eof

:ProcLine
Set /A _%2+=1
Echo %~1 - %2 %3 %4 %5
:: Do we need to calc times? 
Echo %5 | findstr /i "end start" >Nul 2>&1 || Goto :Eof
Call :CalcMsecs %1 x%5
If /i %5 neq start Goto :Eof
Set /a xElap=xend-xstart
If %xElap% gtr 0 Echo Elapsed Time: %xElap% msec
Goto :Eof

:CalcMsecs
Setlocal&Set T=%~1
Set /A _h=%T:~0,2%,_m=%T:~3,2%,_s=%T:~6,2%,_ms=%T:~9,3%
Set /A _m+=_h*60
Set /A _s+=_m*60
Set /A _ms+=_s*1000
:: set _
Endlocal&Set %2=%_ms%&Goto :Eof

Output (till now, still refining)

11:59:39.975 - action WRITE 22 end
11:59:38.075 - action WRITE 22 start
Elapsed Time: 1914 msec
11:59:35.975 - action WRITE 34 end
11:59:35.875 - action WRITE 34 start
Elapsed Time: 100 msec
11:59:37.545 - ACTION_A STATE: type 2
11:59:37.945 - ACTION_B result
11:59:36.992 - KEYWORD_1
11:59:35.897 - KEYWORD_1
11:59:36.999 - KEYWORD_1
11:59:38.097 - KEYWORD_2
11:59:37.845 - KEYWORD_2
Count
_action=6
_ACTION_A=1
_ACTION_B=1
_KEYWORD_1=3
_KEYWORD_2=2

There is a flaw with 14 msec comming from nowhere, but it's too late for me

Nikita · Accepted Answer · 2016-11-06 18:59:15Z

Here is python option:

from datetime import datetime, timedelta

keywords = ['KEYWORD_1', 'KEYWORD_2']
actions = ['WRITE(']
kws_dict = {}
acts_dict = {}

with open('sample.log') as l:
    with open('filtered.log', 'w') as f:
        for line in l:
            if any(k in line for k in keywords) and len(line) > 1:
                entry = line.split('-')[1].strip()
                dt = line.split('-')[0].strip()
                if entry in kws_dict:
                    kws_dict[entry].append(datetime.strptime(dt, '%H:%M:%S.%f'))
                else:
                    kws_dict[entry] = [datetime.strptime(dt,'%H:%M:%S.%f')]
            elif any(a in line for a in actions) and len(line) > 1:
                dt = line.split('-')[0].strip()
                action = line.split(' ')[-2].strip()
                action_type = line.split(' ')[-1].strip()
                if action in acts_dict.keys():
                    acts_dict[action][action_type] = datetime.strptime(dt,'%H:%M:%S.%f')
                else:
                    acts_dict[action] = {action_type:datetime.strptime(dt,'%H:%M:%S.%f')}

# print results
for k in kws_dict.keys():
    print('%s occured %s times' % (k, len(kws_dict[k])))
# comment below if you don't need specific timestamps
    for i in kws_dict[k]:
        print('\t%s' % i.strftime('%H:%M:%S.%f'))

#iterate over actions, then action_types (start, end)
for a, at in acts_dict.iteritems():
    print('%s: Elapsed time %s' % (a, at['end'] - at['start']))
#    for at, t in acts_dict[a].iteritems():
#        print('\t%s: %s' % (at, t.strftime('%H:%M:%S.%f')))

## Raw data in case you need it
#print("Actions:\n", acts_dict)

Which produces:

KEYWORD_1 occured 3 times
11:59:35.897000
11:59:36.992000
11:59:36.999000
KEYWORD_2 occured 2 times
11:59:37.845000
11:59:38.097000
WRITE(34): Elapsed time 0:00:00.100000
WRITE(22): Elapsed time 0:00:01.900000

user7111497 · Accepted Answer · 2016-11-05 20:08:34Z

0

I don't know how good this will be but why don't you try to read every line and if the keyword is in it , just write it on another file . You can try to do something like:

with open('log.txt','r') as f:
    for line in f:
        if 'KEYWORD' in line:
            with open('results.txt','a') as z:
                z.write(line)

answered Nov 5, 2016 at 20:08

user7111497

3 Comments

Yannis Over a year ago

I like this approach. Could you also include some ideas on how to proceed to extract the timestamps for each match? Should I use regular expressions?

user7111497 Over a year ago

Well , if you know that the timestamps will always be 12 characters long you can do something with line[:13]--->This will "take" the first 12 characters and you can paste them somewhere.

Yannis Over a year ago

I see. But then, how will I also handle the elapsed time for the 2 possible matching log entries (start and end)? If you could put together a rough script to show both that would be great!

Collectives™ on Stack Overflow

Extract specific information from a log file using a script (Linux or Windows or Python)

4 Answers 4

Comments

Comments

Comments

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Comments

Comments

Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related