
Hi, I am new to shell scripting. I have a log file like this:

2018-01-18T15:55:15,637 INFO  [HiveServer2-Handler-Pool: Thread-37([])]: 
thrift.ThriftCLIService (ThriftCLIService.java:OpenSession(317)) - Client 
protocol version: HIVE_CLI_SERVICE_PROTOCOL_V7
2018-01-18T15:55:15,648 INFO  [HiveServer2-Handler-Pool: Thread-37([])]: 
session.SessionState (SessionState.java:createPath(749)) - Created HDFS 

I tried to filter out the required lines like this:

cat hive-server2.log | grep 's3\|user\|query'

2018-01-18T16:20:39,464 WARN  [67272380-f3e9-40da-8e8e-a209c05eb4fe HiveServer2-Handler-Pool: Thread-37([])]: util.CurrentUserGroupInformation (CurrentUserGroupInformation.java:getGroupNameFromUser(52)) - user aa (auth:PROXY) via hive (auth:SIMPLE) has no primary groupName, setting groupName to be aa.
2018-01-18T16:23:25,389 INFO  [HiveServer2-Background-Pool: Thread-63([])]: ql.Driver (Driver.java:execute(1735)) - Executing command(queryId=hive_20180118162325_5ad8be3f-80e7-468d-bb47-1bdc2d2fb624):      2018-01-18T16:23:25,393 INFO  [HiveServer2-Background-Pool: Thread-63([])]: ql.Driver (Driver.java:execute(2050)) - Completed executing command(queryId=hive_20180118162325_5ad8be3f-80e7-468d-bb47-1bdc2d2fb624); Time taken: 0.004 seconds
 2018-01-18T16:23:25,972 INFO  [67272380-f3e9-40da-8e8e-a209c05eb4fe HiveServer2-Handler-Pool: Thread-49([])]: s3n.S3NativeFileSystem (S3NativeFileSystem.java:open(1210)) - Opening 's3://f4340808-220a-424c-ba67-3f2383ea42ea-c000.csv' for reading

The above prints every line matching the filter keywords. Now I need to save the result in a .txt file, like this:

column names - TimeStamp,User,Query,file path
2018-01-18T16:23:25,972,select * from bv limit 5,Opening 
's3://aaaa' for reading

I don't know how to extract these column values from the above output. Any help will be appreciated.
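(Editor's note: a small aside on the `cat hive-server2.log | grep ...` pipeline above. grep can read the file directly, and `grep -E` uses extended regexes, so the alternation needs no backslashes. A minimal sketch with a stand-in `demo.log`, not the real hive-server2.log:)

```shell
# Create a tiny stand-in log; two of the three lines match the filter.
printf '%s\n' 'foo user aa' 'no match here' "Opening 's3://aaaa' for reading" > demo.log

# Same filter as: cat hive-server2.log | grep 's3\|user\|query'
# but without the extra cat process, using ERE alternation.
grep -E 's3|user|query' demo.log
```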

  • 1) Your input log fragment contains neither the user nor the query keyword — post a testable fragment; 2) post the final expected result. Commented Jan 19, 2018 at 12:07
  • I have updated the question; kindly check it. Commented Jan 19, 2018 at 12:19
  • Can you elaborate which concrete substring is responsible for the S3 bucket path field? Commented Jan 19, 2018 at 12:28
  • Just the s3 keyword. Commented Jan 19, 2018 at 13:04

1 Answer


Awk solution for a static log format (note that the \< \> word boundaries are GNU awk extensions):

awk 'BEGIN{ print "TimeStamp,User,Query,S3 bucket path" }
     /\<user\>/{ u=$10 }                    # "... - user aa ..." -> field 10 is the user name
     /Executing command\(queryId/{          # keep only the query text after "queryId=...:"
         sub(/.*queryId=[^[:space:]]+: /,""); q=$0 }
     /s3:\/\//{ print $1,u,q,$10 }          # field 10 is the quoted s3 path
    ' OFS=',' hive-server2.log

The output:

TimeStamp,User,Query,S3 bucket path
2018-01-18T16:23:25,972,a8197zz,select * from pfeevent limit 5,'s3://3m-his-dev-cayuga/Demo-Enterprise/TrustedZone/Published/Enrichment/Core/PFE/PFEEvent/part-00000-f4340808-220a-424c-ba67-3f2383ea42ea-c000.snappy.parquet'
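(Editor's note: the `\<` `\>` word-boundary escapes above are GNU awk extensions. A portable sketch of the same idea tests the literal field value instead; `sample.log` and its abridged contents below are illustrative stand-ins, not the asker's real data:)

```shell
# Abridged stand-in log lines with the same field layout as the question.
cat > sample.log <<'EOF'
2018-01-18T16:20:39,464 WARN  [x HiveServer2-Handler-Pool: Thread-37([])]: util.CurrentUserGroupInformation (CurrentUserGroupInformation.java:getGroupNameFromUser(52)) - user aa (auth:PROXY) via hive (auth:SIMPLE) has no primary groupName, setting groupName to be aa.
2018-01-18T16:23:25,389 INFO  [HiveServer2-Background-Pool: Thread-63([])]: ql.Driver (Driver.java:execute(1735)) - Executing command(queryId=hive_xxx): select * from bv limit 5
2018-01-18T16:23:25,972 INFO  [x HiveServer2-Handler-Pool: Thread-49([])]: s3n.S3NativeFileSystem (S3NativeFileSystem.java:open(1210)) - Opening 's3://aaaa' for reading
EOF

# Portable variant: compare field 9 to the literal word "user"
# instead of relying on GNU-awk-only \< \> word boundaries.
awk 'BEGIN{ OFS=","; print "TimeStamp,User,Query,S3 bucket path" }
     $9 == "user"                 { u = $10 }       # "... - user aa ..." -> field 10 is the name
     /Executing command\(queryId/ { sub(/.*queryId=[^[:space:]]+:[[:space:]]*/, ""); q = $0 }
     /s3:\/\//                    { print $1, u, q, $10 }   # field 10 is the quoted s3 path
    ' sample.log
```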

10 Comments

@arunabimaniyu, that's why I wrote "for static log format". When you describe and elaborate the format of all the crucial lines in your log file and make your conditions clearer, you'll increase your chances.
TimeStamp,User,Query,S3 bucket path ,'s3://3m-his-dev-cayuga/Demo-Enterprise/TrustedZone/Published/Enrichment/Core/PFE/PFEEvent/part-00000-f4340808-220a-424c-ba67-3f2383ea42ea-c000.snappy.parquet'
Whatever output you got, I am not getting that either. Am I making a mistake?
Hmm, any idea how I can get the required output?
@shellter, thanks. Tech support is always much harder than giving answers. :)
