
Hi, I am new to shell scripting. I have a log file like this:

2018-01-18T15:55:15,637 INFO  [HiveServer2-Handler-Pool: Thread-37([])]: 
thrift.ThriftCLIService (ThriftCLIService.java:OpenSession(317)) - Client 
protocol version: HIVE_CLI_SERVICE_PROTOCOL_V7
2018-01-18T15:55:15,648 INFO  [HiveServer2-Handler-Pool: Thread-37([])]: 
session.SessionState (SessionState.java:createPath(749)) - Created HDFS 

I tried to filter out the required lines like this:

cat hive-server2.log | grep 's3\|user\|query'

2018-01-18T16:20:39,464 WARN  [67272380-f3e9-40da-8e8e-a209c05eb4fe HiveServer2-Handler-Pool: Thread-37([])]: util.CurrentUserGroupInformation (CurrentUserGroupInformation.java:getGroupNameFromUser(52)) - user aa (auth:PROXY) via hive (auth:SIMPLE) has no primary groupName, setting groupName to be aa.
2018-01-18T16:23:25,389 INFO  [HiveServer2-Background-Pool: Thread-63([])]: ql.Driver (Driver.java:execute(1735)) - Executing command(queryId=hive_20180118162325_5ad8be3f-80e7-468d-bb47-1bdc2d2fb624):      2018-01-18T16:23:25,393 INFO  [HiveServer2-Background-Pool: Thread-63([])]: ql.Driver (Driver.java:execute(2050)) - Completed executing command(queryId=hive_20180118162325_5ad8be3f-80e7-468d-bb47-1bdc2d2fb624); Time taken: 0.004 seconds
 2018-01-18T16:23:25,972 INFO  [67272380-f3e9-40da-8e8e-a209c05eb4fe HiveServer2-Handler-Pool: Thread-49([])]: s3n.S3NativeFileSystem (S3NativeFileSystem.java:open(1210)) - Opening 's3://f4340808-220a-424c-ba67-3f2383ea42ea-c000.csv' for reading

The above prints every line matching the filter keywords. Now I need to save the result in a .txt file, like this:

column names - TimeStamp,User,Query,file path
2018-01-18T16:23:25,972,select * from bv limit 5,Opening 
's3://aaaa' for reading

I don't know how to extract these column values from the above output. Any help will be appreciated.
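(Editor's note: a small aside on the `cat hive-server2.log | grep ...` pipeline above. grep can read the file directly, and `grep -E` uses extended regexes, so the alternation needs no backslashes. A minimal sketch with a stand-in `demo.log`, not the real hive-server2.log:)

```shell
# Create a tiny stand-in log; two of the three lines match the filter.
printf '%s\n' 'foo user aa' 'no match here' "Opening 's3://aaaa' for reading" > demo.log

# Same filter as: cat hive-server2.log | grep 's3\|user\|query'
# but without the extra cat process, using ERE alternation.
grep -E 's3|user|query' demo.log
```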

  • 1) Your input log fragment contains neither the user nor the query keyword — post a testable fragment; 2) post the final expected result. Commented Jan 19, 2018 at 12:07
  • I have updated the question; kindly check it. Commented Jan 19, 2018 at 12:19
  • Can you elaborate which concrete substring is responsible for the S3 bucket path field? Commented Jan 19, 2018 at 12:28
  • Just the s3 keyword. Commented Jan 19, 2018 at 13:04

1 Answer


Awk solution for a static log format (note that the \< \> word boundaries are GNU awk extensions):

awk 'BEGIN{ print "TimeStamp,User,Query,S3 bucket path" }
     /\<user\>/{ u=$10 }                    # "... - user aa ..." -> field 10 is the user name
     /Executing command\(queryId/{          # keep only the query text after "queryId=...:"
         sub(/.*queryId=[^[:space:]]+: /,""); q=$0 }
     /s3:\/\//{ print $1,u,q,$10 }          # field 10 is the quoted s3 path
    ' OFS=',' hive-server2.log

The output:

TimeStamp,User,Query,S3 bucket path
2018-01-18T16:23:25,972,a8197zz,select * from pfeevent limit 5,'s3://3m-his-dev-cayuga/Demo-Enterprise/TrustedZone/Published/Enrichment/Core/PFE/PFEEvent/part-00000-f4340808-220a-424c-ba67-3f2383ea42ea-c000.snappy.parquet'
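(Editor's note: the `\<` `\>` word-boundary escapes above are GNU awk extensions. A portable sketch of the same idea tests the literal field value instead; `sample.log` and its abridged contents below are illustrative stand-ins, not the asker's real data:)

```shell
# Abridged stand-in log lines with the same field layout as the question.
cat > sample.log <<'EOF'
2018-01-18T16:20:39,464 WARN  [x HiveServer2-Handler-Pool: Thread-37([])]: util.CurrentUserGroupInformation (CurrentUserGroupInformation.java:getGroupNameFromUser(52)) - user aa (auth:PROXY) via hive (auth:SIMPLE) has no primary groupName, setting groupName to be aa.
2018-01-18T16:23:25,389 INFO  [HiveServer2-Background-Pool: Thread-63([])]: ql.Driver (Driver.java:execute(1735)) - Executing command(queryId=hive_xxx): select * from bv limit 5
2018-01-18T16:23:25,972 INFO  [x HiveServer2-Handler-Pool: Thread-49([])]: s3n.S3NativeFileSystem (S3NativeFileSystem.java:open(1210)) - Opening 's3://aaaa' for reading
EOF

# Portable variant: compare field 9 to the literal word "user"
# instead of relying on GNU-awk-only \< \> word boundaries.
awk 'BEGIN{ OFS=","; print "TimeStamp,User,Query,S3 bucket path" }
     $9 == "user"                 { u = $10 }       # "... - user aa ..." -> field 10 is the name
     /Executing command\(queryId/ { sub(/.*queryId=[^[:space:]]+:[[:space:]]*/, ""); q = $0 }
     /s3:\/\//                    { print $1, u, q, $10 }   # field 10 is the quoted s3 path
    ' sample.log
```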

10 Comments

@arunabimaniyu, that's why I wrote "for static log format". When you describe and elaborate the format of all the crucial lines in your log file and make your conditions clearer, you'll increase your chances.
TimeStamp,User,Query,S3 bucket path ,'s3://3m-his-dev-cayuga/Demo-Enterprise/TrustedZone/Published/Enrichment/Core/PFE/PFEEvent/part-00000-f4340808-220a-424c-ba67-3f2383ea42ea-c000.snappy.parquet'
Whatever output you got, I am not getting that either. Am I making a mistake?
Hmm, any idea how I can get the required output?
@shellter, thanks. Tech support is always much harder than giving answers. :)
