
I am working with AWS S3, where files are added to buckets and the access information for each bucket is logged to another bucket in text format.

I would like to convert the log information stored in these text files to JSON. However, the files contain no key-value information, only the values.

The contents of the log file are as below:

fd89d80d676948bd913040b667965ef6a50a9c80a12f38c504f497953aedc341 s3Samplebucket [10/Mar/2021:03:27:29 +0000] 171.60.235.108 fd89d80d676948bd913040b667965ef6a50a9c80a12f38c504f497953aedc341 MX1XP335Q5YFS06H REST.HEAD.BUCKET - "HEAD /s3Samplebucket HTTP/1.1" 200 - - - 13 13 "-" "S3Console/0.4, aws-internal/3 aws-sdk-java/1.11.964 Linux/4.9.230-0.1.ac.224.84.332.metal1.x86_64 OpenJDK_64-Bit_Server_VM/25.282-b08 java/1.8.0_282 vendor/Oracle_Corporation" - AMNo4/b/T+5JdEVQpLkqz0SV8VDXyd3odEFmK+5LvanuzgIXW2Lv87OBl5r5tbSZ/yjW5zfFQsA= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader s3-us-west-2.amazonaws.com TLSv1.2

The individual fields of a log entry are as below:
Log fields

Bucket Owner: fd89d80d676948bd913040b667965ef6a50a9c80a12f38c504f497953aedc341
Bucket: S3SampleBucket
Time: [11/Mar/2021:06:52:33 +0000]
Remote IP: 183.87.60.172
Requester: arn:aws:iam::486031527132:user/jdoe
Request ID: 9YQ1MWABKNRPX3MP
Operation: REST.GET.LOCATION
Key: - (BLANK)
Request-URI: "GET /?location HTTP/1.1"
HTTP status: 200
Error Code: - (BLANK)
Bytes Sent: 137
Object Size: - (BLANK)
Total Time: 17
Turn-Around Time: - (BLANK)
Referer: "-" (BLANK)
User-Agent: "AWSPowerShell/4.1.9.0 .NET_Runtime/4.0 .NET_Framework/4.0 OS/Microsoft_Windows_NT_10.0.18363.0 WindowsPowerShell/5.0 ClientSync"
Version Id: - (BLANK)
Host Id: Q5WBxJNrwsspFmtOG+d2YN0xAtvbq1sdqm9vh6AflXdMCmny5VC3bZmyTBZavKGpO3J/uz+IfK0=
Signature Version: SigV4
Cipher Suite: ECDHE-RSA-AES128-GCM-SHA256
Authentication Type: AuthHeader
Host Header: S3SampleBucket.s3.us-west-2.amazonaws.com
TLS version: TLSv1.2

The only approach I can think of is to keep the field names in a configuration file. I would like to do this in either PowerShell or Python.

Any assistance would be of great help.

1 Answer

The log format can be interpreted as a CSV (with a whitespace delimiter), so you could parse it using Import-Csv/ConvertFrom-Csv:

$columns = 'Bucket Owner', 'Bucket', 'Time', 'Remote IP', 'Requester', 'Request ID', 'Operation', 'Key', 'Request-URI', 'HTTP status', 'Error Code', 'Bytes Sent', 'Object Size', 'Total Time', 'Turn-Around Time', 'Referer', 'User-Agen', 'Version Id', 'Host Id', 'Signature Version', 'Cipher Suite', 'Authentication Type', 'Host Header', 'TLS version'

$data = @'
fd89d80d676948bd913040b667965ef6a50a9c80a12f38c504f497953aedc341 s3Samplebucket [10/Mar/2021:03:27:29 +0000] 171.60.235.108 fd89d80d676948bd913040b667965ef6a50a9c80a12f38c504f497953aedc341 MX1XP335Q5YFS06H REST.HEAD.BUCKET - "HEAD /s3Samplebucket HTTP/1.1" 200 - - - 13 13 "-" "S3Console/0.4, aws-internal/3 aws-sdk-java/1.11.964 Linux/4.9.230-0.1.ac.224.84.332.metal1.x86_64 OpenJDK_64-Bit_Server_VM/25.282-b08 java/1.8.0_282 vendor/Oracle_Corporation" - AMNo4/b/T+5JdEVQpLkqz0SV8VDXyd3odEFmK+5LvanuzgIXW2Lv87OBl5r5tbSZ/yjW5zfFQsA= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader s3-us-west-2.amazonaws.com TLSv1.2
'@

$parsedLog = $data |ConvertFrom-Csv -Delimiter ' ' -Header $columns

Now the resulting object is easily converted to JSON:

PS ~> $parsedLog |ConvertTo-Json
{
    "Bucket Owner":  "fd89d80d676948bd913040b667965ef6a50a9c80a12f38c504f497953aedc341",
    "Bucket":  "s3Samplebucket",
    "Time":  "[10/Mar/2021:03:27:29",
    "Remote IP":  "+0000]",
    "Requester":  "171.60.235.108",
    "Request ID":  "fd89d80d676948bd913040b667965ef6a50a9c80a12f38c504f497953aedc341",
    "Operation":  "MX1XP335Q5YFS06H",
    "Key":  "REST.HEAD.BUCKET",
    "Request-URI":  "-",
    "HTTP status":  "HEAD /s3Samplebucket HTTP/1.1",
    "Error Code":  "200",
    "Bytes Sent":  "-",
    "Object Size":  "-",
    "Total Time":  "-",
    "Turn-Around Time":  "13",
    "Referer":  "13",
    "User-Agen":  "-",
    "Version Id":  "S3Console/0.4, aws-internal/3 aws-sdk-java/1.11.964 Linux/4.9.230-0.1.ac.224.84.332.metal1.x86_64 OpenJDK_64-Bit_Server_VM/25.282-b08 java/1.8.0_282 vendor/Oracle_Corporation",
    "Host Id":  "-",
    "Signature Version":  "AMNo4/b/T+5JdEVQpLkqz0SV8VDXyd3odEFmK+5LvanuzgIXW2Lv87OBl5r5tbSZ/yjW5zfFQsA=",
    "Cipher Suite":  "SigV4",
    "Authentication Type":  "ECDHE-RSA-AES128-GCM-SHA256",
    "Host Header":  "AuthHeader",
    "TLS version":  "s3-us-west-2.amazonaws.com"
}

In your case, to read the file from disk, simply replace $data = ... and $data |ConvertFrom-Csv statements with Import-Csv:

$parsedLog = Import-Csv -Path .\path\to\s3requests.log -Delimiter ' ' -Header $columns
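
Since the question also allows Python, here is a minimal sketch of the same conversion; it is not part of the original answer. It assumes every entry carries the 24 fields listed in the question, in that order, and it treats a [bracketed] or "quoted" span as a single field so that the timestamp stays in one piece. The names COLUMNS, TOKEN and parse_line are hypothetical, and embedded escaped quotes inside a quoted field are not handled.

```python
import json
import re

# Field names in log order, following the list in the question
# (with "User-Agent" spelled in full).
COLUMNS = [
    "Bucket Owner", "Bucket", "Time", "Remote IP", "Requester", "Request ID",
    "Operation", "Key", "Request-URI", "HTTP status", "Error Code",
    "Bytes Sent", "Object Size", "Total Time", "Turn-Around Time", "Referer",
    "User-Agent", "Version Id", "Host Id", "Signature Version", "Cipher Suite",
    "Authentication Type", "Host Header", "TLS version",
]

# One token is a [bracketed] span, a "quoted" span, or a run of non-spaces.
TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')


def parse_line(line):
    """Map one access-log line to a dict keyed by COLUMNS (hypothetical helper)."""
    values = [
        next(group for group in match.groups() if group is not None)
        for match in TOKEN.finditer(line)
    ]
    # zip() stops at the shorter sequence, so extra trailing fields are dropped.
    return dict(zip(COLUMNS, values))


# The sample entry from the question, split across lines for readability.
SAMPLE = (
    'fd89d80d676948bd913040b667965ef6a50a9c80a12f38c504f497953aedc341 '
    's3Samplebucket [10/Mar/2021:03:27:29 +0000] 171.60.235.108 '
    'fd89d80d676948bd913040b667965ef6a50a9c80a12f38c504f497953aedc341 '
    'MX1XP335Q5YFS06H REST.HEAD.BUCKET - '
    '"HEAD /s3Samplebucket HTTP/1.1" 200 - - - 13 13 "-" '
    '"S3Console/0.4, aws-internal/3 aws-sdk-java/1.11.964 '
    'Linux/4.9.230-0.1.ac.224.84.332.metal1.x86_64 '
    'OpenJDK_64-Bit_Server_VM/25.282-b08 java/1.8.0_282 '
    'vendor/Oracle_Corporation" - '
    'AMNo4/b/T+5JdEVQpLkqz0SV8VDXyd3odEFmK+5LvanuzgIXW2Lv87OBl5r5tbSZ/yjW5zfFQsA= '
    'SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader '
    's3-us-west-2.amazonaws.com TLSv1.2'
)

record = parse_line(SAMPLE)
print(json.dumps(record, indent=4))
```

To convert a whole file, the same helper could be applied to each non-empty line and the resulting dicts collected into a list before calling json.dumps.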

7 Comments

We can't use a space " " as the delimiter, because some values contain spaces that are part of the same field. For example, [10/Mar/2021:03:27:29 +0000] 171.60.235.108 should be "Time": "[10/Mar/2021:03:27:29 +0000]", "Remote IP": "171.60.235.108", but as per your code it is "Time": "[10/Mar/2021:03:27:29", "Remote IP": "+0000]", "Requester": "171.60.235.108",
@NottyHead Oh, I see. (Get-Content file.log) -replace '\[([^\]]+)\]','"$1"' |ConvertFrom-Csv ... should do the trick
One more request, @user:712649: in the above example, how do I convert the time stored as [10/Mar/2021:03:27:29 +0000] to MM-dd-yyyyTHH:mm:ssZ format? I ask because this data is used in a downstream system where it is stored as 03-10-2021T03:27:29Z.
@NottyHead There are 100s of existing posts on this site already answering that question. Try this one
@712649
$parsedLog = $data -replace '\[([^\]]+)\]','"$1"' |ConvertFrom-Csv -Delimiter ' ' -Header $columns
$date_format = "yyyy-MM-ddTHH:mm:ssZ"
$parsedLog | Select ([datetime]::ParseExact($_.'Time',"dd/MMM/yyyy:HH:mm:ss +0000",$null).ToSTring($date_format))

Error:
Exception calling "ParseExact" with "3" argument(s): "String was not recognized as a valid DateTime."
At line:18 char:1
+ $parsedLog | Select ([datetime]::ParseExact($_.'Time',"dd/MMM/yyyy:HH ...
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : FormatException
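
The thread ends without a resolution for the timestamp conversion. One likely cause of the error is that $_ is only populated inside a script block, so in the Select (...) expression above ParseExact probably receives an empty value. As a hedged illustration in Python, the other language the question allows, the sketch below converts the log timestamp to the MM-dd-yyyyTHH:mm:ssZ form requested in the comments. The helper name to_downstream_format is hypothetical, and the %b month parsing assumes an English (or C) locale.

```python
from datetime import datetime, timezone


def to_downstream_format(s3_time):
    """Convert an S3 log timestamp to MM-dd-yyyyTHH:mm:ssZ (hypothetical helper)."""
    # Accept the value with or without the surrounding square brackets.
    parsed = datetime.strptime(s3_time.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")
    # Normalize to UTC so the trailing "Z" is accurate for any logged offset.
    return parsed.astimezone(timezone.utc).strftime("%m-%d-%YT%H:%M:%SZ")


converted = to_downstream_format("[10/Mar/2021:03:27:29 +0000]")
print(converted)
```

For the sample entry this yields the value 03-10-2021T03:27:29Z quoted in the comment above.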