I have multiple large log files that I'd like to export to CSV. To start with, I just want to split each entry into two parts, Date and Event. The problem I'm having is that not every line starts with a date.

Here is a sample chunk of log. Date/times are always 23 characters. The rest varies with the log and event description.

[image: sample log excerpt]

I'd like the end result to look like this in Excel.

[image: desired result in Excel]

Here's what I've tried so far, but it just returns the first 23 characters of each line.

$content = Get-Content myfile.log -TotalCount 50
for ($i = 0; $i -lt $content.Length; $i++) {
    $a = $content[$i].ToCharArray()
    $b = ([string]$a[0..23]).replace(" ","")
    Write-Host $b
}
  • Could you post part of the log as text, please, so I can try something? Commented Sep 7, 2017 at 20:01
  • 2017-09-04 12:31:11.343 General BOECD:: ProcessStartTime: Word: Length 3 [0917 1204 3029 ] Hex: Length 6 [17 09 04 12 29 30] . Display: False 2017-09-04 12:31:11.479 General MelsecIoWrapper: Scan ended: device: 1, ScanStart: 9/4/2017 12:31:10 PM Display: False 2017-09-04 12:31:11.705 General BOECD:: ProcessEndTime: Word: Length 3 [0917 1204 0931 ] Hex: Length 6 [17 09 04 12 31 09] . Display: False 2017-09-04 12:31:13.082 General BOECD:: DV Data: Commented Sep 7, 2017 at 21:05
  • Note: In the actual log file, the Date always starts a line like the picture above. When I pasted the sample, it just wrapped everything together. Commented Sep 7, 2017 at 21:08
  • You should edit your question and put the sample text in there rather than responding to it. If for no other reason than the formatting issues you just encountered. Commented Sep 7, 2017 at 21:09

1 Answer

Read the file in raw as a multi-line string, then use RegEx to split on the date pattern, and for each chunk make a custom object with the two properties that you want, where the first value is the first 23 characters, and the second value is the rest of the string trimmed.

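# Read the whole log as one string, then split just before each timestamp that starts a line.
(Get-Content C:\Path\To\File.log -Raw) -split '(?m)(?=^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})' |
    Where-Object { $_ } |                      # drop the empty chunk before the first timestamp
    ForEach-Object {
        [PSCustomObject]@{
            'Col1' = $_.Substring(0, 23)       # the 23-character date/time
            'Col2' = $_.Substring(23).Trim()   # the rest of the entry, possibly multi-line
        }
    }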

Then you can pipe that to Export-Csv, or do whatever you want with the data. If the files are truly massive this may not be viable, since the whole file is held in memory, but I would expect it to work OK on files up to a few hundred megabytes. Using your sample text, that outputs:

Col1                    Col2
----                    ----
2017-09-04 12:31:11.343 General BOECD:: ProcessStartTime: ...
2017-09-04 12:31:11.479 General MelsecIoWrapper: Scan ended: device: 1, ScanStart: 9/4/2017 12:31:10 PM Display: False
2017-09-04 12:31:11.705 General BOECD:: ProcessEndTime: ...
2017-09-04 12:31:13.082 General BOECD:: DV Data:

The ... at the end of two of the lines marks where the multi-line value was truncated for display on screen; the full value is still there, intact.
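As a rough sketch of the "pipe that to a CSV" step, the same pipeline can end in Export-Csv. The temp-file paths, the Date and Event property names (taken from the question rather than the Col1/Col2 used above), and the line breaks inside the sample entry are all assumptions made for illustration:

```powershell
# Hypothetical temp paths; substitute your own log and output locations.
$logPath = Join-Path ([System.IO.Path]::GetTempPath()) 'sample.log'
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'sample.csv'

# Small sample based on the text from the comments; the line breaks are assumed.
@'
2017-09-04 12:31:11.343 General BOECD:: ProcessStartTime:
Word: Length 3 [0917 1204 3029 ]
Hex: Length 6 [17 09 04 12 29 30] .
Display: False
2017-09-04 12:31:11.479 General MelsecIoWrapper: Scan ended: device: 1, ScanStart: 9/4/2017 12:31:10 PM Display: False
'@ | Set-Content -Path $logPath

# Split before each line-leading timestamp, build one object per entry, write the CSV.
(Get-Content $logPath -Raw) -split '(?m)(?=^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})' |
    Where-Object { $_ } |
    ForEach-Object {
        [PSCustomObject]@{
            Date  = $_.Substring(0, 23)
            Event = $_.Substring(23).Trim()
        }
    } |
    Export-Csv -Path $csvPath -NoTypeInformation

# Read the file back to check the result.
Import-Csv $csvPath | Format-List
```

Export-Csv quotes each field, so the line breaks inside a multi-line Event stay within a single cell. If one physical line per entry is preferred, replacing '\r?\n' with a space in the Event value before exporting is one option.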

(?=...) is a so-called "positive lookahead assertion". Such assertions cause a regular expression to match the given pattern without actually including it in the returned match/string. In this case the match returns the empty string before a timestamp, so the string can be split there without removing the timestamp.
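To see the difference the lookahead makes, here is a small comparison on a made-up three-line string (the sample text and variable names are illustrative only). Splitting on the plain pattern consumes the matched timestamp text, while splitting on the lookahead keeps it:

```powershell
# Made-up sample: two entries, the first one spanning two lines.
$text  = "2017-09-04 12:31:11.343 first`nmore detail`n2017-09-04 12:31:11.479 second"
$stamp = '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'

# Plain pattern: the matched part of the timestamp is removed by the split.
$consumed = $text -split "(?m)$stamp" | Where-Object { $_ }

# Lookahead: the match is zero-width, so every chunk keeps its timestamp.
$kept = $text -split "(?m)(?=$stamp)" | Where-Object { $_ }

$consumed[1]   # .479 second
$kept[1]       # 2017-09-04 12:31:11.479 second
```

In both cases the split yields an empty first element, because the pattern matches at the very start of the string; that is what the Where-Object filter removes.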

2 Comments

I would make the pattern (?m)(?=^\d{4}-...) to match timestamps at the beginning of a line specifically. The hyphens and colons don't need to be escaped, BTW.
Thanks, I have a hard time remembering what all counts as a reserved character in RegEx, so I tend to over-escape sometimes. I have also updated the answer to reflect your suggestion of only matching date/times at the beginning of a line; that is an excellent idea.
