Info
I've created a script which analyzes the debug logs from Windows DNS Server.
It does the following:
- Open debug log using
[System.IO.File]class - Perform a regex match on each line
- Separate 16 capture groups into different properties inside a custom object
- Fills dictionaries and appends to the value of each key to produce statistics
Steps 1 and 2 take the longest. In fact, they take a seemingly endless amount of time, because the file is growing as it is being read.
Problem
Due to the size of the debug log (80,000kb) it takes a very long time.
I believe that my code is fine for smaller text files, but it fails to deal with much larger files.
Code
Here is my code: https://github.com/cetanu/msDnsStats/blob/master/msdnsStats.ps1
Debug log preview
This is what the debug looks like (including the blank lines)
Multiply this by about 100,000,000 and you have my debug log.
21/03/2014 2:20:03 PM 0D0C PACKET 0000000005FCB280 UDP Rcv 202.90.34.177 3709 Q [1001 D NOERROR] A (2)up(13)massrelevance(3)com(0)
21/03/2014 2:20:03 PM 0D0C PACKET 00000000042EB8B0 UDP Rcv 67.215.83.19 097f Q [0000 NOERROR] CNAME (15)manchesterunity(3)org(2)au(0)
21/03/2014 2:20:03 PM 0D0C PACKET 0000000003131170 UDP Rcv 62.36.4.166 a504 Q [0001 D NOERROR] A (3)ekt(4)user(7)net0319(3)com(0)
21/03/2014 2:20:03 PM 0D0C PACKET 00000000089F1FD0 UDP Rcv 80.10.201.71 3e08 Q [1000 NOERROR] A (4)dns1(5)offis(3)com(2)au(0)
Request
I need ways or ideas on how to open and read each line of a file more quickly than what I am doing now.
I am open to suggestions of using a different language.