read text file by using regular expression and format in Excel

Question

I have a text file containing some comment lines and many data lines, like the sample below:

XYZ3-CCAV::[2] mcb XYZ3 hpy diag ce56 dsc
[UT000029118.494] XYZ3:mcb >> LN (CDRxN  , UC_CFG,XTP_RST,STP) SD LCK XRMPP CLK90 CLKP1 PF(M,L) VGA DCO P1kII M1kII  EPD(1,2,3,4,5,6)       XTMPP  AMAP(n1,m,p1,2,3,rpara)   Head(L,R,U,D)  LINK_TIME
[UT000029118.495] XYZ3:mcb >>  0 (OSx1:x1, 0x0c,     0,0,    0)  1*  1*   0    44     2     0,1   17   4 205    0  30,  2,  2, -2,  1,  1      0   22, 90, 0, 0, 0, 0     296,464,153,155    57.6
[UT000029118.495] XYZ3:mcb >>  1 (OSx1:x1, 0x0c,     0,0,    0)  1*  1*   0    44     0     0,1   17   2 202    0  31,  2, -1,  5, -1,  1      0   22, 90, 0, 0, 0, 0     296,464,155,155    58.5
[UT000029118.496] XYZ3:mcb >>  2 (OSx1:x1, 0x0c,     0,0,    0)  1*  1*   0    43     0     0,1   17   0 209    0  33,  1,  0,  1,  3, -3      0   22, 90, 0, 0, 0, 0     312,449,159,159    60.1
[UT000029118.497] XYZ3:mcb >>  3 (OSx1:x1, 0x0c,     0,0,    0)  1*  1*   1    45     0     0,1   17   6 202    0  33,  2,  0, -1,  3,  0      0   22, 90, 0, 0, 0, 0     328,449,153,159    60.3
[UT000029118.497] XYZ3:mcb >> 

XYZ3-CCAV::[2] Headscan 51 0 0xf 0
Headscan: min_dwell_bits 100000
Headscan: max_dwell_bits 100000000

I can use the regular expression engine available from Excel VBA (VBScript.RegExp) to extract the data lines:

[UT000029118.495] XYZ3:mcb >>  0 (OSx1:x1, 0x0c,     0,0,    0)  1*  1*   0    44     2     0,1   17   4 205    0  30,  2,  2, -2,  1,  1      0   22, 90, 0, 0, 0, 0     296,464,153,155    57.6
[UT000029118.495] XYZ3:mcb >>  1 (OSx1:x1, 0x0c,     0,0,    0)  1*  1*   0    44     0     0,1   17   2 202    0  31,  2, -1,  5, -1,  1      0   22, 90, 0, 0, 0, 0     296,464,155,155    58.5
[UT000029118.496] XYZ3:mcb >>  2 (OSx1:x1, 0x0c,     0,0,    0)  1*  1*   0    43     0     0,1   17   0 209    0  33,  1,  0,  1,  3, -3      0   22, 90, 0, 0, 0, 0     312,449,159,159    60.1
[UT000029118.497] XYZ3:mcb >>  3 (OSx1:x1, 0x0c,     0,0,    0)  1*  1*   1    45     0     0,1   17   6 202    0  33,  2,  0, -1,  3,  0      0   22, 90, 0, 0, 0, 0     328,449,153,159    60.3

I tried to write the data lines into an Excel file using the code below (a sheet named "EyeInfo" already exists in the workbook):

Sub open_log_file()
    Dim Full_Name As String, text As String, textline As String
    Dim ws As Worksheet 'Used to Store file path and file name

    'Set up worksheet
    Set ws = Worksheets("EyeInfo")
    ws.UsedRange.Clear

    'Call the Window to open the file
    Full_Name = Application.GetOpenFilename("Diag Log File(*.log;*.txt;*.*),*.log;*.txt;*.*")

    'read the file
    Open Full_Name For Input As #1
    Do Until EOF(1)
        Line Input #1, textline
        text = text & textline
    Loop
    Close #1

    ' define regular expression
    Dim regEx_CE As Object
    Set regEx_CE = CreateObject("VBScript.RegExp")

    With regEx_CE
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "\w*\[\d+\]\s+mcb\s+XYZ3\s+hpy\s+diag\s+(ce\d+)\s+dsc"
    End With

    Dim regEx_LN As Object
    Set regEx_LN = CreateObject("VBScript.RegExp")

    With regEx_LN
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "\[\w*\.\w*\]\s*\w*:\w*\s*>>\s*\d+.*"
    End With

    ' Execute the match process line by line and put the data in Excel/EyeInfo
    Set CE_match = regEx_CE.Execute(text)
    Set LN_match = regEx_LN.Execute(text)
    ws.Cells(1, 1) = Full_Name
    ws.Cells(2, 1) = "Number of Ports to Be Extracted"
    ws.Cells(2, 2) = CE_match.Count
    For i = 0 To CE_match.Count - 1
        ws.Cells(i * 4 + 3, 1) = CE_match(i).Value
        ws.Cells(i * 4 + 3, 2) = LN_match(i * 4 + 0).Value
        ws.Cells(i * 4 + 4, 2) = LN_match(i * 4 + 1).Value
        ws.Cells(i * 4 + 5, 2) = LN_match(i * 4 + 2).Value
        ws.Cells(i * 4 + 6, 2) = LN_match(i * 4 + 3).Value
    Next
End Sub

What I want is to put each data line in a row, split on spaces or commas, so that each value lands in its own cell. Instead, this code puts the whole data line in a single cell in Excel.

Then I'd suspect there's an issue with your regular expression or the way you process the matches. No telling without seeing that code, though. Also, do you want to do this in Excel (VBA) or VBScript? The languages are similar but not the same.
– Ansgar Wiechers, Commented Dec 17, 2017 at 23:16
The regular expression should not matter. I only use the regex to extract the data. My question is how to put the extracted data into Excel cells. Look at the part of the data shown below. My code puts all of it in one single cell; what I expect is to put it into 10 cells: 22, 90, 0, 0, 0, 0 296,464,153,155
– spices, Commented Dec 18, 2017 at 1:40
I cannot reproduce your problem with the limited information you provide. And the code snippet you show will put data from whatever you have stored in CE_match(i) into a single cell in column A of your worksheet. Since you choose not to share your code, nor even tell us whether you need VBA or VBScript code, I suspect your question will be closed.
– Ron Rosenfeld, Commented Dec 18, 2017 at 1:56

Ron Rosenfeld · Accepted Answer · 2017-12-18 20:30:50Z

Your code and data were absolutely needed to troubleshoot this. Although other things could be changed, the basic problem was the routine that reads the text file: it was removing all of the end-of-line (EOL) tokens.

When you use the Line Input statement, Carriage return-linefeed sequences are skipped rather than appended to the character string.

So when that happened, the first match of your regEx_LN pattern ran on to the end of the file. The .* at the end of the pattern matches everything up to an EOL or the end of the string, and because text contained no line breaks, the whole file was effectively a single line. Everything from the start of the first data line onward ended up in one match.
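The effect described above can be reproduced with a small sketch. DemoLineBreaks and the two shortened sample strings are made up for illustration; only the pattern is taken from the question.

```vba
' Sketch: compare the match count of the regEx_LN pattern on text
' with and without line breaks. DemoLineBreaks and the sample
' strings are hypothetical; the pattern is the one from the question.
Sub DemoLineBreaks()
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    With re
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "\[\w*\.\w*\]\s*\w*:\w*\s*>>\s*\d+.*"
    End With

    Dim joined As String, separated As String
    ' Two data lines glued together, as the original Line Input loop produced them
    joined = "[UT1.1] XYZ3:mcb >>  0 aaa" & "[UT1.2] XYZ3:mcb >>  1 bbb"
    ' The same two lines with the line break restored
    separated = "[UT1.1] XYZ3:mcb >>  0 aaa" & vbCrLf & "[UT1.2] XYZ3:mcb >>  1 bbb"

    Debug.Print re.Execute(joined).Count      ' a single match that runs to the end of the string
    Debug.Print re.Execute(separated).Count   ' one match per data line
End Sub
```

Running it from the Immediate window shows why indexing LN_match by line number only makes sense once the line breaks are back in text.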

With the change below, your routine works on your data:

'read the file
Open Full_Name For Input As #1
Do Until EOF(1)
    Line Input #1, textline
    text = text & vbCrLf & textline
Loop
Close #1

text = Mid(text, Len(vbCrLf) + 1) 'remove the leading CRLF (two characters)

Here is what it looks like after making that modification and running your code:

In your original question, you indicated you wanted to also split the data lines into columns based on the delimiter being either a space or a comma.
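As a rough illustration of that kind of split, independent of the TextToColumns call used in the full routine further down, the sketch below collapses every run of spaces and commas into one delimiter and then splits on it. SplitFields and DemoSplit are hypothetical names, and writing to A1 of the active sheet is only an assumption made for the demo.

```vba
' Sketch: split one data line into fields on spaces and commas.
' SplitFields and DemoSplit are hypothetical names for illustration.
Function SplitFields(ByVal dataLine As String) As Variant
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = "[\s,]+"                 ' any run of whitespace and/or commas
    ' Collapse each delimiter run to a single space, trim the ends, then split
    SplitFields = Split(Trim$(re.Replace(dataLine, " ")), " ")
End Function

Sub DemoSplit()
    Dim fields As Variant
    ' Sample fragment taken from the question's data
    fields = SplitFields("22, 90, 0, 0, 0, 0     296,464,153,155")
    Debug.Print UBound(fields) + 1        ' number of fields found
    ' Assumption for the demo: write the fields across row 1 of the active sheet
    ActiveSheet.Range("A1").Resize(1, UBound(fields) + 1).Value = fields
End Sub
```

The sketch does no error handling; an empty input line, for example, would need a guard before the Resize call.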

Also, as emphasized by @AnsgarWiechers in his comment below, it is simpler to read the entire file in one step, rather than reading in each line separately and concatenating.

In his comment, he showed a line that does this with the Input function.
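A minimal sketch of that one-step read, wrapped in a helper, might look like the following. ReadWholeFile is a hypothetical name, and the sketch assumes a plain single-byte text file; with multi-byte content, LOF may overstate the number of characters available and the Input call can fail.

```vba
' Sketch: read a whole text file into one string in a single step,
' line breaks included. ReadWholeFile is a hypothetical helper name.
' Assumes a plain single-byte (ANSI) text file.
Function ReadWholeFile(ByVal filePath As String) As String
    Dim fileNum As Integer
    fileNum = FreeFile                    ' next unused file number instead of a hard-coded #1
    Open filePath For Input As #fileNum
    ' LOF gives the file length in bytes; Input$ reads that many characters
    ReadWholeFile = Input$(LOF(fileNum), fileNum)
    Close #fileNum
End Function
```

In the routine from the question, the read loop could then be replaced by text = ReadWholeFile(Full_Name).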

I prefer using the FileSystemObject in general to read in text files. There are certain situations where the data format and reading requirements can cause issues with the Line Input method.

Below is code that:

- Reads the entire file in one step using the FSO
- Parses the data lines into individual cells

=======================================

Sub open_log_file()

Dim Full_Name As String, text As String, textline As String
Dim ws  As Worksheet 'Used to Store file path and file name

'Set up worksheet
Set ws = Worksheets("EyeInfo")
ws.UsedRange.Clear

'Call the Window to open the file
Full_Name = Application.GetOpenFilename("Diag Log File(*.log;*.txt;*.*),*.log;*.txt;*.*")

'read the file
'Open Full_Name For Input As #1
'Do Until EOF(1)
'    Line Input #1, textline
'    text = text & vbCrLf & textline
'Loop
'Close #1

'text = Mid(text, Len(vbCrLf) + 1)

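'Using FSO to read the file.
'The object is late bound, so the ForReading constant from the
'Scripting library is not available and has to be defined here.
Const ForReading As Long = 1
Dim FSO As Object
Dim TS As Object

Set FSO = CreateObject("Scripting.FileSystemObject")
Set TS = FSO.OpenTextFile(Full_Name, ForReading)
text = TS.ReadAll
TS.Close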


' define regular expression
Dim regEx_CE As Object
Set regEx_CE = CreateObject("VBScript.RegExp")

With regEx_CE
    .Global = True
    .MultiLine = True
    .IgnoreCase = False
    .Pattern = "\w*\[\d+\]\s+mcb\s+XYZ3\s+hpy\s+diag\s+(ce\d+)\s+dsc"
End With

Dim regEx_LN As Object
Set regEx_LN = CreateObject("VBScript.RegExp")

With regEx_LN
    .Global = True
    .MultiLine = True
    .IgnoreCase = False
    .Pattern = "\[\w*\.\w*\]\s*\w*:\w*\s*>>\s*\d+.*"
End With

' Run both patterns over the whole text and put the data in Excel/EyeInfo
Dim CE_match As Object, LN_match As Object
Dim i As Long
Set CE_match = regEx_CE.Execute(text)
Set LN_match = regEx_LN.Execute(text)
ws.Cells(1, 1) = Full_Name
ws.Cells(2, 1) = "Number of Ports to Be Extracted"
ws.Cells(2, 2) = CE_match.Count
For i = 0 To CE_match.Count - 1
    ws.Cells(i * 4 + 3, 1) = CE_match(i).Value
    ws.Cells(i * 4 + 3, 2) = LN_match(i * 4 + 0).Value
    ws.Cells(i * 4 + 4, 2) = LN_match(i * 4 + 1).Value
    ws.Cells(i * 4 + 5, 2) = LN_match(i * 4 + 2).Value
    ws.Cells(i * 4 + 6, 2) = LN_match(i * 4 + 3).Value

    ws.Range(ws.Cells(i * 4 + 3, 2), ws.Cells(i * 4 + 6, 2)).TextToColumns _
        DataType:=xlDelimited, _
        textqualifier:=xlTextQualifierNone, _
        consecutivedelimiter:=True, _
        Tab:=False, _
        semicolon:=False, _
        comma:=True, _
        Space:=True, _
        other:=False

Next

End Sub

=======================================

And here are the results with your data:

edited Dec 18, 2017 at 20:30

answered Dec 18, 2017 at 19:48


6 Comments

Ansgar Wiechers Over a year ago

I just arrived at the same conclusion. I wouldn't recommend reading the file line by line, though. Not if you're gluing the lines back together anyway. Just use text = Input$(LOF(1), 1).

Ron Rosenfeld Over a year ago

I agree with that approach. I just wanted to make the smallest change possible in his routine. Now to parse the lines.

spices Over a year ago

Big thanks for all your help. The new code works very well.

Ron Rosenfeld Over a year ago

@spices Glad to help. Since this seems to have answered your question, I would appreciate it if you could mark my answer as accepted. You can click on the check mark beside the answer to toggle it from greyed out to filled in.

spices Over a year ago

@RonRosenfeld I am not very familiar with this website. I just marked your answer as accepted; it should show that already. Thanks very much for all of your help!
