I'm trying to read a .csv file so I can work with it in an .accdb.

The file uses ; as the delimiter and " as the string qualifier. Young and naive as I was, I just split each line at the delimiter:

Set oFSO = New FileSystemObject
Set oStream = oFSO.OpenTextFile(sFilePath, ForReading)
Do Until oStream.AtEndOfStream
    sLine = oStream.ReadLine
    sArray = Split(sLine, ";")
    ....

Now I got a line that reads:

"String";"Str;ing";0;0;0;"String"

So there is a delimiter inside one of the strings, which breaks the code above: Split cuts "Str;ing" into two fields. Any ideas how to solve this?

EDIT:

I've found someone with a similar problem, only with a comma as the delimiter, and they solved it using regular expressions. The problem: I'm absolutely not good with regular expressions. In the example they used this expression and code:

Function regLine(sLine As String) As String
Dim oRegEx As RegExp
    Set oRegEx = New RegExp
    oRegEx.IgnoreCase = True
    oRegEx.Global = True

    ' Pattern: ",(?=([^"]*"[^"]*")*(?![^"]*"))"
    oRegEx.Pattern = ",(?=([^" & Chr(34) & "]*" & Chr(34) & "[^" & Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))"

    regLine = oRegEx.Replace(sLine, ";")
End Function

So I don't really understand the expression. My first idea was to replace the comma with a semicolon but that didn't work.

3 Answers

3
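Option Explicit

Dim line
    line = """String"";""Str;ing"";0;0;0;""String"""
    WScript.Echo line

Dim aFields
    With New RegExp
        ' An optional quoted string (captured as $1) followed by a semicolon
        .Pattern = "(""[^""]*"")?;"
        .Global = True
        ' Keep the quoted string, turn the semicolon into Chr(0), then split on Chr(0)
        aFields = Split(.Replace(line, "$1" & Chr(0)), Chr(0))
    End With

Dim field
    For Each field In aFields
        WScript.Echo field
    Next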

The code is .vbs, but it shows how to use a regular expression to replace the semicolons that are not enclosed in quotes with a null character, and then use the null character to split the line into its fields.
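
For use inside Access, the same idea can be wrapped in a VBA function. The following is only a minimal sketch: the name SplitCsvLine is made up for illustration, and it uses CreateObject so that no reference to Microsoft VBScript Regular Expressions is needed. It also adds one step that is not in the script above. The pattern protects a quoted string only when a semicolon follows it, so a quoted last field that contains a semicolon would still be split. Appending a delimiter before the replace, and dropping the resulting trailing Chr(0) afterwards, is one way around that.

```vba
' Hypothetical helper name; a sketch of the regex approach as a VBA function.
Function SplitCsvLine(ByVal sLine As String) As String()
    Dim oRegEx As Object
    Dim sMarked As String

    Set oRegEx = CreateObject("VBScript.RegExp")   ' late binding
    oRegEx.Global = True
    oRegEx.Pattern = "(""[^""]*"")?;"              ' optional quoted string, then ;

    ' Assumption: Chr(0) never occurs in the data.
    ' Append ";" so that a quoted last field is also followed by a delimiter.
    sMarked = oRegEx.Replace(sLine & ";", "$1" & Chr(0))

    ' Drop the Chr(0) that came from the appended delimiter.
    sMarked = Left$(sMarked, Len(sMarked) - 1)

    SplitCsvLine = Split(sMarked, Chr(0))
End Function
```

Called with the line from the question, this should give six fields, with "Str;ing" kept in one piece. The quotes stay part of each field, exactly as in the script above, so strip them afterwards if you don't want them.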

4 Comments

Not sure if I did something wrong, but for me the output string looks like ;;0;0;0;"String"
@FNR, to test, copy the code, save it as test.vbs, and run it with cscript test.vbs to execute it on the console, or double-click the file. I've also tested it as VBA in Excel, replacing WScript.Echo with MsgBox, and either including the reference to Microsoft VBScript Regular Expressions or using CreateObject("VBScript.RegExp"); in both cases it works. Please include the code you are using in your question so we can see where it could fail.
Set oRegEx = New RegExp: oRegEx.Global = True: oRegEx.Pattern = "(""[^""]*"")?;": sLine2 = oRegEx.Replace(sLine, ";"): regLine = Split(sLine2, ";") where sLine is the same string as in my original post.
@FNR, 1 - the regular expression uses a capture group to separate the quoted strings from the semicolons; you forgot the $1 in the replacement, which should look like .Replace(sLine, "$1;"). BUT 2 - I see no reason to replace a semicolon with a semicolon, as you will end up with the same problem. Use as the replacement a character that is not present in the data (such as Chr(0)) and split on that character.
1

I solved the problem for now by writing a loop that deletes the delimiter whenever it occurs inside a string.

Function fixLine(sLine As String) As String
    Dim i As Long
    Dim sChar As String
    Dim bInString As Boolean

    bInString = False
    fixLine = ""
    For i = 1 To Len(sLine)
        sChar = Mid(sLine, i, 1)
        ' Every quote toggles between inside and outside a string
        If sChar = Chr(34) Then bInString = Not bInString
        ' Copy everything except a delimiter inside a string
        If Not (bInString And sChar = ";") Then
            fixLine = fixLine & sChar
        End If
    Next
End Function

It kind of feels quick and dirty and I'm not sure about the performance but it works.
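
A variation on the same loop can split the line directly instead of deleting characters, so the semicolons inside strings stay in the data. This is only a sketch: the name SplitOutsideQuotes is made up, and it assumes a quoted value never spans more than one line and that a quote character only ever opens or closes a string.

```vba
' Hypothetical helper: splits one line on ";" that lie outside double quotes.
Function SplitOutsideQuotes(ByVal sLine As String) As Collection
    Dim colFields As New Collection
    Dim bInString As Boolean
    Dim sField As String
    Dim sChar As String
    Dim i As Long

    For i = 1 To Len(sLine)
        sChar = Mid$(sLine, i, 1)
        If sChar = Chr(34) Then
            bInString = Not bInString     ' every quote toggles the state
            sField = sField & sChar       ' keep the qualifier in the field
        ElseIf sChar = ";" And Not bInString Then
            colFields.Add sField          ' delimiter outside quotes ends the field
            sField = ""
        Else
            sField = sField & sChar       ' includes ";" inside quotes
        End If
    Next i
    colFields.Add sField                  ' last field has no trailing delimiter

    Set SplitOutsideQuotes = colFields
End Function
```

It returns a Collection (1-based) rather than an array, which avoids resizing an array inside the loop.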

EDIT: I also got the example I found above to work. Its pattern matches the delimiter only where it is outside of strings. So I replaced those delimiters with Chr(0), which I know won't appear in a line, and then split at the new delimiter.

Function regLine(sLine As String) As String()
    Dim oRegEx As RegExp
    Dim sLine2 As String   ' plain String: Replace returns one string, not an array
    Set oRegEx = New RegExp
    oRegEx.Global = True

    'Pattern: ";(?=([^"]*"[^"]*")*(?![^"]*"))"
    oRegEx.Pattern = ";(?=([^" & Chr(34) & "]*" & Chr(34) & "[^" & Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))"

    sLine2 = oRegEx.Replace(sLine, Chr(0))
    regLine = Split(sLine2, Chr(0))
End Function

2 Comments

@MCND's answer proves that no dirty hacks are necessary.
Much better, but MCND's approach/pattern is still superior.
0

My first question is: Is there any case where a ";" in the string values is a valid string? If so, I don't see any way other than manually verifying the data.

If not, how large is the input file? If it's not too big (for various definitions of "too" :-) ) then just manually scan it for errors.

If it is very large, I'd simply write a preprocessor program that reads the string values and deletes any ";" that occurs in them. Such a program is only about a dozen lines long. Then load the cleaned file into Access.
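
As a rough sketch of what such a preprocessor could look like in VBA: the names CleanCsvFile, sInPath and sOutPath are made up for illustration, and it assumes that " is the string qualifier and that a quoted value never spans more than one line.

```vba
' Hypothetical preprocessor: copies a file, dropping every ";" inside quotes.
Sub CleanCsvFile(ByVal sInPath As String, ByVal sOutPath As String)
    Dim oFSO As Object, oIn As Object, oOut As Object
    Dim sLine As String, sChar As String, sClean As String
    Dim bInString As Boolean
    Dim i As Long

    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oIn = oFSO.OpenTextFile(sInPath, 1)          ' 1 = ForReading
    Set oOut = oFSO.CreateTextFile(sOutPath, True)   ' True = overwrite existing

    Do Until oIn.AtEndOfStream
        sLine = oIn.ReadLine
        sClean = ""
        bInString = False            ' assumption: strings do not span lines
        For i = 1 To Len(sLine)
            sChar = Mid$(sLine, i, 1)
            If sChar = Chr(34) Then bInString = Not bInString
            ' Copy everything except a ";" that sits inside quotes.
            If Not (bInString And sChar = ";") Then sClean = sClean & sChar
        Next i
        oOut.WriteLine sClean
    Loop

    oIn.Close
    oOut.Close
End Sub
```

Note that this changes the data: "Str;ing" comes out as "String". If the semicolon is a legitimate part of the value, that is a loss.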

2 Comments

Sadly, it is a very big file, about 40000 lines, so doing it manually is not really an option.
@MCND's answer proves that neither pessimism nor manual modification is justified.
