3

I have a CSV string (UTF-8) obtained via an HTTP download.

Depending on the situation, the data in the string could contain a different number of columns, but within any one string every row will contain the same number of columns and the data will be contiguous (the data will be even, with no ragged rows).

The string could contain any number of rows.

The first row will always be the headings.

String fields will be encased in double quotes and could contain commas, quotes and newlines.

Quotes and double quotes inside a string are escaped by doubling, so "" and ''.

In other words, this is a well-formed CSV format. Excel, through its standard file-open mechanism, has no problem formatting this data.

However, I want to avoid saving to a file and then opening the CSV, as I will need to process the output in some cases, or even merge it with existing data on a worksheet.

(Added the following information via edit) The Excel application will be distributed to various destinations and I want to avoid potential permissions issues if possible. Writing nothing to disk seems a good way to do that.

I am thinking something like the following pseudo:

rows = split(csvString, vbCrLf)  'wont work due to newlines inside string fields?

FOREACH rows as row
    fields = split(row, ',')     'wont work due to commas in string fields?
ENDFOR

Obviously that can't handle fields containing special tokens.

What is a solid way of parsing this data?

Thanks

EDIT 13/10/2012 Data Samples

The CSV as it would appear in Notepad (note: not all line breaks will be \r\n; some could be \n):

LanguageID,AssetID,String,TypeID,Gender
3,50820,"A string of natural language",3,0
3,50819,"Complex text, with comma, "", '' and new line
all being valid",3,0
3,50818,"Some more language",3,0

The same CSV in Excel 2010, opened from the shell (double-click, no extra options): [screenshot not included]

1
  • 1
    You'll need to step through the string character-by-character and parse it out "manually". Commented Oct 13, 2012 at 0:14
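The character-by-character approach suggested in the comment above could be sketched roughly as follows. This is a minimal sketch, not a tested library: `ParseCsv` is a hypothetical name, it assumes fields are qualified with double quotes, that an embedded double quote is written as `""`, and that rows end in either `\r\n` or `\n` (a bare `\r` as a row terminator is not handled). It writes nothing to disk.

```vba
Option Explicit

' Hypothetical helper (not from the original post).
' Returns a Collection of rows; each row is a Collection of field strings.
Public Function ParseCsv(ByVal csvText As String) As Collection
    Dim result As New Collection
    Dim currentRow As Collection
    Dim currentField As String
    Dim inQuotes As Boolean
    Dim i As Long
    Dim ch As String

    Set currentRow = New Collection
    i = 1
    Do While i <= Len(csvText)
        ch = Mid$(csvText, i, 1)
        If inQuotes Then
            If ch = """" Then
                If Mid$(csvText, i + 1, 1) = """" Then
                    ' Doubled quote inside a quoted field: emit one quote
                    currentField = currentField & """"
                    i = i + 1
                Else
                    ' Closing quote
                    inQuotes = False
                End If
            Else
                ' Commas and line breaks are ordinary text inside quotes
                currentField = currentField & ch
            End If
        Else
            Select Case ch
                Case """"
                    inQuotes = True
                Case ","
                    currentRow.Add currentField
                    currentField = ""
                Case vbCr
                    ' Ignored outside quotes; the following vbLf ends the row
                Case vbLf
                    currentRow.Add currentField
                    result.Add currentRow
                    Set currentRow = New Collection
                    currentField = ""
                Case Else
                    currentField = currentField & ch
            End Select
        End If
        i = i + 1
    Loop

    ' Final row when the text does not end with a line break
    If Len(currentField) > 0 Or currentRow.Count > 0 Then
        currentRow.Add currentField
        result.Add currentRow
    End If

    Set ParseCsv = result
End Function
```

The first item of the returned collection holds the headings, and a field can be read with something like `ParseCsv(csvString).Item(2).Item(3)`. All fields come back as strings, so any numeric conversion is left to the caller. Building each field by repeated concatenation is simple but may be slow on very large inputs.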

2 Answers

5

If you don't mind putting the data in your workbook: You could use a blank worksheet, add the data in 1 column, then call TextToColumns. Then if you want to get the data back as an array just load it from the UsedRange of the worksheet.

'Dim myArray As Variant 'Uncomment if storing the data in an array.
'Assumes csvString is already defined.
'Uses a sheet named "Temp" for processing.
With Sheets("Temp")
    .Cells.Delete
    .Cells(1, 1).Value = csvString
    'Destination is qualified with a leading dot so it refers to "Temp", not the active sheet.
    .Cells(1, 1).TextToColumns Destination:=.Cells(1, 1), DataType:=xlDelimited, _
        TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
        Semicolon:=False, Comma:=True, Space:=False, Other:=False
    'myArray = .UsedRange.Value 'Uncomment if storing the data in an array.
End With

6 Comments

Does this handle "The string could contain any number of rows", i.e. rows separated by CrLf characters within the string (I think that's what the OP means)?
@chrisneilsen You are correct, I could have thousands of rows in the same string separated by CrLf, and additionally have CrLf as a valid (non-row-breaking) sequence inside string fields of each row.
@DanielCook That is interesting, certainly a partial solution. I suppose I need something like a TextToRows to start with, or perhaps I can do something with Transpose. I am going to have a play with this, thanks.
This still has potential. First do a text to columns on vbCrLf. Transpose the result into a column, then do text to columns as per Daniel's answer.
Provided that op is using xl2007+ and number of rows < 16000
1

I can think of three possibilities:

  1. Use Regular Expressions to process the text. There are plenty of examples available on SO and via google for separating strings like this.
  2. Use the power of Excel: save the text to a temp file, open into a temp sheet and read the data off the sheet. Delete the file and sheet when done.
  3. Use ADO to query the data. Save the string to a temp file and run a query on that to return the fields you want.

To offer any more specific advice I would need samples of input data and expected output
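Option 2 above (temp file, open in Excel, read the data back) could be sketched as follows. This is only an illustration under stated assumptions: `CsvTextToArray` and the file name `csv_import_temp.csv` are hypothetical, the text is written as UTF-8 through a late-bound `ADODB.Stream`, and the user's `TEMP` folder is assumed to be writable. It relies on Excel's standard CSV open, which the question reports handles this data correctly.

```vba
Option Explicit

' Hypothetical helper (not from the original post).
' Writes the CSV text to a temp file, lets Excel open it, and returns
' the parsed cells as a 2-D Variant array.
Public Function CsvTextToArray(ByVal csvText As String) As Variant
    Dim tempPath As String
    Dim stream As Object
    Dim wb As Workbook

    ' Illustrative file name in the user's temp folder
    tempPath = Environ$("TEMP") & "\csv_import_temp.csv"

    ' Save the string as UTF-8 text
    Set stream = CreateObject("ADODB.Stream")
    With stream
        .Type = 2                 ' adTypeText
        .Charset = "utf-8"
        .Open
        .WriteText csvText
        .SaveToFile tempPath, 2   ' adSaveCreateOverWrite
        .Close
    End With

    ' Let Excel parse the file, then copy the values out
    Application.ScreenUpdating = False
    Set wb = Workbooks.Open(Filename:=tempPath, ReadOnly:=True)
    CsvTextToArray = wb.Worksheets(1).UsedRange.Value
    wb.Close SaveChanges:=False
    Application.ScreenUpdating = True

    ' Remove the temp file once the data has been read
    Kill tempPath
End Function
```

Note that Excel applies its own type conversion on open (numeric-looking text becomes numbers, for example), and that `UsedRange.Value` returns a single value rather than an array when only one cell is populated. Error handling that restores `ScreenUpdating` and deletes the file on failure is omitted for brevity.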

2 Comments

I like the simplicity of the temp file option; however, the Excel application will be distributed to various destinations and I want to avoid potential permissions issues if possible. Writing nothing to disk seems a good way to do that. Sorry, I should have clarified that in my question (and will do). I will see if I can find a reliable regex, thanks for the suggestion.
I spent most of my Saturday playing with regular expressions unsuccessfully. I had some problems with limitations of the VBScript regex engine, and found many examples that almost worked. So in the end you are correct: let Excel do what it is good at. Temp file (the user's AppData temp folder) and QueryTable for the win. Thanks for your time.
