
I currently store my CSV-formatted files on disk and query them like this:

SELECT *
FROM OPENROWSET(BULK 'C:\myfile.csv',
    FORMATFILE = 'C:\format.fmt',
    FIRSTROW = 2) AS rs

where format.fmt defines the format of the columns in the CSV file. This works very well, but I'm interested in storing the files in a SQL Server table instead of on disk, in a VARBINARY(MAX) column. How do I query them then?
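For reference, a non-XML format file for a semicolon-delimited three-column file looks something like this (the column names and lengths here are illustrative, not my actual file):

12.0
3
1   SQLCHAR   0   100   ";"      1   Col1   SQL_Latin1_General_CP1_CI_AS
2   SQLCHAR   0   100   ";"      2   Col2   SQL_Latin1_General_CP1_CI_AS
3   SQLCHAR   0   100   "\r\n"   3   Col3   SQL_Latin1_General_CP1_CI_AS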

If I have a table like:

CREATE TABLE FileTable
(
    [FileName] NVARCHAR(256)
    ,[File] VARBINARY(MAX)
)

With one row 'myfile.csv', '0x427574696B3B44616....'

How do I read that file content into a temporary table, for example?

  • If you've got CSV data, why not just import it into the database? Commented May 16, 2014 at 13:10
  • I'm not exactly following. Yes, I've got the CSV data because the user uploads a file containing that data. So instead of storing that data as binary I could store it in an NVARCHAR(MAX) column: 'value1;value2;value3\r\nvalue4;value5;value6\r\n'. This may be a simple question, but how do I parse that column without writing too much custom code? Commented May 16, 2014 at 13:19
  • If you've got CSV data then it's a simple matter to get it into the database and work with it directly rather than going round the houses of trying to work with the data in the file. There's a guide here: blog.sqlauthority.com/2011/05/12/… Commented May 16, 2014 at 13:21
  • Since I need to store it in the database anyway, because I need a backup of the original data, the best option for me is to store it in the table right where it belongs. My thought was that storing it on disk just to be able to import it into another table was "going round the houses": a temporary save to disk, an import, and then deleting the file on disk. That seems unnecessary when I could import it directly from the column where it already exists. That's the whole point of my question: can I avoid saving to disk and read it directly from the table? Commented May 16, 2014 at 13:32
  • Seriously, unless there's something about your setup that you're not telling us, you're looking at this totally the wrong way. You can backup your file in the database as a database table really easily. And it'll be far easier to work with in that format than trying to cast to VARBINARY and back again. Commented May 16, 2014 at 13:49

2 Answers


If you really need to work with varbinary data, you can just cast it back to varchar (or nvarchar, depending on how the text was encoded when it was stored):

DECLARE @bin VARBINARY(MAX)
SET @bin = 0x5468697320697320612074657374

SELECT CAST(@bin as VARCHAR(MAX))
-- gives This is a test
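One caveat worth adding here (my assumption, not something the original answer covers): if the upload stored the file as Unicode (UTF-16), you need to cast to NVARCHAR(MAX) instead; casting UTF-16 bytes to VARCHAR leaves embedded null bytes rather than readable text. A quick sketch:

DECLARE @uniBin VARBINARY(MAX)
-- Same text, but encoded as UTF-16 (what an N'...' literal produces)
SET @uniBin = CAST(N'This is a test' AS VARBINARY(MAX))

SELECT CAST(@uniBin as NVARCHAR(MAX))  -- gives This is a test
SELECT CAST(@uniBin as VARCHAR(MAX))   -- wrong cast: null bytes between characters, not readable text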

Once you've got it into that format, you can use a split function to turn it into a table. Don't ask me why there isn't a built-in split function in SQL Server, given that it's such a screamingly obvious oversight, but there isn't. So create your own with the code below:

CREATE FUNCTION [dbo].[fn_splitDelimitedToTable] ( @delimiter varchar(3), @StringInput VARCHAR(8000) )
RETURNS @OutputTable TABLE ([String] VARCHAR(100), [Hierarchy] int )
AS
BEGIN

    DECLARE @String    VARCHAR(100)
    DECLARE @row int = 0

    WHILE LEN(@StringInput) > 0
    BEGIN
        SET @row = @row + 1

        -- Grab everything up to (but not including) the next delimiter,
        -- or the whole remaining string if no delimiter is left
        SET @String      = LEFT(@StringInput, 
                                ISNULL(NULLIF(CHARINDEX(@delimiter, @StringInput) - 1, -1),
                                LEN(@StringInput)))

        -- Chop the consumed token (and its delimiter) off the front
        SET @StringInput = SUBSTRING(@StringInput,
                                     ISNULL(NULLIF(CHARINDEX(@delimiter, @StringInput), 0),
                                     LEN(@StringInput)) + 1, LEN(@StringInput))

        INSERT INTO @OutputTable ( [String], [Hierarchy] )
        VALUES ( @String, @row )
    END

    RETURN
END
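Worth noting: if you're on SQL Server 2016 or later, there now is a built-in STRING_SPLIT function, so you can skip the hand-rolled version; just be aware it doesn't guarantee output ordering (an ordinal column only arrived with the enable_ordinal argument in SQL Server 2022):

-- Built-in alternative on SQL Server 2016+:
SELECT value
FROM STRING_SPLIT('one,two,three', ',')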

Put it all together:

select CAST('one,two,three' as VARBINARY)
-- gives 0x6F6E652C74776F2C7468726565

DECLARE @bin VARBINARY(MAX)
SET @bin = 0x6F6E652C74776F2C7468726565

select * from fn_splitDelimitedToTable(',', CAST(@bin as VARCHAR(MAX)))

gives this result:

string hierarchy
================
one    1
two    2
three  3

And of course, you can get the result into a temp table to work with if you so wish:

select * into #myTempTable
from fn_splitDelimitedToTable(',', CAST(@bin as VARCHAR(MAX)))
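As comes up in the comments below, a real CSV file has a row delimiter as well as a column delimiter, and this function only splits on one at a time. A rough sketch of handling both, assuming CRLF line endings and semicolon-separated columns (check what your file actually uses), is to split twice:

DECLARE @csv VARCHAR(8000)
SET @csv = 'value1;value2;value3' + CHAR(13) + CHAR(10) + 'value4;value5;value6'

-- Strip the CR so we can split on a single-character line delimiter,
-- then split each line on the column delimiter
SELECT lines.Hierarchy AS RowNum,
       cols.Hierarchy  AS ColNum,
       cols.String     AS Value
FROM dbo.fn_splitDelimitedToTable(CHAR(10), REPLACE(@csv, CHAR(13), '')) AS lines
CROSS APPLY dbo.fn_splitDelimitedToTable(';', lines.String) AS cols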

2 Comments

Thank you for trying to solve the problem. It doesn't work so well with more than one column and multiple rows yet, but I can look it through and fix that. Anyway, I was looking for something built in that does this; otherwise my working solution is acceptable. I was curious whether I could avoid some unnecessary operations. Thanks.
No problem. It should handle columns okay, but I can see how rows would be an issue: you'll need to find out what your line delimiters are and try working with those. I'd be grateful for an upvote if the answer is at least helpful :)

If you've got CSV data, why not just import it into the database?

You can also use BULK INSERT to do this, as in this question.

Assuming you've created a table with the correct format to import the data into (e.g. MyImportTable), something like the following could be used:

BULK INSERT MyImportTable
FROM 'C:\myfile.csv'
WITH (DATAFILETYPE = 'char',
      FIRSTROW = 2,
      FORMATFILE = 'C:\format.fmt');

EDIT 1:

With the data imported into the database, you can now query the table directly and avoid the CSV file altogether, like so:

SELECT *
FROM MyImportTable

With the original CSV no longer required, you can delete or archive it.

EDIT 2:

If you've enabled xp_cmdshell, and you have the appropriate permissions, you can delete the file from SQL with the following:

EXEC xp_cmdshell 'del C:\myfile.csv';

Lastly, if you want to enable xp_cmdshell, it's an advanced option, so 'show advanced options' has to be switched on first:

exec sp_configure 'show advanced options', 1
go
reconfigure
go
exec sp_configure 'xp_cmdshell', 1
go
reconfigure
go
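Given that an enabled xp_cmdshell is a common security concern, it's worth switching it back off once the file clean-up is done:

exec sp_configure 'xp_cmdshell', 0
go
reconfigure
go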

4 Comments

That is what I'm currently doing. At least sort of. I would like to change that, hence the question...
Please see the edit: By importing the data into a database table you're no longer required to store/access the original CSV file. Your current solution operates on the CSV file and presents the data like a table.
Yes, and that is because I'm not interested in storing the data raw in a table; it needs a lot of manipulation and calculation. I know that I could update a table later on, but currently I work the data out and insert it when it's finished. But yes, I need to store the original file. It's always a good habit, because when the customer comes and says he/she sent that type of data, I can confirm or see that it was not the case by looking at the original file. Please read the comments on the main post.
I'm not asking how to bulk insert a file; I've known how to do that for the last five years. If you read the question, I'm saying that this is what I'm currently doing. Thanks anyway.
