Many tables in my database have at least one column that contains a URL, and the same URLs are repeated throughout the database. So I normalize them into a dedicated table and use numeric IDs everywhere I need them. I often need to join on them, so numeric IDs are much better than full strings.

In MySQL + C++, to insert a lot of URLs in one go, I used multi-row INSERT IGNOREs or mysql_set_local_infile_handler(). Then I ran batched SELECTs with IN () to pull the IDs back from the database.

In C# + SQL Server I noticed there's a SqlBulkCopy class that is very useful and fast for mass insertion. But I also need mass selection to resolve the URL IDs after I insert them. Is there a helper class that would work the same as SELECT WHERE IN (many, urls, here)?

Or do you have a better idea for turning URLs into numbers in a consistent manner in C#? I thought about hashing the URLs with CRC32 or CRC64, but I worry about collisions. A handful of collisions wouldn't matter, but many would be a problem.
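To put a rough number on the collision worry, here is a minimal sketch of the birthday approximation. It assumes the hash values are uniformly distributed (CRCs only approximate that) and uses 10 million URLs as a stand-in for "tens of millions"; the class and method names are made up for illustration.

```csharp
using System;

public static class HashCollisionEstimate
{
    // Birthday approximation: among n uniformly distributed b-bit hashes,
    // the expected number of colliding pairs is about n * (n - 1) / 2 / 2^b.
    public static double ExpectedCollidingPairs(double n, int bits)
    {
        return n * (n - 1.0) / 2.0 / Math.Pow(2.0, bits);
    }

    public static void Main()
    {
        double n = 10000000; // assumed URL count, not a figure from a real dataset
        Console.WriteLine("32-bit: {0:E2}", ExpectedCollidingPairs(n, 32));
        Console.WriteLine("64-bit: {0:E2}", ExpectedCollidingPairs(n, 64));
    }
}
```

Under those assumptions a 32-bit hash gives on the order of ten thousand colliding pairs at this scale, while a 64-bit hash gives far less than one expected collision.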

PS: To give an idea of scale, we're talking about tens of millions of URLs.

PPS: For a basic large insert, SqlBulkCopy is faster than SqlDbType.Structured. It also has the SqlRowsCopied event, which works as a progress callback.

2 Answers

There is an even better way than SqlBulkCopy.

It's called a table-valued parameter (sent from ADO.NET as SqlDbType.Structured), and it lets you pass a whole table of values to a stored procedure or query.

There are code examples in the article, so I will only highlight what you need to do to get it up and working:

  1. Create a user-defined table type in the database. You can call it UrlTable.
  2. Set up a stored procedure or query that does the SELECT by joining with a table variable of type UrlTable.
  3. In your C# code, create a DataTable with the same structure as UrlTable, populate it with URLs, and pass it to a SqlCommand as a structured parameter. Note that the column order of the DataTable must match the column order of the table type.

What ADO.NET does behind the scenes (you can see this if you profile the query) is declare a variable of type UrlTable before the query and populate it with INSERT statements from the rows you pass in the structured parameter.

Other than that, query-wise, you can do pretty much everything with a table-valued parameter that you can do with a table in SQL (join, select, etc.), except modify it: it is read-only inside the query.
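The three steps above could look roughly like the sketch below. The schema, the names dbo.Urls, dbo.UrlTable and UrlIdResolver, and the column sizes are all assumptions for illustration, not taken from the article or from your database.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient; // Microsoft.Data.SqlClient exposes the same types

// Assumed schema (hypothetical names and sizes):
//   CREATE TABLE dbo.Urls (Id BIGINT IDENTITY PRIMARY KEY, Url NVARCHAR(400) NOT NULL);
//   CREATE TYPE dbo.UrlTable AS TABLE (Url NVARCHAR(400) NOT NULL);
public static class UrlIdResolver
{
    // Builds a DataTable whose single column mirrors dbo.UrlTable,
    // skipping exact duplicate strings to keep the payload small.
    public static DataTable BuildUrlTable(IEnumerable<string> urls)
    {
        var table = new DataTable();
        table.Columns.Add("Url", typeof(string));
        var seen = new HashSet<string>(StringComparer.Ordinal);
        foreach (var url in urls)
        {
            if (seen.Add(url)) table.Rows.Add(url);
        }
        return table;
    }

    // Sends the URLs as a table-valued parameter and reads back Url -> Id pairs.
    // The connection is expected to be open already.
    public static Dictionary<string, long> Resolve(SqlConnection connection, IEnumerable<string> urls)
    {
        const string sql =
            "SELECT t.Url, u.Id FROM @urls AS t JOIN dbo.Urls AS u ON u.Url = t.Url;";
        var ids = new Dictionary<string, long>(StringComparer.Ordinal);
        using (var command = new SqlCommand(sql, connection))
        {
            var parameter = command.Parameters.Add("@urls", SqlDbType.Structured);
            parameter.TypeName = "dbo.UrlTable"; // must name the table type created above
            parameter.Value = BuildUrlTable(urls);
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    ids[reader.GetString(0)] = reader.GetInt64(1);
                }
            }
        }
        return ids;
    }
}
```

At tens of millions of URLs you would probably call Resolve in batches rather than once; the right batch size is something to measure rather than assume.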

3 Comments

Thanks. Looks like this is the solution. Now I have some studying to do... :)
@CodeAngry: You're welcome :)
Implemented it. Works perfectly and it's blazing fast! Two weeks into C# and SQL Server... I like it. Makes my life easier.

I think you could use the IGNORE_DUP_KEY option on your index. If you set IGNORE_DUP_KEY = ON on the unique index of the URL column, duplicate values are skipped with a warning instead of failing the statement, and the rest of the rows are inserted normally.
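A minimal sketch of setting that up from C#, assuming a table dbo.Urls with a Url column that is short enough to fit within SQL Server's index key size limit. The table, index, and class names are hypothetical.

```csharp
using System.Data.SqlClient;

public static class UrlIndexSetup
{
    // IGNORE_DUP_KEY is only valid on unique indexes.
    // dbo.Urls and IX_Urls_Url are hypothetical names.
    public const string CreateIndexSql =
        "CREATE UNIQUE INDEX IX_Urls_Url ON dbo.Urls (Url) WITH (IGNORE_DUP_KEY = ON);";

    // Runs the DDL once on an already open connection.
    public static void EnsureIndex(SqlConnection connection)
    {
        using (var command = new SqlCommand(CreateIndexSql, connection))
        {
            command.ExecuteNonQuery();
        }
    }
}
```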

1 Comment

I do that already. It's the cure for the lack of INSERT IGNORE in SQL Server.
