I have many tables in the database with at least one column that contains a URL, and the same URLs are repeated a lot throughout the database. So I normalize them into a dedicated table and use numeric IDs everywhere I need them. I often need to join on them, so numeric IDs are much better than full strings.
In MySQL + C++, to insert a lot of URLs in one go, I used to use multi-row INSERT IGNOREs or mysql_set_local_infile_handler(), followed by batched SELECTs with IN () to pull the IDs back from the database.
In C# + SQL Server I noticed there's a SqlBulkCopy class that's very useful and fast for mass insertion. But I also need mass selection to resolve the URL IDs after I insert them. Is there a helper class that would work the same way as SELECT WHERE IN (many, urls, here)?
Or do you have a better idea for turning URLs into numbers in a consistent manner in C#? I thought about crc32'ing or crc64'ing the URLs, but I worry about collisions. A few collisions wouldn't bother me, but a large number of them would be a problem.
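To put a rough number on the collision worry, the birthday bound gives an estimate. The sketch below is plain arithmetic and not tied to C# or SQL Server; it assumes hash values are uniformly distributed (CRCs are not designed to guarantee that, so treat the result as optimistic), and the 50,000,000 figure is only an assumed stand-in for "tens of millions". The function names are made up for illustration.

```python
import math

def expected_colliding_pairs(n_items: int, hash_bits: int) -> float:
    """Expected number of colliding pairs among n_items uniformly random hashes."""
    # There are n*(n-1)/2 pairs, and each pair collides with probability 2^-bits.
    return n_items * (n_items - 1) / 2 / 2 ** hash_bits

def prob_any_collision(n_items: int, hash_bits: int) -> float:
    """Birthday approximation: P(at least one collision) ~= 1 - exp(-expected pairs)."""
    return -math.expm1(-expected_colliding_pairs(n_items, hash_bits))

if __name__ == "__main__":
    n = 50_000_000  # assumed value for "tens of millions" of URLs
    for bits in (32, 64):
        pairs = expected_colliding_pairs(n, bits)
        prob = prob_any_collision(n, bits)
        print(f"{bits}-bit: expected colliding pairs = {pairs:.3g}, P(any) = {prob:.3g}")
```

Under those assumptions, a 32-bit hash would give on the order of 290,000 colliding pairs, which is far from "few", while a 64-bit hash would leave the chance of even a single collision below 1 in 10,000.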
PS: To give an idea of scale, we're talking about tens of millions of URLs.
PPS: For a basic large insert, SqlBulkCopy is faster than SqlDbType.Structured. Plus it has the SqlRowsCopied event, which can serve as a status-tracking callback.