You mentioned that the multiline regex is taking too long and asked about the state-machine approach. Here is some code using a function to perform the operation (the function could probably use a little cleaning, but it shows the idea and runs faster than the regex). In my testing, using the regex without the multiline option, processing 1,000,000 lines (in memory, not writing to a file) took about 34 seconds. The state-machine approach took about 4 seconds.
string RemoveInternalPipe(string line)
{
    int count = 0;  // number of single quotes seen so far
    var temp = new List<char>(line.Length);
    foreach (var c in line)
    {
        if (c == '\'')
        {
            ++count;
        }
        // An odd quote count means we are inside a quoted section,
        // so skip any pipe characters found there.
        if (c == '|' && count % 2 != 0) continue;
        temp.Add(c);
    }
    return new string(temp.ToArray());
}
File.WriteAllLines(@"yourOutputFile",
File.ReadLines(@"yourInputFile").Select(x => RemoveInternalPipe(x)));
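To see what the function does, here is a small usage sketch (the sample input is illustrative, not from your data): pipes between fields are kept, while pipes inside single-quoted sections are dropped.

```csharp
// Assumes RemoveInternalPipe from above is in scope.
// The pipe inside the quoted field 'O|Brien' is removed;
// the field-delimiter pipes are untouched.
Console.WriteLine(RemoveInternalPipe("1|'O|Brien'|42"));
// → 1|'OBrien'|42
```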
To compare the performance against the Regex version (without the multiline option), you could run this code:
var regex = new Regex(@"(?<=^[^']*'([^']*'[^']*')*[^']*)\|");
File.WriteAllLines(@"yourOutputFile",
    File.ReadLines(@"yourInputFile").Select(x => regex.Replace(x, string.Empty)));
Regarding your comment: "BULK INSERT Product FROM 'D:\product.data' WITH ( FIELDTERMINATOR = '|', ROWTERMINATOR = '\n' ); I use BULK INSERT in SQL Server to import the file but have errors in these fields: Bulk load data conversion error (type mismatch or invalid character for the specified codepage)." So you will potentially have other issues, not just the extraneous pipe characters? It might be better to write the question stating the actual problem rather than your proposed workaround, as there are potentially better ways to solve it.