I am processing a CSV file in C#, reading it line by line. For each line (a string), I am trying to find a regex pattern and replace it using another regex pattern, but the result is not what I expect.

var input = "\"efgh ,ijkl123,\",abcd ,  \"efgh ,ijkl123,\",mnop456 \"efgh ,ijkl123,\"";

In the output, I need to replace the commas inside a pair of double quotes with semicolons, but only where that quoted pair itself sits between commas.

Between a double quote and the external comma (a comma outside the pair of double quotes) there can only be whitespace.

So I expect output to be: "efgh ;ijkl123,",abcd , "efgh ;ijkl123,",mnop456 "efgh ,ijkl123,"

My code:

var pattern = @".*,\s*""(.*,+.*)+""\s*,.*";
var replacePattern = @".*,\s*""(.*;+.*)+""\s*,.*";
if (Regex.IsMatch(input, pattern))
{
    var output = Regex.Replace(input, pattern, replacePattern);
}

But when I run my code, the output is: .*,\s*"(.*;+.*)+"\s*,.* which is the literal text of replacePattern.

EDIT: more sample inputs with their expected outputs:

  1. input abcd , "efgh ,ijkl123,",mnop456

    output abcd , "efgh ;ijkl123;",mnop456

  2. input "efgh ,ijkl123,",abcd , "efgh ,ijkl123,",mnop456 "efgh ,ijkl123,"

    output "efgh ;ijkl123;",abcd , "efgh ;ijkl123;",mnop456 "efgh ,ijkl123,"

  3. input ,"efgh ,ijkl123,",abcd" , "efgh ijkl123,",mnop456 "efgh ,ijkl123,","efgh ,ijkl123,"mnop456

    output ,"efgh ;ijkl123;",abcd" , "efgh ijkl123;",mnop456 "efgh ,ijkl123,","efgh ,ijkl123,"mnop456

  4. input ,"efgh" ,ijkl123,",abcd" , "efgh ijkl123,",mnop456 "efgh ,ijkl123,","efgh ,ijkl123,"mnop456

    output ,"efgh" ,ijkl123,";abcd" , "efgh ijkl123;",mnop456 "efgh ,ijkl123,","efgh ,ijkl123,"mnop456

  5. input efgh ,ijkl123,",abcd , "efgh ,ijkl123,",mnop456 "efgh ,ijkl123,"

    output efgh ,ijkl123,",abcd , "efgh ;ijkl123;",mnop456 "efgh ,ijkl123,"

  • What I'm reading from your statement is that you want to replace a comma (,) with a semicolon (;) if it's between double quotes? Commented Apr 14, 2016 at 2:05
  • What's the expected behavior if the string has more than 2 double quotes? Commented Apr 14, 2016 at 2:09
  • Yes, and that pair of double quotes is between commas as well. To simplify, it would be like this: , " , " , => , " ; " , Commented Apr 14, 2016 at 2:09
  • @Zee I need to find every pair of double quotes in the input, so there could be 2, 4, 6, ... double quotes. Commented Apr 14, 2016 at 2:12

2 Answers

Well, this is a bit tricky, and I'm sure someone will suggest a better regex than mine. Suppose your input text is:

"efgh ,ijkl123,",abcd ,  "efgh ,ijkl123,",mnop456 "efgh ,ijkl123,"

You can try:

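// Requires: using System; using System.Text.RegularExpressions;
var data = "\"efgh ,ijkl123,\",abcd ,  \"efgh ,ijkl123,\",mnop456 \"efgh ,ijkl123,\"";

// A quoted field preceded by the line start or a comma (plus optional spaces/tabs)
// and followed by a comma or the line end (plus optional spaces/tabs).
var rx = @"(?<=(^|,[ \t]*))\""[^\""\n]+\""(?=[ \t]*(,|$))";

var matches = Regex.Matches(data, rx);

foreach (Match match in matches) {
    // Escape the matched text so it is searched for literally, then replace
    // its first occurrence that still contains commas with the semicolon version.
    data = new Regex(Regex.Escape(match.Value))
        .Replace(data, match.Value.Replace(',', ';'), 1);
}

Console.WriteLine(data);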

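It will emit:

"efgh ;ijkl123;",abcd ,  "efgh ;ijkl123;",mnop456 "efgh ,ijkl123,"

The code above replaces every comma with a semicolon inside each double-quoted field that is delimited by commas (or by the start or end of the line), with only spaces or tabs allowed in between.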
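As for why the code in the question printed replacePattern itself: the third argument of Regex.Replace is a replacement string, not a second regex. Apart from $-substitutions such as $1, it is copied into the result literally, and because the question's pattern starts and ends with .*, it can match the whole line and swap it for that literal text. Here is a minimal sketch of the difference. It assumes a simplified pattern that matches any quoted run and ignores the "between commas" rule, and the class and method names are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplacementDemo
{
    // Simplified pattern for illustration only: any run enclosed in double quotes.
    private const string Quoted = "\"[^\"]*\"";

    // The replacement string is inserted literally; ".*;+" is not interpreted as a regex.
    public static string WithLiteralReplacement(string line) =>
        Regex.Replace(line, Quoted, ".*;+");

    // A MatchEvaluator computes the replacement text from each match instead.
    public static string WithEvaluator(string line) =>
        Regex.Replace(line, Quoted, m => m.Value.Replace(',', ';'));

    public static void Main()
    {
        var line = "abcd , \"efgh ,ijkl123,\",mnop456";
        Console.WriteLine(WithLiteralReplacement(line)); // abcd , .*;+,mnop456
        Console.WriteLine(WithEvaluator(line));          // abcd , "efgh ;ijkl123;",mnop456
    }
}
```

The evaluator overload is what lets the replacement depend on the matched text, which a fixed replacement string cannot do.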


10 Comments

Nice one! But in your example there should be 2 matches, not 3. The last one shouldn't match, because between an external comma (a comma outside a pair of double quotes) and a double quote there can only be whitespace.
Hmm, in that case my sample input is incorrect; it's not a valid CSV line. Can you try it with a valid string and see how it goes? You can try it in regex101 (link is posted above). Anyway, I have updated my sample input and expected output strings.
No, your sample input was great and it was in standard CSV format. I am going to adopt it as my input.
It should even work with a tricky CSV line: efgh ,ijkl123,",abcd , "efgh ,ijkl123,",mnop456 "efgh ,ijkl123,". Note the single unpaired double quote.
No, unfortunately I get this: efgh ;ijkl123;",abcd , "efgh ,ijkl123,",mnop456 "efgh ,ijkl123,"

I'm not sure whether it is very efficient, but it works. Suggestions to improve it further are welcome.

// Requires: using System.Linq; using System.Text.RegularExpressions;
string input = "\"efgh ,ijkl123,\",abcd ,  \"efgh ,ijkl123,\",mnop456 \"efgh ,ijkl123,\"";

Regex.Matches(input, "\"([^\"]*)\"(,)") // Extract each quoted string that is followed by ','.
    .Cast<Match>()
    .ToList()
    .ForEach(m => input = input.Replace(m.Value, m.Value.Replace(",", ";")) // Swap every ',' in the match for ';'.
                               .Replace(";\";", ",\","));  // A hack: restore the ',' on both sides of the closing quote.

Output:

"efgh ;ijkl123,",abcd ,  "efgh ;ijkl123,",mnop456 "efgh ,ijkl123,"
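One possible way to drop the trailing hack is to capture only the text between the quotes and to check for the following comma with a lookahead, so the external comma is never part of what gets rewritten. The sketch below rests on those assumptions; QuotedFieldDemo and ReplaceInnerCommas are hypothetical names, and it does not check what precedes the opening quote. Unlike the output above, it also turns the last comma inside the quotes into a semicolon, which is what the EDIT samples in the question ask for.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedFieldDemo
{
    // Hypothetical helper: group 1 is the text between the quotes, and the
    // lookahead (?=,) requires a following comma without consuming it.
    public static string ReplaceInnerCommas(string line) =>
        Regex.Replace(line, "\"([^\"]*)\"(?=,)",
            m => "\"" + m.Groups[1].Value.Replace(',', ';') + "\"");

    public static void Main()
    {
        var input = "\"efgh ,ijkl123,\",abcd ,  \"efgh ,ijkl123,\",mnop456 \"efgh ,ijkl123,\"";
        Console.WriteLine(ReplaceInnerCommas(input));
        // "efgh ;ijkl123;",abcd ,  "efgh ;ijkl123;",mnop456 "efgh ,ijkl123,"
    }
}
```

Because the evaluator rewrites each match in place, identical quoted fields elsewhere in the line are not affected unless they match as well.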

3 Comments

Thanks! I just edited my post. Between a double quote and an external comma (a comma outside a pair of double quotes) there can only be whitespace. So if I use this input in your code, I expect 2 replacements, not 3: var input = "\"efgh ,ijkl123,\",abcd , \"efgh ,ijkl123,\",mnop456 \"efgh ,ijkl123,\"";
It didn't work for this: "\"efgh ,ijkl123,\",abcd , \"efgh ,ijkl123,\",mnop456 \"efgh ,ijkl123,\"" I got "\"efgh ;ijkl123,\",abcd , \"efgh ;ijkl123,\",mnop456 \"efgh ,ijkl123,\""
What is expected for this string? It seems to be working as expected to me.
