1

I have a string that I need to split into an array of strings. Each value is enclosed in pipes | and the values are separated by commas.

|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||

The array should have the following 8 values after the split:

111
2,2
room 1
13'2'' x 13'8''
""
""
""
""

By "" I simply mean an empty string. Please note that a value can itself contain a comma, e.g. 2,2. I think the best way to do this is probably Regex.Split, but I am not sure how to write the correct regular expression. Any suggestions, or a better way of achieving this, would be much appreciated.

1 Comment

A CSV library like FileHelpers should have options to support such a format. Commented Nov 8, 2013 at 16:35

6 Answers

2

You can use Regex.Matches() to get the values instead of Split(), as long as the values between the pipe characters don't contain the pipe character itself:

(?<=\|)[^|]*(?=\|)

This will match zero or more non-pipe characters [^|]* that are preceded by a pipe (?<=\|) and followed by a pipe (?=\|).

In C#:

var input = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var results = Regex.Matches(input, @"(?<=\|)[^|]*(?=\|)"); 
foreach (Match match in results)
     Console.WriteLine("Found '{0}' at position {1}", 
                       match.Value, match.Index);

EDIT: Since a comma always separates the pipe-enclosed values, the matches alternate between values and separators: the values land at even indexes of the match collection and the separator commas at odd indexes. So we can walk only the even indexes to get the true values, like this:

var input = "|room 1|,|,|,||,||,||,||,||,||";

var results = Regex.Matches(input, @"(?<=\|)[^|]*(?=\|)");

for (int i = 0; i < results.Count; i+=2)
    Console.WriteLine("Found '{0}'", results[i].Value);

The same loop also works with the first example above.
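Since the question asks for an array, the even-index walk above could be wrapped up roughly as follows. This is only a sketch: the helper name SplitPipeValues is hypothetical (not part of the answer), and it assumes, as the answer does, that no value contains a pipe.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: collects only the even-indexed matches (the values),
// skipping the odd-indexed matches, which are the separator commas.
static string[] SplitPipeValues(string input)
{
    var matches = Regex.Matches(input, @"(?<=\|)[^|]*(?=\|)");
    var values = new List<string>();
    for (int i = 0; i < matches.Count; i += 2)
        values.Add(matches[i].Value);
    return values.ToArray();
}

var fields = SplitPipeValues("|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||");
foreach (var field in fields)
    Console.WriteLine("[{0}]", field);
```

For the string from the question this should yield the eight expected values, with the embedded comma in 2,2 preserved and the four trailing fields returned as empty strings.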

3 Comments

This does not work when a pipe is part of the value, but then again, delimiter characters (as the pipe is used here) should never be part of the value. I mean, what if by chance you have |,| within the value? How would you know? You can't. +1
@Sniffer This way I will also get the commas, and I won't know whether a comma is a separator or a value. For example, here the second value is just a comma: |room 1|,|,|,||,||,||,||,||,||
@KanwarRafi This can be solved, check the edit I made to my answer.
1

Assuming all fields are enclosed in pipes and delimited by a comma, you can use |,| as the delimiter after removing the leading and trailing |:

Dim data = "|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||"
Dim delim = New String() {"|,|"}
Dim results = data.Substring(1, data.Length - 2).Split(delim, StringSplitOptions.None)

For Each s In results
  Console.WriteLine(s)
Next

Output (each empty string is shown as ""):

111
2,2
room 1
13'2'' x 13'8''
""
""
""
""
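The snippet above is VB.NET; a rough C# equivalent of the same idea, in the language used by the other answers here, might look like this. The helper name SplitOnPipeCommaPipe is hypothetical, and the sketch assumes the input always starts and ends with a pipe (a string shorter than two characters would make Substring throw).

```csharp
using System;

// Hypothetical helper: drop the leading and trailing pipe,
// then split on the three-character separator "|,|".
static string[] SplitOnPipeCommaPipe(string s) =>
    s.Substring(1, s.Length - 2).Split(new[] { "|,|" }, StringSplitOptions.None);

var data = "|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
foreach (var item in SplitOnPipeCommaPipe(data))
    Console.WriteLine("[{0}]", item);
```

StringSplitOptions.None matters here: it keeps the empty entries, so the four trailing empty fields are not dropped.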

1

No need to use a regex; remove the pipes and split the string on the comma:

var input = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";

var parts = input.Split(',').Select(x => x.Replace("|", string.Empty));

or

var parts = input.Replace("|", string.Empty).Split(',');


EDIT: OK, in that case, use a while loop to parse the string:

var values = new List<string>();
var str = @"|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
while (str.Length > 0)
{
    var open = str.IndexOf('|');
    var close = str.IndexOf('|', open + 1);

    // Take the text strictly between the opening and closing pipes.
    var value = str.Substring(open + 1, close - open - 1);
    values.Add(value);

    // Skip the closing pipe and the comma that follows it, if any.
    str = close < str.Length - 1
            ? str.Substring(close + 2)
            : string.Empty;
}

2 Comments

Thanks @user2965995 but there's a possibility of a comma in the actual values as well. e.g. the string can also be |111|,|,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||
See my edit, it uses a while loop rather than a regex
0

You could try this:

 string a = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";

 // Splitting on '|' yields "", value, ",", value, ..., "" so the values sit at the odd indexes.
 string[] result = a.Split('|').Where((s, i) => i % 2 == 1).ToArray();

0

Maybe this will work for you:

var data = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var resultArray = data.Replace("|", "").Split(',');

Regards,

k

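EDIT: You can replace the |,| separator with a placeholder character that never occurs in the data (here ¬), strip the remaining pipes, and split on the placeholder: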

string data = "|111|,|2,2|,|,3|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var resultArray = data.Replace("|,|", "¬").Replace("|", "").Split('¬');

3 Comments

you can't use empty char literals with replace
@DWolf Ah... sure you can. Do it all the time. Regardless, this is a bad answer because it does not account for commas or pipes within the actual value.
0

Check whether this fits your needs:

var str = "|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";  
// Iterate through all matches (anything between | and |, non-greedy)
foreach (Match m in Regex.Matches(str, @"\|(.*?)\|"))
{
    // Groups[0] is the entire match, including the enclosing pipes; Groups[1] is the text captured by ()
    Console.WriteLine(m.Groups[1].Value);
}

Though, to match anything between | and |, you could, and probably should, use [^|]* instead of the lazy .*?, which makes explicit that a value cannot contain a pipe.

At least for the specified use case, it gives the result you're expecting.
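The [^|]* variant suggested above could be sketched like this, collecting the captured group of each match into the array the question asks for. The helper name ExtractFields is hypothetical, and the sketch assumes values never contain a pipe.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: each match consumes one |value| pair, so the separator
// commas between pairs are never matched; Groups[1] holds the text between the pipes.
static string[] ExtractFields(string input) =>
    Regex.Matches(input, @"\|([^|]*)\|")
         .Cast<Match>()
         .Select(m => m.Groups[1].Value)
         .ToArray();

var parsed = ExtractFields("|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||");
foreach (var p in parsed)
    Console.WriteLine("[{0}]", p);
```

Because both pipes are consumed by each match, there is no need to skip every other match as in the lookaround-based approach.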
