I have the following example strings:

TAR:100
TAR:100|LED:50
TAR:30|LED:30|ASO:40

I need a regex that obtains the numeric values after the colon, which are always in the range 0 to 100 inclusive.

The result after the regex is applied to any of the above strings should be:

for TAR:100 the result should be 100

for TAR:100|LED:50 the result should be the array [100,50]

for TAR:30|LED:30|ASO:40 the result should be the array [30,30,40]

The word before the colon can have any length and both upper and lowercase.

I tried the following, but it does not yield the result I need:

string text = "TAR:100|LED:50";
string pattern = "\\|?([a-zA-Z]{1,}:)";
string[] values = Regex.Split(text, pattern);

The regex should work whether the string is TAR:100 or TAR:100|LED:50 if possible.

2 Answers

Your pattern wraps the delimiter in a capture group ( ). Regex.Split includes captured text in its result, so the parts you want to remove are returned as well.
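To make the effect of the capture group visible, here is a minimal sketch that runs both patterns on one of the sample strings. The variable names withGroup and withoutGroup are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "TAR:100|LED:50";

// With a capture group, Regex.Split also returns the captured delimiter text.
string[] withGroup = Regex.Split(text, @"\|?([a-zA-Z]{1,}:)");

// Without the group, only the text between the delimiters is returned.
string[] withoutGroup = Regex.Split(text, @"\|?[a-zA-Z]+:");

Console.WriteLine(string.Join(",", withGroup));    // ,TAR:,100,LED:,50
Console.WriteLine(string.Join(",", withoutGroup)); // ,100,50
```

Both results start with an empty string because the input begins with a delimiter.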

Below is my solution, with a slightly changed regex.

Note that the loop over the values must start at i = 1. This is purely a consequence of using Split on a string that starts with a separator sequence; it has nothing to do with the regex itself.
Explanation: if we used the simpler str.Split with the separator "#", then "a#b#c" would produce ["a", "b", "c"], whereas "#b#c" would produce ["", "b", "c"]. In general, and by definition, if Split removes N separator sequences, the result contains N+1 strings. All the strings we deal with here have the form "#b#c", so the first result is always empty.
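The N+1 rule can be checked with plain string.Split. This is only a small sketch; the variable names inner and leading are illustrative.

```csharp
using System;

// Two separators inside the string: three non-empty parts.
string[] inner = "a#b#c".Split('#');

// Two separators, one of them at the very start: still three parts,
// but the first one is the empty string before the leading '#'.
string[] leading = "#b#c".Split('#');

Console.WriteLine(inner.Length);      // 3
Console.WriteLine(leading.Length);    // 3
Console.WriteLine(leading[0] == "");  // True
```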

Accepting that as a given fact, the results are usable by starting from i = 1:

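using System;
using System.Text.RegularExpressions;

// No capture group, so Split returns only the text between the delimiters.
var pattern = @"\|?[a-zA-Z]+:";
var testCases = new[] { "TAR:100", "TAR:100|LED:50", "TAR:30|LED:30|ASO:40" };
foreach (var text in testCases)
{
    string[] values = Regex.Split(text, pattern);
    // values[0] is the empty string before the first delimiter, so start at 1.
    for (var i = 1; i < values.Length; i++)
        Console.WriteLine(values[i]);
    Console.WriteLine("------------");
}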

Output:

100
------------
100
50
------------
30
30
40
------------

Working DotNetFiddle: https://dotnetfiddle.net/i9kH8n

2 Comments

Thank you. Any idea why the first element of the result array is whitespace?
I gave it some more thought, and found an explanation for the (unavoidable) empty result. See the updated answer.

In .NET you can give two capture groups the same name and read all of their captures through Group.Captures, while the pattern itself matches the format of the whole string.

\b[a-zA-Z]+:(?<numbers>[0-9]+)(?:\|[a-zA-Z]+:(?<numbers>[0-9]+))*\b

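using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] strings = {
    "TAR:100",
    "TAR:100|LED:50",
    "TAR:30|LED:30|ASO:40"
};
// Both groups are named "numbers", so every value ends up in the same Captures collection.
string pattern = @"\b[a-zA-Z]+:(?<numbers>[0-9]+)(?:\|[a-zA-Z]+:(?<numbers>[0-9]+))*\b";
foreach (string str in strings)
{
    Match match = Regex.Match(str, pattern);

    if (match.Success)
    {
        // Cast<Capture>() also compiles on .NET Framework, where CaptureCollection is non-generic.
        string[] result = match.Groups["numbers"].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
        Console.WriteLine(string.Join(",", result));
    }
}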

Output

100
100,50
30,30,40

Another option is to use the \G anchor, which asserts the position at the end of the previous match, and to read each value from capture group 1.

\b(?:[a-zA-Z]+:|\G(?!^))([0-9]+)(?:\||$)

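using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] strings = {
    "TAR:100",
    "TAR:100|LED:50",
    "TAR:30|LED:30|ASO:40"
};
string pattern = @"\b(?:[a-zA-Z]+:|\G(?!^))([0-9]+)(?:\||$)";
foreach (string str in strings)
{
    // Each match holds one value in capture group 1.
    MatchCollection matches = Regex.Matches(str, pattern);
    // Cast<Match>() also compiles on .NET Framework, where MatchCollection is non-generic.
    string[] result = matches.Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

    Console.WriteLine(string.Join(",", result));
}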

Output

100
100,50
30,30,40
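The question states that the values are always between 0 and 100, while the snippets above return strings. If integers are needed, a conversion step might look like the following sketch. ParseValues is a hypothetical helper name, and the ^ and $ anchors and the {1,3} digit limit are assumptions added here to reject malformed input; they are not part of the answer's pattern.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the numbers after each colon as integers
// and rejects input that does not fit the expected format or range.
static int[] ParseValues(string input)
{
    // Assumption: the whole string must match, hence ^ and $ instead of \b.
    Match match = Regex.Match(input,
        @"^[a-zA-Z]+:(?<numbers>[0-9]{1,3})(?:\|[a-zA-Z]+:(?<numbers>[0-9]{1,3}))*$");
    if (!match.Success)
        throw new FormatException("Unexpected format: " + input);

    int[] values = match.Groups["numbers"].Captures
        .Cast<Capture>()
        .Select(c => int.Parse(c.Value))
        .ToArray();

    // The digits cannot be negative, so only the upper bound needs a check.
    if (values.Any(v => v > 100))
        throw new FormatException("Value out of range in: " + input);
    return values;
}

Console.WriteLine(string.Join(",", ParseValues("TAR:30|LED:30|ASO:40"))); // 30,30,40
```

For a single pair such as TAR:100 the helper returns a one-element array rather than a bare number, which keeps the return type uniform.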
