2

I have many strings in this format:

fdg.sdfg.234fdsa.dsf_1.2.5.62.xml
23432ssdfsa_sadfsd_1.2.7.6.xml
3.3.3asdf_ddd_1.2.1.doc

I would like to get only the number:
from: fdg.sdfg.234fdsa.dsf_1.2.5.62.xml to get: 1.2.5.62
from: 23432ssdfsa_sadfsd_1.2.7.6.xml to get: 1.2.7.6
from: 3.3.3asdf_ddd_1.2.1.doc to get: 1.2.1
etc

This code works:

string test = "4534534ghgggg_1.1.3.4.xml";
int to = test.LastIndexOf('.');
int from = test.LastIndexOf('_') + 1;
Console.WriteLine(test.Substring(from,to - from));

But I want to know how I can do it with a regex. Any ideas?

2
  • 1
    Honestly, if the format is fixed so that you can always use _ for the start and the last . for the end, just use Substring. It's much easier for the majority of devs to read and maintain than a Regex (yes, Regexes aren't impossible to read, but a lot of devs can't). Commented Jul 15, 2016 at 19:46
  • Have you tried anything? Commented Jul 15, 2016 at 19:46

3 Answers 3

2

This code seems to work as long as the numbers you are looking for are preceded by "_".

Edited - This is the final working result

        // fdg.sdfg.234fdsa.dsf_1.2.5.62.xml 
        // 23432ssdfsa_sadfsd_1.2.7.6.xml
        // 3.3.3asdf_ddd_1.2.1.doc

        string source = "fdg.sdfg.234fdsa.dsf_1.2.5.62.xml";
        var match = Regex.Match(source, @"_[0-9]+\.[0-9]+\.[0-9]+(\.[0-9]+)*").ToString().Replace("_", "");
        Console.WriteLine(match);
        Console.ReadLine();

6 Comments

Be careful: . means any character in a regular expression, so _[0-9].[0-9] will match, say, _1x2
var match = Regex.Match(source, @"_[0-9]+\.[0-9]+\.[0-9]+(\.[0-9])*").ToString().Replace("_", "");
Another problem: _1.2.5.62. is not matched in full: the pattern should end with (\.[0-9]+)* (please note the +)
Sorry, accidentally removed a "+". Here is the line again: var match = Regex.Match(source, @"_[0-9]+\.[0-9]+\.[0-9]+(\.[0-9]+)*").ToString().Replace("_", "");
The only thing I'd note: the value is at the end of the string, right before the extension. Use RegexOptions.RightToLeft so that the string is parsed from right to left; that will fetch the right match, and faster. And of course, get the Match object first, and only use its value after checking the match.Success property.
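The last comment above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the class name RightToLeftDemo and the method name Extract are made up for the example, and the capturing group is an addition so that the leading _ does not have to be removed with Replace afterwards.

```csharp
using System;
using System.Text.RegularExpressions;

static class RightToLeftDemo
{
    // Same idea as the answer's pattern, with the number in capturing group 1.
    // RegexOptions.RightToLeft makes the engine start searching at the end of the input.
    static readonly Regex VersionRegex = new Regex(
        @"_([0-9]+\.[0-9]+\.[0-9]+(?:\.[0-9]+)*)",
        RegexOptions.RightToLeft);

    // Hypothetical helper: returns the number, or null when nothing matches.
    public static string Extract(string source)
    {
        Match match = VersionRegex.Match(source);
        // Check Success before reading the value, as the comment recommends.
        return match.Success ? match.Groups[1].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(Extract("fdg.sdfg.234fdsa.dsf_1.2.5.62.xml")); // 1.2.5.62
        Console.WriteLine(Extract("23432ssdfsa_sadfsd_1.2.7.6.xml"));    // 1.2.7.6
        Console.WriteLine(Extract("3.3.3asdf_ddd_1.2.1.doc"));           // 1.2.1
        Console.WriteLine(Extract("readme.txt") == null);                // True
    }
}
```

Whether right-to-left matching is measurably faster depends on the input, so treat the speed claim as something to measure rather than assume.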
2

First, let's spell out the rules for the match (strictly speaking, it is not a number that you want to get):

  • is preceded by '_' (not included in the match)
  • contains digits and dots (dots are not duplicated)
  • has no leading or trailing dots
  • has at least one digit as well as at least one dot
  • is followed by '.' (not included in the match)

then implement a pattern:

 (?<=_)[0-9]+(\.[0-9]+)+(?=\.)

If the number in the question is, in fact, some kind of version, you may want to restrict the number of its parts, e.g.

 (?<=_)[0-9]+(\.[0-9]+){1,3}(?=\.[^0-9])

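which means that only versions with 2 to 4 parts (_d.d., _d.d.d. and _d.d.d.d.) are accepted. E.g. the input _1.2.15. will be accepted (3 parts: 1, 2 and 15), while _1.2.3.4.5. will be rejected (5 parts). Note that the lookahead (?=\.[^0-9]) also requires a non-digit character, such as the start of the extension, after the final dot.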
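To see the part-count restriction in action, here is a small sketch that runs the pattern above against a few made-up file names. The class name PartCountDemo, the method Extract and the sample names are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

static class PartCountDemo
{
    // The restricted pattern from the answer: 2 to 4 dot-separated parts.
    const string Pattern = @"(?<=_)[0-9]+(\.[0-9]+){1,3}(?=\.[^0-9])";

    // Hypothetical helper: returns the matched version, or null when nothing matches.
    public static string Extract(string source)
    {
        Match match = Regex.Match(source, Pattern);
        return match.Success ? match.Value : null;
    }

    static void Main()
    {
        string[] names = { "a_1.2.15.xml", "a_1.2.3.4.xml", "a_1.2.3.4.5.xml", "a_1.xml" };
        foreach (string name in names)
        {
            string result = Extract(name) ?? "(no match)";
            Console.WriteLine(name + " -> " + result);
        }
        // a_1.2.15.xml -> 1.2.15
        // a_1.2.3.4.xml -> 1.2.3.4
        // a_1.2.3.4.5.xml -> (no match)
        // a_1.xml -> (no match)
    }
}
```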

Finally, use the regular expression:

  string source = ...
  string pattern = @"(?<=_)[0-9]+(\.[0-9]+)+(?=\.)";

  // If there are many matches, let's take the last one
  string lastMatch = Regex.Matches(source, pattern)
    .OfType<Match>()
    .Select(match => match.Value)
    .LastOrDefault();

  Console.Write(lastMatch); 

However, if the format is fixed, then a regular expression (and Linq) is overkill. LastIndexOf + Substring is a better choice.
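For comparison, the LastIndexOf + Substring route from the question can be wrapped in a small guarded helper. This is a minimal sketch under the assumption that the number always sits between the last '_' and the last '.'; the class name SubstringDemo and the method name Extract are hypothetical.

```csharp
using System;

static class SubstringDemo
{
    // Returns the text between the last '_' and the last '.',
    // or null when the input does not have that shape.
    public static string Extract(string fileName)
    {
        int from = fileName.LastIndexOf('_') + 1; // 0 when there is no '_'
        int to = fileName.LastIndexOf('.');       // -1 when there is no '.'
        if (from == 0 || to < from)
            return null;
        return fileName.Substring(from, to - from);
    }

    static void Main()
    {
        Console.WriteLine(Extract("4534534ghgggg_1.1.3.4.xml"));         // 1.1.3.4
        Console.WriteLine(Extract("fdg.sdfg.234fdsa.dsf_1.2.5.62.xml")); // 1.2.5.62
        Console.WriteLine(Extract("noseparator") == null);               // True
    }
}
```

Unlike the regex versions, this does not check that the extracted text consists of digits and dots, so it only suits input whose format is already known.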

1 Comment

You have a typo in the code: string pattern = @"?<=_)[0-9]+(\.[0-9]+)+(?=\.)"; must be string pattern = @"(?<=_)[0-9]+(\.[0-9]+)+(?=\.)";. You tend to overuse unanchored lookbehinds - I'd rather use capturing groups (I know checking one _ is no big deal, but in general, they are costly).
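The capturing-group alternative mentioned in this comment might look like the sketch below. It is one possible reading of the suggestion, not the commenter's code: the names CaptureGroupDemo and ExtractLast are invented, and the pattern consumes the '_' and the trailing '.' instead of checking them with lookarounds.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class CaptureGroupDemo
{
    // '_' and the trailing '.' are part of the match but stay outside group 1.
    const string Pattern = @"_([0-9]+(?:\.[0-9]+)+)\.";

    // Hypothetical helper: value of group 1 from the last match, or null if there is none.
    public static string ExtractLast(string source)
    {
        return Regex.Matches(source, Pattern)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .LastOrDefault();
    }

    static void Main()
    {
        Console.WriteLine(ExtractLast("fdg.sdfg.234fdsa.dsf_1.2.5.62.xml")); // 1.2.5.62
        Console.WriteLine(ExtractLast("3.3.3asdf_ddd_1.2.1.doc"));           // 1.2.1
        Console.WriteLine(ExtractLast("v_1.2.old_3.4.5.txt"));               // 3.4.5
    }
}
```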
1

You already have your answers. I have not practised regex for the last 6 months and have forgotten almost all of it. Anyway, there are plenty of web sites (look for "regex tester" in your favorite search engine) that help you with regex. I do not know if I can recommend one over another, but here are some screenshots of one example (I am no regex expert, so I hope I did not write anything too wrong).

[four screenshots of an online regex tester]

So now you can test all the answers and advice that have been given to you.

2 Comments

You have chosen the wrong tool to understand .NET regexes: regex101 does not support .NET regex patterns. E.g. in .NET, \w is not the same as [a-zA-Z0-9_]; it also matches Unicode letters and digits (roughly [\p{L}\p{Mn}\p{Nd}\p{Pc}]). \d matches Arabic, Hindi and other digits. Use Ultrapico Expresso to see what .NET subpatterns mean.
Thanks for pointing out this pitfall and suggesting a more suitable tool.
