Extract a certain substring using regex in C#

Question

I have a string that starts with a section number, say 1.2.3, followed by some text and then more text. I want to extract the section number together with the text that immediately follows it. I have written a regex to match the section number and that text.

Below is my code:

var word = "1.2.3 area consent testing, sklfjsdlkf jdifgjds visjeflk area consent testing lsdajfgo idsjgosa jfikdjfl343 fjdsl45jl sfgjsoiaetj l area consent testing";
var lowerWord = "area consent testing".ToLower();
var textLower = @word.ToLower().ToString();
Dictionary<int, string> matchRegex = new Dictionary<int, string>();
matchRegex.Add(1, @"(^\d.+(?:\.\d+)*[ \t](" + lowerWord + "))"); 


foreach (var check in matchRegex)
{
    string AllowedChars = check.Value;
    Regex regex = new Regex(AllowedChars);
    var match = regex.Match(textLower);
    if (match.Success)
    {
        var sectionVal = match.Value;
    }
}

My problem is that I want only the value 1.2.3 area consent testing in my sectionVal variable, but the match returns the whole line, i.e.

sectionVal = "1.2.3 area consent testing, sklfjsdlkf jdifgjds visjeflk area consent testing lsdajfgo idsjgosa jfikdjfl343 fjdsl45jl sfgjsoiaetj l area consent testing";

Shouldn't \d.+ be \d+\.? - escape the first .

– Dmitrii Bychenko, Commented May 22, 2018 at 14:24
$@"^[0-9]+(?:\.[0-9]+)*\s{Regex.Escape(lowerWord)}"

– Dmitrii Bychenko, Commented May 22, 2018 at 14:30

Titian Cernicova-Dragomir · Accepted Answer · 2018-05-22 14:24:28Z

2

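Your regex starts with \d.+, where the unescaped . matches any character and the + after it is greedy. So .+ consumes the rest of the line and backtracks only as far as the last occurrence of the phrase, which is why the match covers the whole line. Match one or more digits instead and keep the dots escaped. Try this: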

@"^(\d+(\.\d+)*[ \t](" + lowerWord + "))"
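To see the corrected pattern in context, here is a minimal sketch that applies it to the string from the question. The class name SectionExtractor and the helper Extract are hypothetical names chosen for illustration, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SectionExtractor
{
    // Hypothetical helper: returns "<section> <phrase>" from the start of text,
    // or null when the text does not start that way.
    public static string Extract(string text, string phrase)
    {
        // \d+ matches the first number, (\.\d+)* matches each following ".N" part,
        // [ \t] matches one space or tab, then the phrase itself.
        string pattern = @"^(\d+(\.\d+)*[ \t](" + phrase + "))";
        Match match = Regex.Match(text, pattern);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        var word = "1.2.3 area consent testing, sklfjsdlkf jdifgjds visjeflk area consent testing lsdajfgo idsjgosa jfikdjfl343 fjdsl45jl sfgjsoiaetj l area consent testing";
        var lowerWord = "area consent testing".ToLower();
        var textLower = word.ToLower();

        // prints: 1.2.3 area consent testing
        Console.WriteLine(Extract(textLower, lowerWord));
    }
}
```

Because the number part can no longer swallow arbitrary characters, the match stops after the first occurrence of the phrase.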


2 Comments

Furquan Khan Over a year ago

@Titianm, you are the saviour!

Dmitrii Bychenko Over a year ago

[ \t] can probably be replaced with \s, and + Regex.Escape(lowerWord) + is safer than + lowerWord +, in case the phrase contains regex metacharacters.
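The suggestion in this comment can be sketched as follows, assuming the phrase may contain characters such as parentheses that have a special meaning in a regex. The class name EscapedSectionExtractor, the helper Extract, and the sample input are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedSectionExtractor
{
    // Hypothetical helper: same idea as the accepted answer, but the phrase is
    // escaped so it is always matched literally.
    public static string Extract(string text, string phrase)
    {
        // [0-9]+(?:\.[0-9]+)* matches the section number, \s one whitespace character,
        // and Regex.Escape turns the phrase into a literal pattern.
        string pattern = $@"^[0-9]+(?:\.[0-9]+)*\s{Regex.Escape(phrase)}";
        Match match = Regex.Match(text, pattern);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        // Made-up input whose phrase contains regex metacharacters.
        var text = "2.10 area (consent) testing, other text area (consent) testing";

        // prints: 2.10 area (consent) testing
        Console.WriteLine(Extract(text, "area (consent) testing"));
    }
}
```

Without Regex.Escape, the parentheses in the phrase would be read as a capturing group rather than literal characters, so this input would not match. Note that \s accepts any whitespace character, including newlines, whereas [ \t] accepts only a space or a tab.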
