2

What is the Regex syntax in C# for finding and extracting part of a string?

The string could be:

string tdInnerHtml = @"<strong> You gained  230 Points </strong>
                      there is going to be more text and some html code part of this       
                      string <a href=http://google.com>Google it here </a>";

// I want to extract 230 from this string using Regex.
// The digits (230) vary for each tdInnerHtml.
// So the code should look for digits, followed by a space, ending with Points
4
  • Are you parsing HTML? Use an HTML parser, not Regex. That path leads to unspeakable things. Commented Apr 3, 2012 at 18:55
  • I am using an HTML parser, but I need to get the substring value inside the TD cell. Commented Apr 3, 2012 at 18:56
  • Your parser should understand how to pull attributes and content from an element. Perhaps it's hiding in the documentation? Commented Apr 3, 2012 at 18:57
  • Please don't prefix your titles with "C#" and such. That's what the tags are for. Commented Apr 3, 2012 at 19:10

5 Answers

4

If the space and the </strong> tag are consistent, you can use the following to get the match. It also works with strings like: " Pints are between 230-240 Points and You gained 230 Points "

        var match = Regex.Match(tdInnerHtml, @"(?<pts>\d+) Points ?</strong>");
        if (match.Success) {
            int points = Convert.ToInt32(match.Groups["pts"].Value);
            Console.WriteLine("Points: {0}", points);
        }

2 Comments

Thanks for the reply. The </strong> is not consistent. "230 Points" could be anywhere inside the string.
Which part is consistent? Is the "You gained ? Points" consistent?
1

I think your regex pattern might be \b[0-9]+\b \bPoints\b.

You might test this at regexpal.
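
For reference, here is a minimal sketch of how that pattern might be applied in C#. It assumes you wrap the digits in parentheses so they can be read back as a capture group; the class and method names (`PointsPatternDemo`, `FindPoints`) are hypothetical and not part of the original suggestion.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PointsPatternDemo
{
    // Hypothetical helper: returns the digits that precede " Points",
    // or null when the input contains no such match.
    public static string FindPoints(string input)
    {
        // Same pattern as suggested above, with parentheses added
        // around the digits so they land in capture group 1.
        Match match = Regex.Match(input, @"\b([0-9]+)\b \bPoints\b");
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Calling `PointsPatternDemo.FindPoints(tdInnerHtml)` on the string from the question should give "230". Note that `\b` also matches between a hyphen and a digit, so a range such as "230-240 Points" would still match on the 240.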

1

As long as you're only going for a set of numbers followed by the text Points, Regex can work:

Match match = Regex.Match(tdInnerHtml, @"(?<![\d-])(\d+) Points");
if (match.Success){
  // fetch result
  String pointsString = match.Groups[1].Value;

  // optional: parse to integer
  Int32 points;
  if (Int32.TryParse(pointsString, out points)){
    // you now have an integer value
  }
}

However, if this is in any way related to where the information resides on the page, the formatting it's surrounded by, or anything else HTML-related, heed others' warnings and use an HTML parser.

2 Comments

thanks for the reply. If I have a value of: string test = "<strong> Pints are between 230-240 Points and You gained 230 Points </strong>"; your code returns 240. But I want exactly 230 followed by "Points".
@Shuaib: Use a negative look-behind to assert that the digits aren't preceded by a hyphen or another digit. Basically, change the pattern in the above code to: @"(?<![\d-])(\d+) Points"
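
To make the difference concrete, here is a small sketch that runs both patterns against the test string from the comment above. The class and method names (`LookBehindDemo`, `FirstPoints`) are made up for illustration; only `Regex.Match` and the two patterns come from this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookBehindDemo
{
    // Hypothetical helper: returns capture group 1 of the first match,
    // or null when the pattern does not match at all.
    public static string FirstPoints(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

With the input "<strong> Pints are between 230-240 Points and You gained 230 Points </strong>", the plain pattern `(\d+) Points` returns "240", because the first place it can match is the end of the range. With `(?<![\d-])(\d+) Points`, the look-behind rejects "240" (preceded by a hyphen) as well as "40" and "0" (preceded by a digit), so the first match is the standalone "230".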
0

The regex is very easy, \d+ Points. Here it is in C#, with a named group capture:

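        // The verbatim string (@"...") keeps \d from being treated as a C# escape sequence.
        var match = Regex.Match(tdInnerHtml, @"(?<pts>\d+) Points");
        if (match.Success) {
            // Group values are strings, so parse rather than cast.
            int points = int.Parse(match.Groups["pts"].Value);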
            // do something..
        }

0
string test = "<strong> You gained 230 Points </strong>";
string pattern = @"(\d+)\sPoints";
Regex regex = new Regex(pattern);
Match match = regex.Match(test);
string result = match.Success ? match.Groups[1].Value : "";
