I am trying to work out a regular expression that finds the following within a massive string and extracts the contents of the value attribute. The value will always be a mixture of numbers and letters, its length will vary, and the match should ignore case.

<input type="text" name="NAME_ID" value="id2654580" maxlength="25">

So in the above example, I would get 'id2654580' as the value, if that control/text was located within my massive string.

5
  • 2
    The input string looks like HTML. You should use an HTML parser for this kind of parsing, as a regex for something like this would be very error-prone. Commented Feb 21, 2014 at 15:07
  • 1
    If your file is valid XML, then you would be better off searching it as XML rather than just a string. Commented Feb 21, 2014 at 15:08
  • 1
    If this is HTML, there should be some HTML helper libraries that are better suited than just regex. If it's an XML file, there's XDocument or XmlDocument. Any reason why you do not want to use those? Commented Feb 21, 2014 at 15:08
  • 1
    The HAP (HTML Agility Pack) would help you deal with cases where you have value = "id2654580" or value= 'id2654580' or other variants that are all valid or "tolerated" HTML but where a too-specific regex might fail to match. Commented Feb 21, 2014 at 15:11
  • If you are dealing with a ton of XML, .NET already has libraries to do this. Take a look at XDocument. There is also LINQ to XML. Commented Feb 21, 2014 at 15:11

4 Answers

3

As the comments on the question already point out, you shouldn't use regex to parse HTML!

But since you're curious what it would look like, your regex would be something like:

<input.*value="(.+?)".*>

This would get you the value(s) of the input tag(s), if there are any specified.

<input   #matches "<input" literally
.*       #matches zero to unlimited characters
value="  #matches 'value="' literally
(.+?)    #captures as few characters as possible
"        #matches " literally
.*       #same as above
>        #matches > literally

In C#:

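using System.Text.RegularExpressions;

string str = "<input type=\"text\" name=\"NAME_ID\" value=\"id2654580\" maxlength=\"25\">";
// The named group "val" holds the value; IgnoreCase also matches <INPUT ... VALUE="...">
Regex re = new Regex(@"<input.*value=""(?<val>.+?)"".*>", RegexOptions.IgnoreCase);

Match match = re.Match(str);
// Groups["val"].Value is an empty string when nothing matched, so check Success first
string value = match.Success ? match.Groups["val"].Value : null;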
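One thing to watch with the `.*` parts: they are greedy, so if several input tags sit on the same line of the massive string, a single match can stretch across more than one tag. A possible variation (a sketch, not a replacement for a real HTML parser) swaps `.*` for `[^>]*`, which cannot run past the end of a tag, and collects every value. The class and method names here are made up for illustration, and the pattern assumes double-quoted values that contain no `>` character.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class InputValues
{
    // [^>]* cannot cross the closing '>' of a tag, so each match stays inside one <input>.
    // \s before "value" requires whitespace in front of the attribute name.
    static readonly Regex InputValue = new Regex(
        @"<input\b[^>]*?\svalue=""(?<val>[^""]*)""[^>]*>",
        RegexOptions.IgnoreCase);

    // Returns the value attribute of every <input> tag found, in document order.
    public static List<string> FindAll(string html)
    {
        var values = new List<string>();
        foreach (Match m in InputValue.Matches(html))
        {
            values.Add(m.Groups["val"].Value);
        }
        return values;
    }
}
```

Calling `InputValues.FindAll(str)` on the sample string above would give a list with the single entry `id2654580`.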

3 Comments

Wouldn't this retrieve the whole input node? The OP is looking for only the value string.
@Default It will match on the whole input node, but only capture the value. If you didn't match on the whole input field, you would get the values of all nodes (if there are any specified), and I understood the OP as wanting values from input fields only.
Cool. I'm not too familiar with regex, which is why I was wondering. Could you show how this would be used in a C# program then?
1

If you are only looking for the value, I would use:

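// In a verbatim string, "" is one literal quote; no backslash is needed
Regex reg = new Regex(@"value=""(?<value>[^""]+)""", RegexOptions.IgnoreCase);

string value = null;

// IsMatch is a method and needs the input; matching once and checking Success avoids a second scan
Match m = reg.Match(inputstring);
if (m.Success)
{
  value = m.Groups["value"].Value;
}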


0

This should be your regex:

/value="([^"]+)"/i

Here is a demo: http://rubular.com/r/tCj4WEtBZa
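The `/.../i` notation is the literal syntax used by Rubular (Ruby); .NET has no regex literals, so in C# the trailing `i` would become `RegexOptions.IgnoreCase`. A minimal sketch of the same pattern in C# follows; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class ValueFinder
{
    // Same pattern as /value="([^"]+)"/i; the i flag becomes RegexOptions.IgnoreCase.
    // Returns the first captured value, or null when nothing matches.
    public static string FirstValue(string input)
    {
        Match m = Regex.Match(input, @"value=""([^""]+)""", RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Note that this pattern is not tied to input tags: it returns the first `value="..."` it finds anywhere in the string.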


0
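static string GetValue(string str, string name)
{
    // Regex.Escape keeps special characters in name from being read as regex syntax;
    // [^""]+ stops at the closing quote, where a greedy .+ could run past it
    var rx = new Regex(@"<input\s+type=""text""\s+name=""" + Regex.Escape(name) + @"""\s+value=""(?<value>[^""]+)""\s+maxlength=""25"">", RegexOptions.IgnoreCase);
    return rx.Match(str).Groups["value"].Value;
}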

Usage:

    var str = @"<input type=""text"" name=""NAME_ID"" value=""id2654580"" maxlength=""25"">";
    var value = GetValue(str, "NAME_ID");  //id2654580
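The pattern above only matches when the attributes appear in exactly this order with `type="text"` and `maxlength="25"`. If the markup is less uniform, one possible variation is to check for the name with a lookahead and then capture the value wherever it sits in the tag. This is only a sketch: `InputLookup` and `GetValueByName` are hypothetical names, and it still assumes double-quoted attributes with no `>` inside them.

```csharp
using System;
using System.Text.RegularExpressions;

static class InputLookup
{
    // Finds the <input> whose name attribute equals the given name, in any
    // attribute order, and returns its value attribute (null if not found).
    public static string GetValueByName(string html, string name)
    {
        string pattern =
            // Lookahead: the same tag must contain name="<name>" somewhere.
            @"<input\b(?=[^>]*\sname=""" + Regex.Escape(name) + @""")" +
            // Then capture the value attribute without leaving the tag.
            @"[^>]*\svalue=""(?<value>[^""]*)""";
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["value"].Value : null;
    }
}
```

Returning null for "not found" keeps a missing control distinguishable from a control with an empty value.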
