I am parsing the name and value attributes of input tags from an HTML page using a regex. However, my regex is not returning all the required values. Below is a snippet of the HTML page-

<input style="display: none;" name="hiddenAction" value="myval" type="hidden">
<input name="ml_uiss" id="ml_uiss" value="aba972kd82lw" type="hidden">
<input style="display: none;" name="Key" id="Key" value="56n8f48jfn98cwnc38c398nc83nx2b9c32n.an24" type="text">
<input name="AvKbkGPQr" class="iswickEnabled input" maxlength="10" id="AvKbkGPQr" onkeyup="javascript:checkIt(this);" onkeydown="javascript:checkIt(this);" onchange="javascript:checkIt(this);" value="1234567890" onfocus="this.value='';" type="text"> <input name="PjbkAPker" class="iswickEnabled input" maxlength="10" id="PjbkAPker" onkeyup="javascript:checkIt(this);" onkeydown="javascript:checkIt(this);" onchange="javascript:checkIt(this);" type="text"> 
<input id="timeCheck" name="timeCheck" value="23:38:20" type="hidden">
<input name="isDone" id="isDone" value="prq" type="hidden">

Below is the code with regex-

String reg = "<input.*name=['\"](\\w+)['\"].*\\svalue=['\"]([\\w:.\\s]+)['\"].*(<input name=\"(\\w+)\")?";
Pattern p = Pattern.compile(reg);
Matcher m = p.matcher(myString);
while (m.find()) {
    String match1 = m.group(1);
    String match2 = m.group(2);
    String match3 = m.group(3);
    String match4 = m.group(4);
    System.out.println("[" + match1 + "][" + match2 + "][" + match3+ "][" + match4 + "]");
}

The output is below-

[hiddenAction][myval][null][null]
[ml_uiss][aba972kd82lw][null][null]
[Key][56n8f48jfn98cwnc38c398nc83nx2b9c32n.an24][null][null]
[AvKbkGPQr][1234567890][null][null]
[timeCheck][23:38:20][null][null]
[isDone][prq][null][null]

The 4th line of the HTML contains two input tags on the same line, and the regex does not pick up the name of the second one, PjbkAPker (it is missing from the output). Everything else is fine.
I want to get the second input's name as well.

  • stackoverflow.com/q/1732348/1679863 Commented Feb 12, 2013 at 19:21
  • Pay special attention to the top answer in the question linked by Rohit. Commented Feb 12, 2013 at 19:22
  • Also worth reading: Parsing Html The Cthulhu Way Commented Feb 12, 2013 at 19:48
  • Don't use regular expressions to parse [X]HTML. Use a library with a proper parser (such as jsoup) instead. Commented Feb 12, 2013 at 19:54

1 Answer

Parsing X/HTML with regular expressions is a Bad Idea™.

Try using jsoup instead:

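import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Attributes;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// htmlString holds the HTML snippet from the question
Document doc = Jsoup.parseBodyFragment(htmlString);
Elements inputs = doc.select("input");
for (Element el : inputs) {
  Attributes attrs = el.attributes();
  System.out.print("ELEMENT: " + el.tagName());
  // Print every attribute of this input as key=value
  for (Attribute attr : attrs) {
    System.out.print(" " + attr.getKey() + "=" + attr.getValue());
  }
  System.out.println();
}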
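
As for why the original pattern drops PjbkAPker: on the line with two inputs, the greedy `.*` after the value group runs to the end of the line and swallows the second tag, so the optional trailing group matches nothing. The second tag also has no value attribute, so the main part of the pattern could not match it on its own either. If adding jsoup is not an option, the following is a rough stdlib-only sketch, not a replacement for a real parser. It assumes quoted attribute values that never contain `>`, and the class and method names (`InputFields`, `extract`) are made up for illustration. It matches one tag at a time and treats value as optional:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InputFields {
    // One match per tag: [^>]* cannot run past the closing '>' into the next tag.
    private static final Pattern TAG =
            Pattern.compile("<input\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    // The back-reference \1 requires the closing quote to be the same as the opening one.
    private static final Pattern NAME = Pattern.compile("\\sname=([\"'])(.*?)\\1");
    private static final Pattern VALUE = Pattern.compile("\\svalue=([\"'])(.*?)\\1");

    // Returns name -> value in document order; value is null when the attribute is absent.
    static Map<String, String> extract(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher tag = TAG.matcher(html);
        while (tag.find()) {
            String attrs = tag.group(1);
            Matcher name = NAME.matcher(attrs);
            if (!name.find()) {
                continue; // skip inputs without a name attribute
            }
            Matcher value = VALUE.matcher(attrs);
            fields.put(name.group(2), value.find() ? value.group(2) : null);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened version of the two-input line from the question
        String html = "<input name=\"AvKbkGPQr\" maxlength=\"10\" value=\"1234567890\" "
                + "onfocus=\"this.value='';\" type=\"text\"> "
                + "<input name=\"PjbkAPker\" maxlength=\"10\" type=\"text\">";
        extract(html).forEach((k, v) -> System.out.println("[" + k + "][" + v + "]"));
    }
}
```

On the shortened two-input line in `main`, this prints `[AvKbkGPQr][1234567890]` followed by `[PjbkAPker][null]`. It will still break on markup outside its assumptions (unquoted attributes, `>` inside a value, commented-out tags), which is why the jsoup approach remains the safer choice.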