I want to read an HTML file line by line and store its elements. For a textbox, I have to store the id, name, and type attribute values in some collection. In the same way, I need to get the attributes for checkboxes, radio buttons, etc.

Is there any API to parse an HTML file line by line?

4 Answers

2

You can use a DOM parser and read all elements and attributes. Or you could use the jsoup library, which parses HTML with its own parser and exposes the result through a DOM-style API.

2 Comments

I would have recommended jsoup, simple and easy to use with very good documentation. +1
I am reading the input elements using jsoup with `doc.getElementsByTag("input")`, and this way I am able to read the attribute values. The problem is that I should not hardcode the tag names "input", "form", or "textarea".
1

Use the StringBuilder class to read the whole file into a single string:

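    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    StringBuilder contentBuilder = new StringBuilder();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader in = new BufferedReader(new FileReader("mypage.html"))) {
        String str;
        while ((str = in.readLine()) != null) {
            // readLine() strips the line terminator, so add one back
            contentBuilder.append(str).append('\n');
        }
    } catch (IOException e) {
        System.err.println("HTML File Read Error: " + e.getMessage());
    }
    String content = contentBuilder.toString();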

0

No, since that doesn't make sense: HTML has no useful notion of "line". What you need to do is read the HTML element by element.

There are lots of parsers for XML, but HTML is more lenient, so you need a special parser for it. Try JTidy.
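
To illustrate what reading "element by element" can look like, here is a minimal sketch. It does not use JTidy; it assumes the HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator`) is acceptable, since that needs no extra dependency. The class name `FormFieldCollector`, the method `collect`, and the sample markup are made up for this example. Because it keeps any element that carries an id, name, or type attribute, no tag names such as "input" or "textarea" are hardcoded.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, named for illustration only.
public class FormFieldCollector {

    // The three attributes the question asks about.
    private static final HTML.Attribute[] WANTED = {
        HTML.Attribute.ID, HTML.Attribute.NAME, HTML.Attribute.TYPE
    };

    /** Returns one map per element that carries an id, name, or type attribute. */
    public static List<Map<String, String>> collect(Reader html) throws IOException {
        List<Map<String, String>> fields = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                store(tag, attrs); // elements with content, e.g. <textarea>
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                store(tag, attrs); // empty elements, e.g. <input>
            }

            private void store(HTML.Tag tag, MutableAttributeSet attrs) {
                Map<String, String> field = new LinkedHashMap<>();
                for (HTML.Attribute key : WANTED) {
                    Object value = attrs.getAttribute(key);
                    if (value != null) {
                        field.put(key.toString(), value.toString());
                    }
                }
                // Skip elements that have none of the wanted attributes.
                if (!field.isEmpty()) {
                    field.put("tag", tag.toString());
                    fields.add(field);
                }
            }
        };
        // true = ignore any charset declared inside the document
        new ParserDelegator().parse(html, callback, true);
        return fields;
    }

    public static void main(String[] args) throws IOException {
        String sample = "<form><input type=\"text\" id=\"user\" name=\"username\">"
                + "<input type=\"checkbox\" name=\"remember\">"
                + "<textarea id=\"notes\" name=\"notes\"></textarea></form>";
        for (Map<String, String> field : collect(new StringReader(sample))) {
            System.out.println(field);
        }
    }
}
```

To read from a file instead of a string, pass a `FileReader` or `BufferedReader` to `collect`. A dedicated library such as JTidy or jsoup is likely to cope better with modern or badly broken markup; the JDK parser is used here only to keep the sketch self-contained.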

0

NekoHTML is one of the many HTML parsers that you could use.
