Hello, I want to parse an HTML table into an Android ListView, but I don't know where to start. The table contains a lot of information. Could someone help me get started?

Thanks in advance!

The HTML table: http://intranet.staring.nl/toepassingen/rooster/lochem/2W2/2012090320120909/2W01533.htm (just view the page source).

3 Answers

You will first need to parse the HTML table into a data structure, and then use a ListView to display that information. Try using the jsoup library to do the HTML parsing: http://jsoup.org/cookbook/introduction/parsing-a-document

1 Comment

@davidbuzatto According to their download page, it works on Android: jsoup.org/download
1

I don't know whether you have already found your answer, but I did the same thing with the page you linked. I will post my code here, although it is still very messy and does not handle the newest timetable (the 9th hour).

I'm using the HtmlCleaner library to parse the HTML:

try {   
        HtmlCleaner hc = new HtmlCleaner();
        CleanerProperties cp = hc.getProperties();
        cp.setAllowHtmlInsideAttributes(true);
        cp.setAllowMultiWordAttributes(true);
        cp.setRecognizeUnicodeChars(true);
        cp.setOmitComments(true);

        String loc = sp.getString( Constants.pref_locatie      , "" );
        String per = sp.getString( Constants.pref_persoon      , "" );
        String oob = sp.getString( Constants.pref_onderofboven , "" );

        int counteruurmax;
        int[] pauze;
        if (oob.contains("onder")){
            pauze = Constants.pauzeo;
        } else if (oob.contains("boven")) {
            pauze = Constants.pauzeb;
        } else {
            return false;
        }

        String url = "";
        if (loc.contains("lochem")) {
            url += Constants.RoosterLochem;
            url += t.getDatum();
            url += "/";
            url += per;
            counteruurmax = 11;
        } else if (loc.contains("herenlaan")) {
            url += Constants.RoosterHerenlaan;
            url += per;
            counteruurmax = 13;
        } else if (loc.contains("beukenlaan")) {
            url += Constants.RoosterBeukenlaan;
            url += per;
            counteruurmax = 11;
        } else {
            return false;
        }

        String htmlcode = t.getHtml(url);
        TagNode html = hc.clean(htmlcode);
        Document doc = new DomSerializer(cp, true).createDOM(html);
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList nl = (NodeList) xp.evaluate(Constants.XPathRooster, doc, XPathConstants.NODESET);

        int counteruur = 1;
        int counterdag = 1;
        int decreaser  = 0;
        boolean isPauze = false;
        RoosterItems RItems = new RoosterItems();
        RoosterItem  RItem  = null;
        for (int i = 0; i < nl.getLength(); i++){

            // Break slots produce no item; decreaser keeps the lesson-hour numbers contiguous.
            if ((counteruur == pauze[0]) || (counteruur == pauze[1]) || (counteruur == pauze[2])) {
                isPauze = true;
                decreaser++;
            }

            if (!isPauze) {
                RItem = new RoosterItem();
                switch (counterdag){
                case 1:
                    RItem.setDag("ma");
                    break;
                case 2:
                    RItem.setDag("di");
                    break;
                case 3:
                    RItem.setDag("wo");
                    break;
                case 4:
                    RItem.setDag("do");
                    break;
                case 5:
                    RItem.setDag("vr");
                    break;
                }

                Node n = nl.item(i);
                String content = n.getTextContent();
                if (content.length() > 1) {
                    RItem.setUur(""+(counteruur-decreaser));
                    NodeList t1 = n.getChildNodes();
                    NodeList t2 = t1.item(0).getChildNodes();
                    NodeList t3 = t2.item(0).getChildNodes();
                    for (int j = 0; j < t3.getLength(); j++) {
                        Node temp = t3.item(j);
                        if (t3.getLength() == 3) {
                            switch (j) {
                            case 0:
                                RItem.setLes(""+temp.getTextContent());
                                break;
                            case 1:
                                RItem.setLokaal(""+temp.getTextContent());
                                break;
                            case 2:
                                RItem.setDocent(""+temp.getTextContent());
                                break;
                            default:
                                return false;
                            }
                        } else if (t3.getLength() == 4) {
                            switch (j) {
                            case 0:
                                break;
                            case 1:
                                RItem.setLes("tts. " + temp.getTextContent());
                                break;
                            case 2:
                                RItem.setLokaal(""+temp.getTextContent());
                                break;
                            case 3:
                                RItem.setDocent(""+temp.getTextContent());
                                break;
                            default:
                                return false;
                            }
                        } else if (t3.getLength() == 1) {
                            RItem.setLes(""+temp.getTextContent());
                        } else {
                            return false;
                        }
                    }
                } else {
                    RItem.setUur("" + (counteruur-decreaser));
                    RItem.setLokaal("Vrij");
                }
                RItems.add(RItem);
            }
            if (counteruur == counteruurmax) {
                counteruur = 0;
                counterdag++;
                decreaser = 0;
            }
            counteruur++;
            isPauze = false;
        }

        if (RItems.size() > 0) {
            mSQL = new RoosterSQLAdapter(mContext);
            mSQL.openToWrite();
            mSQL.deleteAll();
            for (int j = 0; j < RItems.size(); j++) {
                RoosterItem insert = RItems.get(j);
                mSQL.insert(insert.getDag(), insert.getUur(), insert.getLes(), insert.getLokaal(), insert.getDocent());
            }
            mSQL.close();
        }
        return true;
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
        return false;
    } catch (XPathExpressionException e) {
        e.printStackTrace();
        return false;
    }

There are a few constants, but I think you can guess them yourself ;) and otherwise you know how to ask me for them :)

The RoosterItem class holds all the variables for one hour, and RoosterItems holds a collection of RoosterItem objects.
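Since those two classes are not shown, here is a minimal sketch of what they might look like, inferred only from the getters, setters, and list methods the code above calls. The field defaults, the null handling, and the `toLabel()` helper are assumptions added for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;

/** One timetable slot. Accessor names mirror the calls made in the parsing code. */
class RoosterItem {
    // Defaults are an assumption: free hours never set les or docent.
    private String dag = "";
    private String uur = "";
    private String les = "";
    private String lokaal = "";
    private String docent = "";

    private static String clean(String value) {
        return value == null ? "" : value;
    }

    void setDag(String dag) { this.dag = clean(dag); }
    void setUur(String uur) { this.uur = clean(uur); }
    void setLes(String les) { this.les = clean(les); }
    void setLokaal(String lokaal) { this.lokaal = clean(lokaal); }
    void setDocent(String docent) { this.docent = clean(docent); }

    String getDag() { return dag; }
    String getUur() { return uur; }
    String getLes() { return les; }
    String getLokaal() { return lokaal; }
    String getDocent() { return docent; }

    /** Hypothetical helper: one line of text for a ListView row. */
    String toLabel() {
        StringBuilder sb = new StringBuilder(dag).append(' ').append(uur).append(':');
        for (String part : new String[] {les, lokaal, docent}) {
            if (!part.isEmpty()) {
                sb.append(' ').append(part);
            }
        }
        return sb.toString();
    }
}

/** Thin wrapper around a list, exposing the add/size/get calls used above. */
class RoosterItems {
    private final List<RoosterItem> items = new ArrayList<>();

    void add(RoosterItem item) { items.add(item); }
    int size() { return items.size(); }
    RoosterItem get(int index) { return items.get(index); }

    /** Hypothetical helper: labels that could back a simple string adapter. */
    List<String> toLabels() {
        List<String> labels = new ArrayList<>();
        for (RoosterItem item : items) {
            labels.add(item.toLabel());
        }
        return labels;
    }
}
```

On Android, a list of strings such as the one returned by `toLabels()` is the kind of data an `ArrayAdapter<String>` can hand to a ListView, which ties the parsed table back to the original question.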

Good luck!

2 Comments

Sorry for not including the XPath; here it is: "/html/body/table[1]/tbody/tr/td". Note that this will only work with APIs that support XPath.
Thanks, I already found the answer. But because of the nice code example above I will mark it as the best answer. (Yes, so you got the points ;) )
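To illustrate what the XPath expression quoted in the comments selects, here is a small sketch that uses only the standard `javax.xml` classes. It assumes the input is already well-formed XHTML, which a real page often is not (that is why the answer runs HtmlCleaner first). The class name, the `cellTexts` helper, and the sample markup are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TableCellXPathDemo {
    // The expression given in the comment: every cell of the first table, in document order.
    static final String XPATH_ROOSTER = "/html/body/table[1]/tbody/tr/td";

    /** Returns the trimmed text of each matched cell. Assumes well-formed input. */
    static List<String> cellTexts(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList cells = (NodeList) xp.evaluate(XPATH_ROOSTER, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < cells.getLength(); i++) {
                texts.add(cells.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            // Parsing or XPath failure; wrapped so callers need no checked exceptions.
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample; it includes tbody because the expression expects one.
        String xhtml = "<html><body><table><tbody>"
                + "<tr><td>wi</td><td>012</td></tr>"
                + "<tr><td>ne</td><td></td></tr>"
                + "</tbody></table></body></html>";
        System.out.println(cellTexts(xhtml));
    }
}
```

Because the cells come back as one flat list in document order, the parsing code in the answer has to track the day and hour itself with counters.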
1

So far, I think jsoup is one of the best ways to extract or manipulate HTML.

See this link:

http://jsoup.org/

But somehow this didn't work in my case, so I converted the entire HTML into a String and then parsed it.
