I am stuck on how to parse this data into key-value pairs. Please guide me.

<div class="content">
    <div class="label">Company Name: </div>
    Cartell Chemical Co., Ltd.
    <br/>
    <div class="label">Business Owner: </div>
    Michael Chen
    <br/>
    <div class="label">Employees: </div>
    210
    <br/>
    <div class="label">Main markets: </div>
    North America, Europe, China, South Asia
    <br/>
    <div class="label">Business Type: </div>
    Manufacturer
    <br/>
</div>

I need the output in this format. Please guide me using Java with the Jsoup library.

Company Name:Cartell Chemical Co., Ltd.
Business Owner:Michael Chen
Employees:210
Main markets:North America, Europe, China, South Asia
Business Type:Manufacturer

2 Answers

Have a look at the documentation.

Here's a working example:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class StackOverflow20973268 {
    // The HTML from the question, inlined as a string
    private static String input = "<div class=\"content\">" +
            "<div class=\"label\">Company Name: </div>" +
            "Cartell Chemical Co., Ltd." +
            "<br/>" +
            "<div class=\"label\">Business Owner: </div>" +
            "Michael Chen" +
            "<br/>" +
            "<div class=\"label\">Employees: </div>" +
            "210" +
            "<br/>" +
            "<div class=\"label\">Main markets: </div>" +
            "North America, Europe, China, South Asia" +
            "<br/>" +
            "<div class=\"label\">Business Type: </div>" +
            "Manufacturer" +
            "<br/>" +
            "</div>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(input);
        // Each label div is immediately followed by the text node holding its value
        Elements labels = doc.select("div.content div.label");
        for (Element label : labels) {
            // Note: label.text() already ends with ':', so "%s:%s" prints a
            // double colon; use "%s%s" if you want a single one
            System.out.println(String.format("%s:%s", label.text().trim(),
                    label.nextSibling().outerHtml()));
        }
    }
}

Output:

Company Name::Cartell Chemical Co., Ltd.
Business Owner::Michael Chen
Employees::210
Main markets::North America, Europe, China, South Asia
Business Type::Manufacturer

3 Comments

@Adam: Please see this link: gmdu.net/corp-902113.html. When I extract about us, category region, tools, it shows up like this: Website::<span class="url">www.hayleys.com</span> Address::<span class="adr street-address">No.25,Foster Lane,Colombo 10,</span> Telephone::<span class="tel">0094779123026</span>... I need the text alone. Please help.
@Adam: I am extracting from this link. When I try to extract the address, telephone and website with your code, it shows up like this. Please help: Website::<span class="url">www.hayleys.com</span> Address::<span class="adr street-address">No.25,Foster Lane,Colombo 10,</span> Telephone::<span class="tel">0094779123026</span>
Maybe it would be better to post this as a separate question?
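For the case raised in these comments, where the value after a label is wrapped in a tag such as a span instead of being bare text, one option is to check what kind of node the sibling is and take its text rather than its outer HTML. A minimal sketch, assuming markup shaped like the snippet quoted above; the HTML string, the class name SiblingText and the helper plainText are made up for illustration and are not part of Jsoup:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class SiblingText {
    // Hypothetical helper: returns the text of a node without any markup.
    // Elements give their combined text, text nodes give their own text,
    // anything else (including null) gives an empty string.
    static String plainText(Node node) {
        if (node instanceof Element) {
            return ((Element) node).text().trim();
        }
        if (node instanceof TextNode) {
            return ((TextNode) node).text().trim();
        }
        return "";
    }

    public static void main(String[] args) {
        // Assumed markup: one value wrapped in a span, one as bare text
        String html = "<div class=\"content\">"
                + "<div class=\"label\">Website: </div>"
                + "<span class=\"url\">www.hayleys.com</span><br/>"
                + "<div class=\"label\">Employees: </div>210<br/>"
                + "</div>";
        Document doc = Jsoup.parse(html);
        for (Element label : doc.select("div.content div.label")) {
            // label.text() already ends with ':', so no extra separator is added
            System.out.println(label.text() + plainText(label.nextSibling()));
        }
    }
}
```

Whether this fits the real page depends on its actual structure, which is not shown here; if the value is not the immediate sibling of the label, the traversal would need adjusting.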

The Jsoup library is very good for parsing HTML. It lets you extract values by class or id name, or by traversing the DOM tree. You basically get a div element and look at its children, which can be text nodes (containing the text to be parsed) or other elements that have children of their own. For example, you could do something like this (an untested sketch):

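    // needs org.jsoup.Jsoup, org.jsoup.nodes.*, org.jsoup.select.Elements, java.util.List
    Document doc = Jsoup.parse(info); // info: the HTML string to parse
    Elements divs = doc.body().getElementsByTag("div");
    for (Element divElement : divs) {
        // text nodes that are direct children of this div
        List<TextNode> texts = divElement.textNodes();
        // the node right after this div: a TextNode, another Element, or null;
        // use nextElementSibling() instead to skip over text nodes
        Node next = divElement.nextSibling();
    }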

Basically, you find elements and then step either into them for their text or to the next/previous sibling.
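The traversal described above could be turned into key-value pairs roughly as follows. This is only a sketch under the assumption that the HTML looks like the one in the question (labels in div.label, values as the nodes that follow them inside div.content); the class name LabelMap and the method parse are made up for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class LabelMap {
    // Walks the children of div.content in document order. Each div.label
    // starts a new key; text found before the next label becomes its value.
    static Map<String, String> parse(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Element content = Jsoup.parse(html).select("div.content").first();
        if (content == null) {
            return result; // no matching container in the input
        }
        String key = null;
        for (Node child : content.childNodes()) {
            if (child instanceof Element && ((Element) child).hasClass("label")) {
                // Drop the trailing colon so the key is just the label name
                key = ((Element) child).text().replaceAll(":\\s*$", "").trim();
                result.put(key, "");
            } else if (key != null) {
                String text = "";
                if (child instanceof TextNode) {
                    text = ((TextNode) child).text();
                } else if (child instanceof Element) {
                    text = ((Element) child).text(); // e.g. a span; empty for <br>
                }
                result.put(key, (result.get(key) + " " + text).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<div class=\"content\">"
                + "<div class=\"label\">Company Name: </div>Cartell Chemical Co., Ltd.<br/>"
                + "<div class=\"label\">Employees: </div>210<br/>"
                + "</div>";
        // Print each pair as key:value
        parse(html).forEach((k, v) -> System.out.println(k + ":" + v));
    }
}
```

A LinkedHashMap is used so the pairs come out in the same order as they appear in the page; a plain HashMap would work too if order does not matter.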
