I am trying to use RichText to display HTML content, so I want to parse a URL and get everything inside <div class="margin-box"></div> as a String value. However, I cannot get the URL to parse. My code is below:

Use Jsoup to parse the URL

Document document = Jsoup.parse(news_url);
String news_content = CommonUtil.newsContent(document);

Data Capture

    public static String newsContent(Document document){
        Elements elements = document.select("div.margin-box");
        String newsContent = elements.toString();
        return newsContent;
    }

Then I get this debug result (screenshot not included).

It shows that parsing the URL was unsuccessful. What I actually want is a value like this:

<div>
<p>
<img src="http://p1.pstatp.com/large/1c67000332373537f0ff" img_width="640" img_height="360" inline="0" alt="************" onerror="javascript:errorimg.call(this);">
</p>
<p class="pgc-img-caption">***********</p><p>*************************************</p>
<p><img src="http://p3.pstatp.com/large/1c6e0000841ab42ca326" img_width="640" img_height="425" inline="0" alt="**********" onerror="javascript:errorimg.call(this);"></p>
<p class="pgc-img-caption">********************************</p>
<p><img src="http://p1.pstatp.com/large/1c6d00008eebccce3e2f" img_width="550" img_height="375" inline="0" alt="************" onerror="javascript:errorimg.call(this);"></p>
<p class="pgc-img-caption">*********</p><p>**************************</p><p>*********************</p><p>*****************</p></div>

What did I do wrong?

Full HTML block (screenshot not included).

There are no elements inside the div (screenshot not included).

  • Can you post the web URL or the complete HTML block? Commented Jul 8, 2017 at 7:03
  • OK, I will edit the question and add the full HTML block. Commented Jul 8, 2017 at 7:05
  • @ProkashSarkar I have added the full HTML block. Commented Jul 8, 2017 at 7:11
  • Your debug point is set on "document", which holds the full HTML block rather than the div. You already have the data; just add a Logcat statement and print the value of "newsContent". Commented Jul 8, 2017 at 8:48
  • @ProkashSarkar But there are no elements inside the div; look at my new edit. Commented Jul 8, 2017 at 9:40

1 Answer

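It is useful to check first whether jsoup can parse the content: http://try.jsoup.org/~8W0oCmiiYnFL01nUM6HDbQ9wwTA

You are using Jsoup.parse(String), which expects HTML markup stored in a string, not a URL. It never fetches anything, so your URL is parsed as if it were the page text. If you want to use parse to retrieve the HTML source, you have to pass a URL object and a timeout in milliseconds: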

String url = "http://servertrj.com/news/index/208";
Document doc = Jsoup.parse(new URL(url), 3000);
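
To see why the original call finds nothing, here is a small offline sketch. It assumes jsoup is on the classpath; the class name ParseStringDemo and its helper methods are made up for illustration. Because Jsoup.parse(String) treats its argument as markup, the URL ends up as plain body text and the selector has nothing to match.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseStringDemo {

    // Parses the given string as HTML markup (no network request is made)
    // and counts the div.margin-box elements in the resulting document.
    static int countMarginBoxes(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("div.margin-box").size();
    }

    // Returns the body text jsoup produces for the given string.
    static String bodyText(String html) {
        return Jsoup.parse(html).body().text();
    }

    public static void main(String[] args) {
        String url = "http://servertrj.com/news/index/208";

        // The URL string is treated as page content, so there is no div to find.
        System.out.println(countMarginBoxes(url));
        System.out.println(bodyText(url));

        // Real markup in the string is matched as expected.
        System.out.println(countMarginBoxes("<div class=\"margin-box\"><p>Hello</p></div>"));
    }
}
```

This is probably what the debugger was showing in the question: a document that contains only the URL text, with an empty selection for div.margin-box.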

More commonly, you will see connect(...).get() used to fetch the HTML source. Compare your code to this simple example:

String url = "http://servertrj.com/news/index/208";
String userAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36";
Document doc = Jsoup.connect(url).userAgent(userAgent).get();
Elements elements = doc.select(".margin-box");
System.out.println(elements.size() + "\n" + elements.toString());

Output:

1
<div class="margin-box"> 
<p style="margin: 0px 0px 15px; padding: 0px; border: 0px; line-height: 30px; font-family: &quot;Microsoft YaHei;, SimSun, Verdana, Arial; color: rgb(0, 0, 0); font-size: 15px;">[... truncated because of spam detection, but same as try.jsoup]</p> 
</div>
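
Since the question asks for the content inside the div rather than the div itself, a variant of newsContent could return the inner HTML. The following is only a sketch, assuming jsoup is on the classpath; the class name InnerHtmlDemo and the stand-in markup are hypothetical, and the empty-string fallback is one possible choice for the case where nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class InnerHtmlDemo {

    // Returns the inner HTML of the first div.margin-box,
    // or an empty string if the document has no such element.
    static String newsContent(Document document) {
        Element box = document.select("div.margin-box").first();
        return box == null ? "" : box.html();
    }

    public static void main(String[] args) {
        // Stand-in markup for illustration; the real page would be
        // fetched with Jsoup.connect(url).get() as shown above.
        String html = "<html><body><div class=\"margin-box\"><p>Hello</p></div></body></html>";
        Document doc = Jsoup.parse(html);

        // Prints the paragraph markup without the surrounding div.
        System.out.println(newsContent(doc));
    }
}
```

Element.html() returns only the children's markup, whereas Elements.toString() in the original newsContent also includes the wrapping <div class="margin-box"> tag.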
1 Comment

Sorry for the delayed reply. Your suggestion is exactly what I wanted. It works, thank you very much!
