
I roughly know how to parse HTML tables with jsoup, but the table I'm working with is nested somewhere inside the page and I don't know how to select it: https://finance.yahoo.com/calendar/earnings?symbol=nflx

It's the one with the earnings dates.

I know that you start with

Document doc = Jsoup.connect("https://finance.yahoo.com/calendar/earnings?symbol=nflx").get();

and then loop:

for (Element table : doc.select("some string")) {

How do I find the selector string for the table?

1 Answer

You don't need to loop over the result of doc.select("some string"); you can fetch the table directly.

To get the table, first inspect the page with your browser's Developer Tools (assuming your browser has them).

Then identify the element you want. In your case the table is:

<table class="data-table W(100%) Bdcl(c) Pos(r) BdB Bdc($c-fuji-grey-c)" data-reactid="4">

The code to get to it is:

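import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("https://finance.yahoo.com/calendar/earnings?symbol=nflx")
                    .timeout(600000) // 10 minutes, in milliseconds; generous because my connection is slow
                    .get();
// Matches only elements whose class attribute equals this exact string
Elements tableDiv = doc.getElementsByAttributeValue("class", "data-table W(100%) Bdcl(c) Pos(r) BdB Bdc($c-fuji-grey-c)");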
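Matching the whole class attribute stops working as soon as any one of those utility classes changes. As a possible alternative, here is a minimal sketch that selects on the single data-table class with a CSS selector instead. The class name TableSelectorSketch, the helper findDataTable, and the trimmed-down HTML string are all made up for illustration; the real page markup may differ.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TableSelectorSketch {
    // Hypothetical helper: returns the first <table> that carries the
    // "data-table" class, or null when nothing matches.
    static Element findDataTable(Document doc) {
        return doc.select("table.data-table").first();
    }

    public static void main(String[] args) {
        // Assumed stand-in markup, parsed from a string so no network call is needed.
        String html = "<div><table class=\"other\"></table>"
                + "<table class=\"data-table W(100%) Pos(r)\">"
                + "<tbody><tr><td>x</td></tr></tbody></table></div>";
        Document doc = Jsoup.parse(html);

        Element table = findDataTable(doc);
        System.out.println(table != null ? table.className() : "no match");
    }
}
```

Against the live page, the same helper would take the Document returned by Jsoup.connect(...).get().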

That gives you an org.jsoup.select.Elements collection, which you can parse the same way as before, drilling into the table with the getElementsBy... methods (getElementsByTag, getElementsByClass, and so on).

Here is an example of how to print only that table:

// For each matched table, print every <tbody> it contains
tableDiv.forEach(table -> table.getElementsByTag("tbody")
                               .forEach(tbody -> System.out.println(tbody)));

Use your IDE's autocompletion to see which methods are available. I think this is enough for you to figure out where to go from here.
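To get the cell values rather than the raw HTML, one way is to walk from the table body to its rows and from each row to its cells, then call text() on each cell. The following is only a sketch: the class TableTextSketch, the helper cellTexts, and the sample HTML with its placeholder values and column order are assumptions for illustration, not the real page content.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class TableTextSketch {
    // Hypothetical helper: collects the text of every <td> in every
    // body row of the given table, one inner list per row.
    static List<List<String>> cellTexts(Element table) {
        List<List<String>> rows = new ArrayList<>();
        for (Element tr : table.select("tbody tr")) {
            List<String> cells = new ArrayList<>();
            for (Element td : tr.getElementsByTag("td")) {
                cells.add(td.text()); // text() drops the tags and keeps the visible text
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumed stand-in markup with placeholder values, not real data.
        String html = "<table class=\"data-table\"><tbody>"
                + "<tr><td><label><input type=\"checkbox\"></label></td>"
                + "<td>SYMBOL</td><td>DATE-PLACEHOLDER-1</td></tr>"
                + "<tr><td><label><input type=\"checkbox\"></label></td>"
                + "<td>SYMBOL</td><td>DATE-PLACEHOLDER-2</td></tr>"
                + "</tbody></table>";
        Document doc = Jsoup.parse(html);
        Element table = doc.select("table.data-table").first();

        for (List<String> row : cellTexts(table)) {
            System.out.println(row);
        }
    }
}
```

If only the first date is needed, take the first row of the result and pick the cell at whatever index holds the date on the real page; that index has to be checked in the Developer Tools.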


2 Comments

Thanks! However, I am getting html code as my output: <tbody data-reactid="31"> <tr class="data-rowNFLX Bgc($extraLightBlue):h BdT Bdc($tableBorderGray) Bdc($tableBorderBlue):h H(33px)" data-reactid="32"> <td class="Pstart(6px)" data-reactid="33"><label class="Ta(c) Pos(r)" data-reactid="34"><input type="checkbox" class="Pos(a) V(h)" data-reactid="35"> and so on and so on... What I want are the actual values in the table (actually I really just need the first date)
Yes, of course. You have to parse it further. I didn't give you the code for that; instead I showed the approach so you can work it out yourself. From the tbody you get the TRs, and from the TRs you get the TDs and their values. I didn't give the exact code because this is a site for learning and teaching, not a free code-writing service. It is all in the answer: keep applying the getElementsBy... methods and you will reach the values.
