0

Is it possible with Beautiful Soup to scrape the "BOND INDEXES" table on the following page, rather than the easier task of scraping the default "CDS INDEXES" table?

https://web.apps.markit.com/

Or is this a task where Selenium is necessary?

I don't know how to research this problem, as I don't know what the option for switching between tables is called.

3
  • There is no JavaScript, no tables, and not even any text on this web page. Are you sure you have the right URL? Commented Aug 3, 2017 at 18:28
  • I'm sure about the link. Can you see the different tabs above the table? I am not sure that JavaScript is used to switch between the two tables, though. Commented Aug 3, 2017 at 22:13
  • wsj.com/mdc/public/npage/2_3023_creditdervs.html Commented Aug 4, 2017 at 8:49

2 Answers

1

You need a cookie from the iframe loaded on the main page. You can get it by creating a requests session and requesting the main page, then the iframe app. That way you have the cookies and the URL needed to access the final leg of the road.

The following fetches the header and each row of the table:

import requests
from bs4 import BeautifulSoup
import json

with requests.Session() as sess:
    # Load the main page, then the iframe it embeds, so the session
    # collects the cookies needed for the final request.
    response = sess.get('http://www.wsj.com/mdc/public/npage/2_3023_creditdervs.html')
    sess.get(BeautifulSoup(response.text, 'lxml').find('iframe').attrs['src'])
    # Request the bond table; the JSON reply carries it as an HTML fragment.
    response = sess.post(
        'https://web.apps.markit.com/AppsApi/GetIndexData',
        data={'indexOrBond': 'bond', 'ClientCode': 'WSJ'}
    )
    html = json.loads(response.text)['html']
    table = BeautifulSoup(html, 'lxml').find('table', {'id': 'BondIndexTable'})
    # The last row of the table head holds the column names.
    header = [cell.text for cell in table.find('thead').find_all('tr')[-1].find_all('th')]
    # Keep only rows with more than two <td> cells, i.e. the data rows.
    data = []
    for row in table.find_all('tr'):
        row = [cell.text for cell in row.find_all('td')]
        if len(row) > 2:
            data.append(row)

    # Do something with the data:
    print(header)
    for row in data:
        print(row)

This produces:

['Bond Indexes', 'Daily', 'Monthly', 'YTD', '1Y', '3Y']
['Markit iBoxx USD Overall', '0.18%', '0.41%', '3.45%', '-0.60%', '8.98%']
['Markit iBoxx USD Treasuries', '0.20%', '0.46%', '2.60%', '-2.26%', '7.56%']
['Markit iBoxx USD Liquid Investment Grade Index', '0.17%', '0.47%', '5.62%', '1.54%', '14.07%']
['Markit iBoxx USD Liquid High Yield Index', '-0.07%', '0.00%', '5.55%', '9.85%', '14.23%']
['Markit iBoxx EUR Overall', '0.13%', '0.50%', '0.15%', '-2.25%', '8.67%']
['Markit iBoxx EUR Corporates', '0.09%', '0.40%', '1.79%', '0.67%', '8.79%']
['Markit iBoxx EUR Sovereigns', '0.15%', '0.60%', '-0.24%', '-3.29%', '9.68%']
['Markit iBoxx GBP Overall', '0.68%', '0.66%', '2.00%', '-0.80%', '23.18%']
['Markit iBoxx GBP Corporates', '0.55%', '0.57%', '4.13%', '2.95%', '24.88%']
['Markit iBoxx GBP Gilts', '0.74%', '0.72%', '1.36%', '-2.03%', '23.39%']
['Markit iBoxx Asia', '0.00%', '-0.02%', '2.23%', '-1.00%', '8.27%']
['Markit iBoxx Global Inflation-Linked Index All USD', '0.58%', '0.71%', '-0.45%', '-0.63%', '9.81%']
['Markit iBoxx GEMX USD', '0.06%', '0.10%', '2.70%', '1.44%', '6.92%']
['Markit iBoxx USD Corporates', '0.16%', '0.39%', '4.96%', '1.92%', '11.88%']

This can be used with pandas or some other tool for data manipulation:

import pandas as pd
df = pd.DataFrame(data, columns=header)
for col in df.columns:
    if col != 'Bond Indexes':
        df[col] = pd.to_numeric(df[col].replace(regex=True, to_replace='%', value=''))/100
print(df)
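
If pandas is not available, the same percentage conversion can be done in plain Python. The following is only a minimal sketch: the helper name `percent_to_float` is made up for illustration, and `header` and `data` are shortened copies of the output shown above rather than a live response.

```python
def percent_to_float(text):
    """Convert a string such as '-0.60%' to the fraction -0.006."""
    return float(text.strip().rstrip('%')) / 100


# Shortened copies of the header and rows printed above.
header = ['Bond Indexes', 'Daily', 'Monthly', 'YTD', '1Y', '3Y']
data = [
    ['Markit iBoxx USD Overall', '0.18%', '0.41%', '3.45%', '-0.60%', '8.98%'],
    ['Markit iBoxx USD Treasuries', '0.20%', '0.46%', '2.60%', '-2.26%', '7.56%'],
]

records = []
for row in data:
    # The first column is the index name; the rest are percentages.
    record = {header[0]: row[0]}
    for name, cell in zip(header[1:], row[1:]):
        record[name] = percent_to_float(cell)
    records.append(record)

for record in records:
    print(record)
```

Each record is a dict keyed by the column names, so a value can be read as `records[0]['YTD']`.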


1

You need to get a unique URL from the first page, then access that URL to get cookies. After that you can make a POST request to get the HTML you want inside a JSON object, like this:

import requests
from bs4 import BeautifulSoup

s = requests.Session()
t = s.get('http://www.wsj.com/mdc/public/npage/2_3023_creditdervs.html')
soup1 = BeautifulSoup(t.text, "html.parser")
# Get the unique URL.
url = soup1.find('iframe').get("src")
# Make a request to set cookies.
s.get(url)
# I'm not sure all these headers are needed but some are.
headers = {'X-Requested-With': 'XMLHttpRequest',
           'Accept': 'application/json, text/javascript, */*; q=0.01',
           'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0',
           'Content-Type': 'application/x-www-form-urlencoded',
           'referer': url}
data = {"ClientCode": "WSJ", "indexOrBond": "bond"}
# Get the data as json.
r = s.post('https://web.apps.markit.com/AppsApi/GetIndexData', data=data, headers=headers)
jn = r.json()
# Get the HTML from the json.
print(jn['html'])

From there you could load the HTML into BeautifulSoup if you wanted to parse it further.

Outputs:

<Div><div class="tableDesc"><div class="firstComponentNote">Green indicates rising index levels; red indicates declining index levels.</div></div><table id="BondIndexTable" class="dataTable"><thead><tr><th class="col1"></th><th class="col23456 topCell centerAlign" colspan="5" style="text-align: center;">Total Return</th></tr><tr><th class="col1 leftCell">Bond Indexes</th><th class="col2">Daily</th><th class="col3">Monthly</th><th class="col4">YTD</th><th class="col5">1Y</th><th class="col6">3Y</th></tr></thead><tbody><tr><td class="col1 leftCell"><div class="qname">Markit iBoxx USD Overall</div></td><td class="col2 yellowBack"><span class="pos">0.18%</span></td><td class="col3 yellowBack"><span class="pos">0.41%</span></td><td class="col4 yellowBack"><span class="pos">3.45%</span></td><td class="col5 yellowBack"><span class="neg">-0.60%</span></td><td class="col6 yellowBack"><span class="pos">8.98%</span></td></tr><tr><td class="col1 leftCell"><div class="qname">Markit iBoxx USD Treasuries</div></td><td class="col2 yellowBack"><span class="pos">0.20%</span></td><td class="col3 yellowBack"><span class="pos">0.46%</span></td><td class="col4 yellowBack"><span class="pos">2.60%</span></td><td class="col5 yellowBack"><span class="neg">-2.26%</span></td><td class="col6 yellowBack"><span class="pos">7.56%</span></td></tr><tr><td class="col1 leftCell"><div class="qname">Markit iBoxx USD Liquid Investment Grade Index</div></td><td class="col2 yellowBack"><span class="pos">0.17%</span></td><td class="col3 yellowBack"><span class="pos">0.47%</span></td><td class="col4 yellowBack"><span class="pos">5.62%</span></td><td class="col5 yellowBack"><span class="pos">1.54%</span></td><td class="col6 yellowBack"><span class="pos">14.07%</span></td></tr><tr><td class="col1 leftCell"><div class="qname">Markit iBoxx USD Liquid High Yield Index</div></td><td class="col2 yellowBack"><span class="neg">-0.07%</span></td><td class="col3 yellowBack"><span class="pos">0.00%</span></td><td 
class="col4 yellowBack"><span class="pos">5.55%</span></td><td class="col5 yellowBack"><span class="pos">9.85%</span></td><td class="col6 yellowBack"><span class="pos">14.23%</span></td></tr><tr><td class="col1 leftCell"><div class="qname">Markit iBoxx EUR Overall</div></td><td class="col2 yellowBack"><span class="pos">0.13%</span></td><td class="col3 yellowBack"><span class="pos">0.50%</span></td><td class="col4 yellowBack"><span class="pos">0.15%</span></td><td class="col5 yellowBack"><span class="neg">-2.25%</span></td><td class="col6 yellowBack"><span class="pos">8.67%</span></td></tr><tr><td class="col1 leftCell"><div class="qname">Markit iBoxx EUR Corporates</div></td><td class="col2 yellowBack"><span class="pos">0.09%</span></td><td class="col3 yellowBack"><span class="pos">0.40%</span></td><td class="col4 yellowBack"><span class="pos">1.79%</span></td><td class="col5 yellowBack"><span class="pos">0.67%</span></td><td class="col6 yellowBack"><span class="pos">8.79%</span></td></tr><tr><td class="col1 leftCell"><div class="qname">Markit iBoxx EUR Sovereigns</div></td><td class="col2 yellowBack"><span class="pos">0.15%</span></td><td class="col3 yellowBack"><span class="pos">0.60%</span></td><td class="col4 yellowBack"><span class="neg">-0.24%</span></td><td class="col5 yellowBack"><span class="neg">-3.29%</span></td><td class="col6 yellowBack"><span class="pos">9.68%</span></td></tr><tr><td class="col1 leftCell"><div class="qname">Markit iBoxx GBP Overall</div></td><td class="col2 yellowBack"><span class="pos">0.68%</span></td><td class="col3 yellowBack"><span class="pos">0.66%</span></td><td class="col4 yellowBack"><span class="pos">2.00%</span></td><td class="col5 yellowBack"><span class="neg">-0.80%</span></td><td class="col6 yellowBack"><span class="pos">23.18%</span></td></tr><tr><td class="col1 leftCell"><div class="qname">Markit iBoxx GBP Corporates</div></td><td class="col2 yellowBack"><span class="pos">0.55%</span></td><td class="col3 
yellowBack"><span class="pos">0.57%</span></td><td class="col4 yellowBack"><span class="pos">4.13%</span></td><td class="col5 yellowBack"><span class="pos">2.95%</span></td><td class="col6 yellowBack"><span class="pos">24.88%</span></td></tr><tr><td class="col1 leftCell"><div class="qname">Markit iBoxx GBP Gilts</div></td><td class="col2 yellowBack"><span class="pos">0.74%</span></td><td class="col3 yellowBack"><span class="pos">0.72%</span></td><td class="col4 yellowBack"><span class="pos">1.36%</span></td><td class="col5 yellowBack"><span class="neg">-2.03%</span></td><td class="col6 yellowBack"><span class="pos">23.39%</span></td></tr><tr><td class="col1 leftCell"><div class="qname">Markit iBoxx Asia</div></td><td class="col2 yellowBack"><span class="neg">0.00%</span></td><td class="col3 yellowBack"><span class="neg">-0.02%</span></td><td class="col4 yellowBack"><span class="pos">2.23%</span></td><td class="col5 yellowBack"><span class="neg">-1.00%</span></td><td class="col6 yellowBack"><span class="pos">8.27%</span></td></tr><tr><td class="col1 leftCell"><div class="qname">Markit iBoxx Global Inflation-Linked Index All USD</div></td><td class="col2 yellowBack"><span class="pos">0.58%</span></td><td class="col3 yellowBack"><span class="pos">0.71%</span></td><td class="col4 yellowBack"><span class="neg">-0.45%</span></td><td class="col5 yellowBack"><span class="neg">-0.63%</span></td><td class="col6 yellowBack"><span class="pos">9.81%</span></td></tr><tr><td class="col1 leftCell"><div class="qname">Markit iBoxx GEMX USD</div></td><td class="col2 yellowBack"><span class="pos">0.06%</span></td><td class="col3 yellowBack"><span class="pos">0.10%</span></td><td class="col4 yellowBack"><span class="pos">2.70%</span></td><td class="col5 yellowBack"><span class="pos">1.44%</span></td><td class="col6 yellowBack"><span class="pos">6.92%</span></td></tr><tr><td class="col1 leftCell"><div class="qname">Markit iBoxx USD Corporates</div></td><td class="col2 yellowBack"><span 
class="pos">0.16%</span></td><td class="col3 yellowBack"><span class="pos">0.39%</span></td><td class="col4 yellowBack"><span class="pos">4.96%</span></td><td class="col5 yellowBack"><span class="pos">1.92%</span></td><td class="col6 yellowBack"><span class="pos">11.88%</span></td></tr></tbody><tfoot><tr><td colspan="6">Markit iBoxx indexes track the performance of the global sovereign- and corporate-bond markets. These benchmark indexes are calculated using prices contributed by multiple financial institutions. The indexes are owned, calculated and administered by Markit. For more information visit <a href="http://indices.markit.com" target="_blank">indices.markit.com</a></td></tr></tfoot></table></Div>
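
To parse the returned HTML further with BeautifulSoup, something like the following might work. It is a sketch under assumptions: `sample_html` is a shortened stand-in for `jn['html']` with the same structure as the output above, `parse_index_table` is a hypothetical helper name, and the built-in `html.parser` is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for jn['html']; the real response has more rows.
sample_html = (
    '<Div><table id="BondIndexTable" class="dataTable"><thead>'
    '<tr><th class="col1"></th><th colspan="5">Total Return</th></tr>'
    '<tr><th>Bond Indexes</th><th>Daily</th><th>Monthly</th>'
    '<th>YTD</th><th>1Y</th><th>3Y</th></tr></thead><tbody>'
    '<tr><td><div class="qname">Markit iBoxx USD Overall</div></td>'
    '<td><span class="pos">0.18%</span></td><td><span class="pos">0.41%</span></td>'
    '<td><span class="pos">3.45%</span></td><td><span class="neg">-0.60%</span></td>'
    '<td><span class="pos">8.98%</span></td></tr>'
    '<tr><td><div class="qname">Markit iBoxx USD Treasuries</div></td>'
    '<td><span class="pos">0.20%</span></td><td><span class="pos">0.46%</span></td>'
    '<td><span class="pos">2.60%</span></td><td><span class="neg">-2.26%</span></td>'
    '<td><span class="pos">7.56%</span></td></tr>'
    '</tbody><tfoot><tr><td colspan="6">Footnote text.</td></tr></tfoot>'
    '</table></Div>'
)


def parse_index_table(html):
    """Return (header, rows) for the table with id 'BondIndexTable'."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', {'id': 'BondIndexTable'})
    # The last row of <thead> holds the column names.
    header_row = table.find('thead').find_all('tr')[-1]
    header = [th.get_text(strip=True) for th in header_row.find_all('th')]
    # Reading only <tbody> skips the footnote row in <tfoot>.
    rows = [
        [td.get_text(strip=True) for td in tr.find_all('td')]
        for tr in table.find('tbody').find_all('tr')
    ]
    return header, rows


header, rows = parse_index_table(sample_html)
print(header)
for row in rows:
    print(row)
```

With the live response, `parse_index_table(jn['html'])` would take the place of the sample string.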
