
I am trying to get a time series from this website into Python: http://www.boerse-frankfurt.de/en/etfs/db+x+trackers+msci+world+information+technology+trn+index+ucits+etf+LU0540980496/price+turnover+history/historical+data#page=1

I've gotten pretty far, but I don't know how to get all the data rather than just the first 50 rows shown on the page. To view the rest online, you have to click through the result pages at the bottom of the table. I would like to be able to specify a start and end date in Python and get all the corresponding dates and prices in a list. Here is what I have so far:

 from bs4 import BeautifulSoup
 import requests
 import re

 url = 'http://www.boerse-frankfurt.de/en/etfs/db+x+trackers+msci+world+information+technology+trn+index+ucits+etf+LU0540980496/price+turnover+history/historical+data'
 soup = BeautifulSoup(requests.get(url).text, 'lxml')  # use lxml as the parser

 dates  = soup.find_all('td', class_='column-date')
 dates  = [re.sub(r'\s', '', d.string) for d in dates]   # strip all whitespace
 prices = soup.find_all('td', class_='column-price')
 prices = [re.sub(r'\s', '', p.string) for p in prices]

1 Answer
You need to loop through the rest of the pages, which you can do with POST requests. The server expects each POST request to carry the form structure defined below in values; the page number is the 'page' parameter of that structure. The structure also has several parameters I have not tested but that could be interesting to try, such as 'items_per_page', 'max_time' and 'min_time'. Here is an example:

from bs4 import BeautifulSoup
import urllib  # Python 2; see the comments below for Python 3
import re

url = 'http://www.boerse-frankfurt.de/en/parts/boxes/history/_histdata_full.m'
values = {'COMPONENT_ID':'PREeb7da7a4f4654f818494b6189b755e76', 
    'ag':'103708549', 
    'boerse_id': '12',
    'include_url': '/parts/boxes/history/_histdata_full.m',
    'item_count': '96',
    'items_per_page': '50',
    'lang': 'en',
    'link_id': '',
    'max_time': '2014-09-20',
    'min_time': '2014-05-09',
    'page': 1,
    'page_size': '50',
    'pages_total': '2',
    'secu': '103708549',
    'template': '0',
    'titel': '',
    'title': '',
    'title_link': '',
    'use_external_secu': '1'}

dates = []
prices = []
while True:
    data = urllib.urlencode(values)
    request = urllib.urlopen(url, data)  # passing data makes this a POST
    soup = BeautifulSoup(request.read())
    temp_dates  = soup.find_all('td', class_='column-date')
    temp_dates  = [re.sub(r'\s', '', d.string) for d in temp_dates]
    temp_prices = soup.find_all('td', class_='column-price')
    temp_prices = [re.sub(r'\s', '', p.string) for p in temp_prices]
    if not temp_prices:  # an empty page means we have gone past the last one
        break
    dates  += temp_dates
    prices += temp_prices
    values['page'] += 1
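Since the original goal was a user-chosen date range, the untested min_time and max_time parameters mentioned above look like the natural place to plug it in. This is only a guess from the parameter names, not verified server behaviour:

```python
# Hypothetical use of the untested min_time/max_time fields; the names
# suggest a server-side date filter, but this has not been verified.
values = {'page': 1, 'min_time': '', 'max_time': ''}  # stand-in for the full dict above

start_date, end_date = '2014-05-09', '2014-09-20'  # YYYY-MM-DD, matching the dict
values['min_time'] = start_date
values['max_time'] = end_date
values['page'] = 1  # restart the paging loop from the first page
```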


Thanks a lot, this looks like exactly what I'm looking for. Two questions, though: do you know how to get this to work in Python 3? I've used data = urllib.parse.urlencode(values); request = urllib.request.urlopen(url, data.encode('ascii')); soup = BeautifulSoup(request.read()), but that doesn't work (I am getting the same dates and prices over and over and the loop never terminates). Also, how did you come up with the values dict in the first place?
You can find examples of POST requests using Python 3 and urllib here. I think you need to create a Request object first: data = urllib.parse.urlencode(values); request = urllib.request.Request(url, data); response = urllib.request.urlopen(request); soup = BeautifulSoup(response.read()). I extracted the dict values using FireBug, a Firefox extension that lets you see the contents of the HTTP requests in your browser.
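The Python 3 translation discussed in these comments can be sketched as follows; the url is the endpoint from the answer, the values dict is trimmed for brevity, and whether the server still accepts this form is untested. The actual urlopen call is omitted so the sketch stays offline:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint and a trimmed version of the values dict from the answer above.
url = 'http://www.boerse-frankfurt.de/en/parts/boxes/history/_histdata_full.m'
values = {'page': 1, 'page_size': '50', 'lang': 'en'}

data = urlencode(values).encode('ascii')  # urlopen requires bytes in Python 3
request = Request(url, data)              # non-None data makes this a POST

# urllib.request.urlopen(request) would send it, and incrementing
# values['page'] before re-encoding would advance the paging loop.
print(request.get_method())  # prints POST
```

Note that data must be re-encoded on every iteration after changing values['page']; reusing the old bytes is one way to end up fetching the same page forever, as described in the first comment.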
