A website changes its content dynamically through two date filters (year / week) without issuing a new GET request (the update is handled asynchronously on the client side). Each filter combination produces a different page_source containing the td elements I would like to extract.
Currently, I am using nested for-loops to iterate through the filters (and so through different page sources containing different td elements), iterate through the contents of each page source, and then append the desired td elements to an initially empty list.

    from bs4 import BeautifulSoup
    from selenium.webdriver.support.ui import Select

    store = []

    def getData():
        # `browser` is an already-initialised Selenium WebDriver
        year = ['2015', '2014']
        weeks = ['1', '2']
        for y in year:
            # select the year filter
            yearid = Select(browser.find_element_by_id('yearid'))
            yearid.select_by_value(y)
            for w in weeks:
                # select the week filter and submit the form
                frange = Select(browser.find_element_by_id('frange'))
                frange.select_by_value('WEEKS')
                selectElement = Select(browser.find_element_by_id('fweek'))
                selectElement.select_by_value(w)
                pressFilter = browser.find_element_by_name('submit')
                pressFilter.submit()
                # scrape data from page source
                html = browser.page_source
                soup = BeautifulSoup(html, "lxml")
                for el in soup.find_all('td'):
                    store.append(el.get_text())

So far so good, and I have a for loop that constructs a single list of all the td elements that I would like.
Instead, I would like to store separate lists, one for each page source (i.e. one per filter combination), in a list of lists. I could do that after the fact, in a secondary step that splits the flat list according to some criteria.
However, can I do that at the point of the original appending? Something like...

    store = [[], [], [], []]
    ...
    counter = 0
    for el in soup.find_all('td'):
        store[counter].append(el.get_text())
    counter = counter + 1

This isn't quite right, as it only appends to the first list in store. If I put the counter increment inside the td for-loop, it will increase every time a td element is iterated, when in fact I only want it to increase once I have finished iterating through a particular page source (which itself corresponds to one filter combination).
I am stumped. Is what I am trying even possible? If so, where should I put the counter? Or should I use some other technique?
Create a new list for each filter combination, append it to store, and add the td text results to that new list: perfilter = [], store.append(perfilter), and in the find_all() loop: perfilter.append(el.get_text()).
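A minimal sketch of that idea follows. So that it runs without a browser, it makes some assumptions that are not in the question: fake_pages is a hypothetical stand-in for browser.page_source after each filter submit, and td_texts() is a small helper built on the standard library's html.parser in place of BeautifulSoup. The part that matters is that perfilter is created inside the week loop, once per filter combination, so no counter is needed.

```python
from html.parser import HTMLParser


class TdTextParser(HTMLParser):
    """Collects the text content of every <td> element."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data


def td_texts(html):
    """Stand-in for [el.get_text() for el in soup.find_all('td')]."""
    parser = TdTextParser()
    parser.feed(html)
    parser.close()
    return parser.cells


# Hypothetical page sources, one per (year, week) filter combination.
fake_pages = {
    ("2015", "1"): "<table><tr><td>a</td><td>b</td></tr></table>",
    ("2015", "2"): "<table><tr><td>c</td></tr></table>",
    ("2014", "1"): "<table><tr><td>d</td><td>e</td></tr></table>",
    ("2014", "2"): "<table><tr><td>f</td></tr></table>",
}

store = []
for y in ["2015", "2014"]:
    for w in ["1", "2"]:
        html = fake_pages[(y, w)]  # in the real script: browser.page_source
        perfilter = []             # fresh list for this filter combination
        store.append(perfilter)
        for text in td_texts(html):
            perfilter.append(text)

print(store)
```

In the original script the same two lines, perfilter = [] and store.append(perfilter), would go right after pressFilter.submit(), with the td loop appending to perfilter instead of store. If it is useful to know which filter produced which list, a dict keyed by (y, w) could be used in place of the outer list.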