
A website changes its content dynamically through two date filters (year / week), without needing a GET request (the update is handled asynchronously on the client side). Each filter combination produces a different page_source containing td elements I would like to extract.

Currently, I am using nested for loops to iterate through the filters (and therefore through different page sources containing different td elements), iterate through the contents of each page source, and append the desired td elements to an initially empty list.

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

store = []

def getData():
    # assumes `browser` is an already-created Selenium WebDriver on the filter page
    year = ['2015', '2014']

    for y in year:
        yearid = Select(browser.find_element(By.ID, 'yearid'))
        yearid.select_by_value(y)

        weeks = ['1', '2']
        for w in weeks:
            frange = Select(browser.find_element(By.ID, 'frange'))
            frange.select_by_value('WEEKS')
            selectElement = Select(browser.find_element(By.ID, 'fweek'))
            selectElement.select_by_value(w)
            pressFilter = browser.find_element(By.NAME, 'submit')
            pressFilter.submit()

            # scrape data from page source
            html = browser.page_source
            soup = BeautifulSoup(html, "lxml")

            for el in soup.find_all('td'):
                store.append(el.get_text())

So far so good: this loop builds a single list of all the td elements I want.

Instead, I would like to store separate lists, one per page source (i.e. one per filter combination), in a list of lists. I could do that after the fact, in a secondary step that splits the single list according to some criterion.

However, can I do that at the point of the original appending? Something like...

store = [[],[], [], []]

...

   counter = 0
   for el in soup.find_all('td'):
      store[counter].append(el.get_text())
   counter = counter +1 

This isn't quite right: because counter is reset to 0 just before each page's td loop, everything is appended to the first list in store. If I move the counter into the td for-loop, it increases on every td element, when I only want it to increase once I have finished iterating through a particular page source (which itself corresponds to one filter combination).

I am stumped. Is what I am trying even possible? If so, where should I put the counter, or should I use some other technique?
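For illustration, a minimal sketch of where a counter would have to live for this to work: initialised once, before both filter loops, and incremented once per filter combination, after the td loop has finished. `load_cells` is a hypothetical stand-in for the Selenium and BeautifulSoup steps; it just returns fake cell text so the sketch runs on its own.

```python
# Hypothetical stand-in for "select filters, submit, parse page_source":
# returns the td texts for one (year, week) combination.
def load_cells(year, week):
    return [f"{year}-w{week}-cell{i}" for i in range(2)]


def get_data_with_counter():
    years = ['2015', '2014']
    weeks = ['1', '2']
    # One empty list per filter combination. Build it with a comprehension;
    # [[]] * n would repeat the *same* list object n times.
    store = [[] for _ in range(len(years) * len(weeks))]
    counter = 0                      # set once, before any loop
    for y in years:
        for w in weeks:
            for text in load_cells(y, w):
                store[counter].append(text)
            counter += 1             # once per page source, after the td loop
    return store
```

This works, but it requires knowing the number of combinations up front and keeping the counter in step with the loops by hand.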

  • I don't see any list comprehensions in your question. For your current code (using regular loops), just create and append a new list object per filter combination, and append all the td text results to that new list: perfilter = [], store.append(perfilter), and in the find_all() loop: perfilter.append(el.get_text()). Commented Sep 16, 2016 at 15:49
  • I have corrected reference to list comprehension Commented Sep 16, 2016 at 15:53

1 Answer


Create a new list object per filter combination, i.e. inside the for w in weeks: loop. Append the cell text to that list, and append each per-filter list to store:

def getData():
    store = []
    year = ['2015', '2014']

    for y in year:
        # ... elided for brevity

        weeks = ['1', '2']
        for w in weeks:
            perfilter = []            # new list for this filter combination
            store.append(perfilter)   # store keeps a reference, so later appends show up

            # ... elided for brevity

            for el in soup.find_all('td'):
                perfilter.append(el.get_text())

    return store
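A self-contained sketch of the same per-filter pattern, runnable without a browser: `load_cells` is a hypothetical stand-in for selecting the filters, submitting, and collecting the td texts from `browser.page_source`. The second function is an alternative (not part of the answer above) that keys each page's cells by its (year, week) values via `itertools.product`, so you can tell which list came from which filter without tracking positions.

```python
from itertools import product


# Hypothetical stand-in for the Selenium + BeautifulSoup steps:
# returns the td texts for one (year, week) combination.
def load_cells(year, week):
    return [f"{year}-w{week}-cell{i}" for i in range(2)]


def get_data(years=('2015', '2014'), weeks=('1', '2')):
    store = []
    for y in years:
        for w in weeks:
            perfilter = []           # fresh list for this filter combination
            store.append(perfilter)
            for text in load_cells(y, w):
                perfilter.append(text)
    return store                     # list of lists, one per combination


def get_data_by_filter(years=('2015', '2014'), weeks=('1', '2')):
    # product() yields the same (year, week) order as the nested loops;
    # dicts preserve insertion order, so iteration order matches too.
    return {(y, w): load_cells(y, w) for y, w in product(years, weeks)}
```

With these stubs, `get_data()[1]` holds the cells for 2015 week 2, and `get_data_by_filter()[('2014', '1')]` looks up a page directly by its filter values.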