Stuck with nested for loop issue

Question

A website changes its content dynamically through two date filters (year / week), without a new GET request (the update is handled asynchronously on the client side). Each filter option produces a different page_source with td elements I would like to extract.

Currently, I am using nested for-loops to iterate through the filters (and so through different page sources containing different td elements), iterate through the contents of each page source, and then append the desired td elements to an initially empty list.

from bs4 import BeautifulSoup
from selenium.webdriver.support.ui import Select

# `browser` is assumed to be an already-initialised Selenium WebDriver.
store = []

def getData():
    year = ['2015', '2014']

    for y in year:
        # select the year filter
        yearid = Select(browser.find_element_by_id('yearid'))
        yearid.select_by_value(y)

        weeks = ['1', '2']
        for w in weeks:
            # select the week filter and submit the form
            frange = Select(browser.find_element_by_id('frange'))
            frange.select_by_value('WEEKS')
            selectElement = Select(browser.find_element_by_id('fweek'))
            selectElement.select_by_value(w)
            pressFilter = browser.find_element_by_name('submit')
            pressFilter.submit()

            # scrape data from page source
            html = browser.page_source
            soup = BeautifulSoup(html, "lxml")

            for el in soup.find_all('td'):
                store.append(el.get_text())

So far so good, and I have a for loop that constructs a single list of all the td elements that I would like.

Instead, I would like to store separate lists, one for each page source (i.e. one per filter combination), in a list of lists. I could do that after the fact, i.e. in a secondary step that splits the flat list according to some criteria.

However, can I do that at the point of the original appending? Something like...

store = [[],[], [], []]

...

   counter = 0
   for el in soup.find_all('td'):
      store[counter].append(el.get_text())
   counter = counter +1

This isn't quite right, as it only appends to the first list in store. If I put the counter inside the td for-loop, it will increase every time a td element is iterated, when in fact I only want it to increase once I have finished iterating through a particular page source (which itself corresponds to one filter combination).

I am stumped. Is what I am trying even possible? If so, where should I put the counter? Or should I use some other technique?

I don't see any list comprehensions in your question. For your current code (using regular loops), just create and append a new list object per filter combination, and append all the td text results to that new list: perfilter = [], store.append(perfilter), and in the find_all() loop: perfilter.append(el.get_text()).
– Martijn Pieters, Commented Sep 16, 2016 at 15:49

Martijn Pieters · Accepted Answer · 2016-09-16 15:56:59Z


Create a new list object per filter combination, so inside the for w in weeks: loop. Append your cell text to that list, and append the per-filter list this produces to store:

def getData():
    store = []
    year = ['2015', '2014']

    for y in year:
        # ... elided for brevity

        weeks = ['1', '2']
        for w in weeks:
            # one fresh list per (year, week) filter combination
            perfilter = []
            store.append(perfilter)

            # ... elided for brevity

            for el in soup.find_all('td'):
                perfilter.append(el.get_text())

    # store is local here, so hand it back to the caller
    return store
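
To try the pattern without a browser, here is a minimal sketch that can be run on its own. It assumes a hypothetical helper, fetch_cells(year, week), standing in for the Selenium and BeautifulSoup steps elided above; the helper, its fake cell strings, and the get_data name are illustrative and not part of the original code. itertools.product is used only as a compact equivalent of the two nested loops.

```python
from itertools import product


def fetch_cells(year, week):
    """Hypothetical stand-in for selecting the filters and parsing the td cells."""
    return [f"{year}-W{week}-cell{i}" for i in range(3)]


def get_data(years, weeks):
    store = []
    # product() yields every (year, week) pair in the same order as
    # `for y in years: for w in weeks:` would.
    for year, week in product(years, weeks):
        perfilter = []            # fresh list for this filter combination
        store.append(perfilter)   # store keeps a reference to it
        for text in fetch_cells(year, week):
            perfilter.append(text)
    return store


store = get_data(['2015', '2014'], ['1', '2'])
print(len(store))   # one inner list per filter combination
print(store[0])     # cells for year 2015, week 1
```

No counter is needed: creating the inner list at the top of the innermost filter loop is what ties each list to exactly one page source.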

