I wanted to find a way to parse information from:

<tr>
   <td class="prodSpecAtribute">Rulebook Chapter</td>
   <td colspan="5">
     <a href="http://cmegroup.com/rulebook/CME/V/450/452/452.pdf" target="_blank" title="CME Chapter 452">CME Chapter 452</a>
   </td>
</tr>

<tr>
   <td class="prodSpecAtribute" rowspan="2">
      Trading Hours
      <br>
      (All times listed are Central Time)
   </td>
   <td>OPEN OUTCRY</td>
   <td colspan="4">
      <div class="font_black Large_div_td">MON-FRI: 7:20 a.m. - 2:00 p.m.</div>
   </td>
</tr>
<tr>
   <td>CME GLOBEX</td>  <!-- PROBLEM HERE: I want this cell and the div below to count as one row under the "Trading Hours" cell, i.e. the td with class="prodSpecAtribute" rowspan="2" -->

   <td colspan="4">
      <div class="font_black Large_div_td">SUN - FRI: 5:00 p.m. - 4:00 p.m. CT</div>
   </td>
</tr>

I was able to parse information in the top table easily as follows:

soup = BeautifulSoup(page)
left_col = soup.findAll('td', attrs={'class' : 'prodSpecAtribute'})
right_col= soup.findAll('td', colspan=['4', '5'])

So in this example there are 3 rows. Two of them have a cell with class "prodSpecAtribute" and at least one column corresponding to that cell. The last row, however, has no such cell, so I need a way to reuse the previous class cell ("Trading Hours") and put the third row's two <td>s under it: CME GLOBEX and SUN - FRI: 5:00 p.m. - 4:00 p.m. CT

Combine_column method:

def combine_col(right):
    num = len(right)

    for i in range(0, num):
        text_ = ' '.join(right[i].findAll(text=True))
        print text_

    return text_

1 Answer

The obvious way to merge the second and third columns of the second row is to iterate over the rows explicitly. Anything you write with find_all is just going to return row0-col1, row1-col1, and row1-col2 as three separate values, and you'll have no way of knowing which ones go together.

So, if I understand your problem, you want something like this:

left_col = []
right_col = []
for tr in soup.find_all('tr'):
    tds = tr.find_all('td')
    left, right = tds[0], tds[1:]
    # sanity check; .get avoids a KeyError when the cell has no class at all
    assert 'prodSpecAtribute' in left.get('class', [])
    left_col.append(left)
    right_col.append(combine_columns(right))

Except that you need to write that combine_columns code, because I don't know how you want to "combine the information" in the columns.
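If "combine" simply means joining the text of the right-hand cells into one string, a minimal sketch of combine_columns might look like the following. It assumes bs4 (BeautifulSoup 4) is installed and uses a cut-down copy of the "Trading Hours" row as sample input; the joining rule (single spaces, whitespace stripped) is my assumption, not something the question specifies.

```python
from bs4 import BeautifulSoup

def combine_columns(cells):
    """Join the visible text of several <td> cells into one string."""
    # get_text(' ', strip=True) joins a cell's text fragments with spaces
    # and drops surrounding whitespace and empty fragments.
    return ' '.join(cell.get_text(' ', strip=True) for cell in cells)

# Sample input: a simplified copy of one row from the question.
html = """
<table><tr>
  <td class="prodSpecAtribute">Trading Hours</td>
  <td>OPEN OUTCRY</td>
  <td colspan="4"><div>MON-FRI: 7:20 a.m. - 2:00 p.m.</div></td>
</tr></table>
"""
soup = BeautifulSoup(html, 'html.parser')
tds = soup.find('tr').find_all('td')
combined = combine_columns(tds[1:])   # everything to the right of the label
print(combined)
```

Returning the joined string, rather than printing inside the loop, keeps every cell's text instead of only the last one.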

I'm obviously using the rule that column 0 goes on the left, rather than whatever column has class prodSpecAtribute. I did this mainly because I can't figure out what you'd want to happen for a row that had no such column, or where it wasn't the leftmost column. So I just added an assert as a sanity check, to verify that this is always the right rule for your source.
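For a source where some rows have no label cell at all (for example, the second row covered by a rowspan="2" label), one possible rule is to carry the most recent label forward instead of asserting. The sketch below assumes that rule, assumes bs4 is installed, and uses a hypothetical helper name, rows_by_label. It does not read the rowspan value itself; it only treats any row without a prodSpecAtribute cell as a continuation of the previous label.

```python
from bs4 import BeautifulSoup

LABEL_CLASS = 'prodSpecAtribute'  # spelled exactly as in the source HTML

def rows_by_label(soup):
    """Group each row's right-hand text under the most recent label cell."""
    grouped = []  # list of (label, [row_text, ...]) pairs, in page order
    for tr in soup.find_all('tr'):
        tds = tr.find_all('td')
        if not tds:
            continue
        if LABEL_CLASS in tds[0].get('class', []):
            # New label: start a new group; the remaining cells are its data.
            grouped.append((tds[0].get_text(' ', strip=True), []))
            cells = tds[1:]
        else:
            # Assumption: a row without a label continues the previous one.
            cells = tds
        if not grouped:
            continue  # continuation row before any label; nothing to attach to
        text = ' '.join(c.get_text(' ', strip=True) for c in cells)
        grouped[-1][1].append(text)
    return grouped

# Sample input: a simplified copy of the three rows from the question.
html = """
<table>
<tr><td class="prodSpecAtribute">Rulebook Chapter</td>
    <td colspan="5"><a href="#">CME Chapter 452</a></td></tr>
<tr><td class="prodSpecAtribute" rowspan="2">Trading Hours</td>
    <td>OPEN OUTCRY</td>
    <td colspan="4"><div>MON-FRI: 7:20 a.m. - 2:00 p.m.</div></td></tr>
<tr><td>CME GLOBEX</td>
    <td colspan="4"><div>SUN - FRI: 5:00 p.m. - 4:00 p.m. CT</div></td></tr>
</table>
"""
result = rows_by_label(BeautifulSoup(html, 'html.parser'))
for label, rows in result:
    print(label, rows)
```

With this rule the CME GLOBEX row ends up as a second entry under "Trading Hours". If the real page has unlabeled rows that are not continuations, the rule would need tightening, for instance by counting down the rowspan value.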

3 Comments

Sorry, what I meant by combine was to append the texts found in tds[0], tds[1], ...
Could you take a look at the post again? I forgot to add the last <tr>, which is the one that is difficult to figure out.
You really need to give a much less vague description if you want more exact answers. Or, better, instead of describing what you want, just show the exact output you want.
