<td id="aisd_calendar-2014-04-28-0" class="single-day future" colspan="1" rowspan="1" date="**2014-04-28**" >
  <div class="inner">
    <div class="item">
  <div class="view-item view-item-aisd_calendar">
  <div class="calendar monthview">
        <div class="calendar.4168.field_date.8.0 contents">
                      <a href="/event/2013/regular-board-meeting">**Regular Board Meeting**</a>                      <span class="date-display-single">7:00 pm</span>          </div>  
        <div class="cutoff">&nbsp;</div>
      </div> 
  </div>   
</div>  </div>
</td>

I have the above HTML code. I would like to extract the value of the "date" attribute (2014-04-28) and the text of the "a href" link (Regular Board Meeting) from it. How can I do this using Python? Can this be done using Beautiful Soup?

  • Yes, this can easily be done with BeautifulSoup. I strongly suggest reading its documentation. (Commented Mar 24, 2014 at 10:45)

1 Answer


Here's how you can do it via BeautifulSoup:

from bs4 import BeautifulSoup


data = """
<html>
    <body>
        <td id="aisd_calendar-2014-04-28-0" class="single-day future" colspan="1" rowspan="1" date="**2014-04-28**" >
          <div class="inner">
            <div class="item">
          <div class="view-item view-item-aisd_calendar">
          <div class="calendar monthview">
                <div class="calendar.4168.field_date.8.0 contents">
                              <a href="/event/2013/regular-board-meeting">**Regular Board Meeting**</a>                      <span class="date-display-single">7:00 pm</span>          </div>
                <div class="cutoff">&nbsp;</div>
              </div>
          </div>
        </div>  </div>
        </td>
    </body>
</html>
"""
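soup = BeautifulSoup(data, 'html.parser')  # name the parser explicitly

td = soup.body.td  # or soup.find('td', id='aisd_calendar-2014-04-28-0')
print(td['date'].strip('*'))  # value of the "date" attribute

link = soup.find('div', {'class': 'contents'}).a
print(link.get_text().strip('*'))  # link text
print(link['href'])                # link target

This prints:

2014-04-28
Regular Board Meeting
/event/2013/regular-board-meeting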
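
If the page contains many calendar cells, the same lookup could be written with CSS selectors and wrapped in a loop. The following is only a sketch: the function name `extract_events` and the trimmed HTML are made up for illustration, and it assumes every event cell is a `td` carrying a `date` attribute, as in the question.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the calendar cell from the question.
html = """
<table><tr>
<td id="aisd_calendar-2014-04-28-0" class="single-day future" date="**2014-04-28**">
  <div class="calendar.4168.field_date.8.0 contents">
    <a href="/event/2013/regular-board-meeting">**Regular Board Meeting**</a>
    <span class="date-display-single">7:00 pm</span>
  </div>
</td>
</tr></table>
"""


def extract_events(soup):
    """Hypothetical helper: one dict per <td> that has a date attribute and a link."""
    events = []
    for td in soup.select("td[date]"):
        link = td.select_one("div.contents a")
        if link is None:
            continue  # day cell without an event link
        events.append({
            "date": td["date"].strip("*"),
            "title": link.get_text(strip=True).strip("*"),
            "href": link["href"],
        })
    return events


soup = BeautifulSoup(html, "html.parser")
events = extract_events(soup)
print(events)
# [{'date': '2014-04-28', 'title': 'Regular Board Meeting', 'href': '/event/2013/regular-board-meeting'}]
```

Cells without a link are skipped rather than raising an error, which seems the safer default for a month view where most days are empty.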

Also, if you need to convert the date into a Python datetime object, you can use datetime.strptime():

from datetime import datetime

...

datetime.strptime(td['date'].strip('*'), '%Y-%m-%d')
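
The time shown next to the link ("7:00 pm") could be folded into the same call. This is a rough sketch rather than part of the original answer: the helper name `parse_event_start` is made up, the two input strings are hard-coded stand-ins for `td['date']` and the text of the `date-display-single` span, and `%p` assumes a locale that uses English-style am/pm.

```python
from datetime import datetime

# Stand-ins for td['date'] and the <span class="date-display-single"> text.
date_attr = "**2014-04-28**"
time_text = "7:00 pm"


def parse_event_start(date_attr, time_text):
    """Hypothetical helper: combine the date attribute and time label into one datetime."""
    date_part = date_attr.strip("*")
    stamp = "{} {}".format(date_part, time_text.strip())
    # %I is the 12-hour clock hour, %p the am/pm marker (locale-dependent).
    return datetime.strptime(stamp, "%Y-%m-%d %I:%M %p")


start = parse_event_start(date_attr, time_text)
print(start.isoformat())  # 2014-04-28T19:00:00
```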

Hope that helps.
