I am using Python with BeautifulSoup to go through a page that is divided into sections, each marked by a span whose id increments by 1, and I want to collect the vid values that belong to each section. The number of vids varies from one span id to the next, as you can see below, and the vid anchors are not nested under the tr that contains the span; they sit in the sibling tr elements that follow it.
Right now I loop over the spans to get each id, but I am trying to figure out a way to get the vid values as a list for each span id.
The following is an example of the HTML I am working with:
<tr>
<td>
<div>
<span class="apple-font" id="001">
</div>
</td>
</tr>
<tr>
</tr>
<tr>
<td>
<a vid="0099882"></a>
</td>
</tr>
<tr>
<td>
<a vid="0099883"></a>
</td>
</tr>
<tr>
<td>
<a vid="0099883"></a>
</td>
</tr>
<tr>
<td>
<div>
<span class="apple-font" id="002">
</div>
</td>
</tr>
<tr>
</tr>
<tr>
<td>
<a vid="0099883"></a>
</td>
</tr>
<tr>
<td>
<div>
<span class="apple-font" id="003">
</div>
</td>
</tr>
<tr>
</tr>
<tr>
<td>
<a vid="0099883"></a>
</td>
</tr>
<tr>
<td>
<a vid="0099883"></a>
</td>
</tr>
<tr>
<td>
<div>
<span class="apple-font" id="004">
</div>
</td>
</tr>
<tr>
</tr>
The following is the code I have been trying so far; I have not made much progress on getting all the vids:
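from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # html: assumed to hold the page source shown above
# Every section marker: elements with class "apple-font" that carry an id
spans = soup.find_all(class_="apple-font", id=True)
for s in spans:
    # Span text with leading whitespace and periods stripped
    n = s.get_text().lstrip().replace(".", "")
    print(n)
    print()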
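To make the goal concrete, here is a minimal sketch of the grouping I am after. It assumes the spans and the vid anchors can simply be read in document order, so that every vid belongs to the most recent span seen before it. The names group_vids_by_span and sample_html are made up for illustration, and the sample is a shortened version of the HTML above with the spans closed.

```python
from bs4 import BeautifulSoup


def group_vids_by_span(html):
    """Map each section span id to the list of vid values that follow it."""
    soup = BeautifulSoup(html, "html.parser")
    groups = {}
    current_id = None
    # find_all returns matches in document order, so one pass is enough
    for tag in soup.find_all(["span", "a"]):
        is_marker = (
            tag.name == "span"
            and "apple-font" in tag.get("class", [])
            and tag.has_attr("id")
        )
        if is_marker:
            # A new section starts here; later vids belong to this id
            current_id = tag["id"]
            groups[current_id] = []
        elif tag.name == "a" and tag.has_attr("vid") and current_id is not None:
            groups[current_id].append(tag["vid"])
    return groups


# Hypothetical, shortened sample based on the HTML in the question
sample_html = """
<table>
<tr><td><div><span class="apple-font" id="001"></span></div></td></tr>
<tr></tr>
<tr><td><a vid="0099882"></a></td></tr>
<tr><td><a vid="0099883"></a></td></tr>
<tr><td><div><span class="apple-font" id="002"></span></div></td></tr>
<tr></tr>
<tr><td><a vid="0099883"></a></td></tr>
<tr><td><div><span class="apple-font" id="004"></span></div></td></tr>
</table>
"""

result = group_vids_by_span(sample_html)
for span_id, vids in result.items():
    print(span_id, vids)
```

A span with no vids after it (like id 004 here) ends up with an empty list, and duplicate vid values are kept as they appear. If the real page has other anchors with a vid attribute outside these sections, the single pass would need an extra filter.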