I have completely rewritten this question because, judging by the answers, I must have asked it poorly. I will try to be clearer.
I have an object that I am trying to scrape. On my laptop the code works without any problems. When I transferred it over to Pythonanywhere, I could no longer get the information I am looking for.
The code that works on my system is:
from urllib.request import urlopen
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import csv
import time
import re
#68 lines of code for another section of the site above this working well on my system and on pythonanywhere.
pageSource = driver.page_source
bsObj = BeautifulSoup(pageSource)
try:
    parcel_number = bsObj.find(id="mParcelnumbersitusaddress_mParcelNumber")
    s_parcel_number = parcel_number.get_text()
except AttributeError as e:
    s_parcel_number = "Parcel Number not found"
# same kind of code (all working) that gets 10 more pieces of data
# Tax Year
try:
    pause = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.ID, "TaxesBalancePaymentCalculator")))
    taxes_owed_2015_yr = bsObj.findAll(id="mGrid_RealDataGrid")[1].findAll('tr')[1].findAll('td')[0]
except IndexError as e:
    s_taxes_owed_2015_yr = "No taxes due"
This code works just fine on my laptop with Firefox. On Pythonanywhere, if I print the page source for the page I am trying to scrape, I get the following where my table should be:
<table border="0" cellpadding="5" cellspacing="0" class="WithBorder" width="100%">
<tbody><tr>
<td id="TaxesBalancePaymentCalculator"><!--DONT_PRINT_START-->
<span class="InputFieldTitle" id="mTabGroup_Taxes_mTaxChargesBalancePaymentInjected_mReportProcessingNote">Please wait while your current taxes are calculated.</span><img src="images/progress.gif"/> <!--DONT_PRINT_FINISH--></td>
</tr> <!--DONT_PRINT_START-->
<script type="text/javascript">
function TaxesBalancePaymentCalculator_ScriptLoaded( pPageContent )
{
element('TaxesBalancePaymentCalculator').innerHTML = pPageContent;
}
function results_ready()
{
element('pay_button_area').style.display = 'block';
element('pay_button_area2').style.display = 'block';
element('pay_additional_things_area').style.display = 'block';
}
var no_taxes_calculator = '&nbsp;<' + 'span class="MessageTitle">The tax balance calculator is not available.<' + '/span>';
function no_taxes_calculator_available()
{
element('TaxesBalancePaymentCalculator').innerHTML = no_taxes_calculator;
}
function invalid()
{
element('TaxesBalancePaymentCalculator').innerHTML = no_taxes_calculator;
}
loadScript( 'injected/TaxesBalancePaymentCalculator.aspx?parcel_number=15-720-01-01-00-0-00-000' );
</script><script id="injected_taxesbalancepaymentcalculator_ScriptTag" type="text/javascript"></script>
<tr id="pay_button_area" style="DISPLAY: none">
<td id="pay_button_area2">
<table border="0" cellpadding="2" cellspacing="0">
<tbody><tr>
I have played around and have found that if I get the innerHTML (as a str):
element('TaxesBalancePaymentCalculator').innerHTML = pPageContent;
that section holds my data. The problem is that I cannot perform a findAll on a string, and I need certain rows from the table:
taxes_owed_2015_yr = bsObj.findAll(id="mGrid_RealDataGrid")[1].findAll('tr')[1].findAll('td')[0]
I need help getting that element as an object (not a string) so that I can use it in my data. I have tried so many things that I could not list them all here. I could really use some help, please.
Thanks in advance.
findAll methods in Python. This is a bs4 method... Do you import bs4 within your code? What are you trying to do with bsObj? The bs4 module works in a slightly different way... You should read some more about it: crummy.com/software/BeautifulSoup/bs4/doc
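On the "cannot perform a findAll on a string" part, here is a minimal sketch rather than a confirmed fix: a string of HTML can be passed to BeautifulSoup again, and the result is an ordinary object that supports find_all (findAll is the older alias for the same method). The markup and values in inner_html below are made up for illustration. Only the id mGrid_RealDataGrid and the row and cell indices are taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the string read from the element's innerHTML.
# The real markup and values on the site will differ.
inner_html = """
<table id="mGrid_RealDataGrid">
  <tr><th>Owner</th></tr>
  <tr><td>Example Owner</td></tr>
</table>
<table id="mGrid_RealDataGrid">
  <tr><th>Tax Year</th><th>Balance</th></tr>
  <tr><td>2015</td><td>$0.00</td></tr>
</table>
"""

# Parsing the string yields a BeautifulSoup object, so find_all is available.
# "html.parser" is the parser bundled with Python; no extra install is needed.
fragment = BeautifulSoup(inner_html, "html.parser")

try:
    # Same lookup as in the question: second grid, second row, first cell.
    grids = fragment.find_all(id="mGrid_RealDataGrid")
    cell = grids[1].find_all("tr")[1].find_all("td")[0]
    s_taxes_owed_2015_yr = cell.get_text(strip=True)
except IndexError:
    s_taxes_owed_2015_yr = "No taxes due"

print(s_taxes_owed_2015_yr)
```

With Selenium, the string could be read with driver.find_element(By.ID, "TaxesBalancePaymentCalculator").get_attribute("innerHTML"). Two things in the posted code may also matter, though I have not tested this against the site: bsObj is built from driver.page_source before the WebDriverWait call, so it cannot contain anything injected later, and the td with id TaxesBalancePaymentCalculator is already present in the placeholder markup, so presence_of_element_located may return before the table has been loaded.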