I've been working on a Python 3 script to generate BibTeX entries, and I have ISSNs that I would like to use to look up information about the associated journal.
For instance, I would like to take the ISSN 0897-4756 and find that it belongs to the journal Chemistry of Materials, which is published by ACS Publications.
I can do this manually using JournalGuide (the site queried in the code below), where the information I am looking for is stored in the HTML table matched by the XPath //table[@id="journal-search-results-table"], or more specifically, in the cells of its table body.
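For reference, here is a minimal sketch of reading cell text from a table like this with lxml, run on made-up markup. The SAMPLE string, including the link around the journal name, is an assumption and not copied from the real site. It shows one pitfall worth ruling out alongside the loading issue: text() only returns text nodes that are direct children of each td, so text nested inside a child element such as an a link is skipped, whereas text_content() collects all descendant text.

```python
import lxml.html

# Hypothetical markup: a guess at how a populated results table could look,
# with the journal name wrapped in a link (not copied from the real site).
SAMPLE = (
    '<table id="journal-search-results-table"><tbody>'
    '<tr><td><a href="#">Chemistry of Materials</a></td><td>ACS Publications</td></tr>'
    '</tbody></table>'
)

tree = lxml.html.fromstring(SAMPLE)
table = '//table[@id="journal-search-results-table"]'

# text() returns only text nodes that are direct children of each <td>,
# so the text inside the <a> element is skipped
direct = [str(t) for t in tree.xpath(table + '//td/text()')]

# text_content() concatenates all descendant text of each <td>
full = [td.text_content().strip() for td in tree.xpath(table + '//td')]

print(direct)  # ['ACS Publications']
print(full)    # ['Chemistry of Materials', 'ACS Publications']
```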
However, I have not been able to automate this successfully in Python 3.x.
I have tried approaches based on the httplib2, requests, urllib2 (urllib.request in Python 3), and lxml.html packages, with no success thus far.
What I have so far is shown below:
import urllib.parse
import urllib.request

import lxml.html

ISSN = "0897-4756"
# Build the search URL; urlencode escapes any characters that need it
query = urllib.parse.urlencode({"type": "journal-name", "journal-name": ISSN})
address = "https://www.journalguide.com/journals/search?" + query
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'identity',  # ask for an uncompressed body; urllib does not decompress gzip
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}
request = urllib.request.Request(address, None, hdr)  # the assembled request
with urllib.request.urlopen(request) as response:
    html = response.read()
tree = lxml.html.fromstring(html)
print(tree.xpath('//table[@id="journal-search-results-table"]/text()'))
# >> ['\n', '\n']
# Only whitespace: the <table> element is present in the downloaded HTML,
# but text() here returns only the table's own direct text nodes
print(tree.xpath('//table[@id="journal-search-results-table"]//td/text()'))
# >> []
# I expected this to return the text of the data cells I am looking for
The exact page being queried is the one at the address URL built in the code above.
From what I can tell, the table's tbody element, and thus the tr and td elements it contains, are not present at the time Python parses the HTML, presumably because they are filled in after the initial page load (for example by JavaScript, which a plain HTTP request does not run). This prevents me from reading the data.
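One way to test this hypothesis is to count the tr elements inside the table in the raw HTML that urllib downloads. A minimal sketch follows; row_count is a hypothetical helper, and the two HTML strings are invented examples rather than the real page.

```python
import lxml.html


def row_count(html, table_id):
    """Count the <tr> elements inside the table with the given id.

    Works on the HTML exactly as downloaded: no JavaScript is executed,
    so rows that a script would add in a browser are not counted."""
    tree = lxml.html.fromstring(html)
    # //tr matches rows whether or not the source markup has a <tbody>
    return int(tree.xpath('count(//table[@id=$tid]//tr)', tid=table_id))


# Made-up examples: an empty table "shell" versus a table with one row
shell = '<html><body><table id="results">\n</table></body></html>'
filled = ('<html><body><table id="results">'
          '<tr><td>Chemistry of Materials</td></tr></table></body></html>')

print(row_count(shell, "results"))   # 0
print(row_count(filled, "results"))  # 1
```

Running row_count(html, "journal-search-results-table") on the bytes from response.read() (lxml.html.fromstring accepts bytes) and getting 0 would support the idea that the rows arrive after the initial page load; a nonzero count would instead point at the XPath. The //tr step is used rather than /tbody/tr because lxml keeps the markup as written and, unlike a browser's DOM inspector, does not insert a tbody that the source lacks.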
How can I read the journal name and publisher out of the table specified above?
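If the rows are indeed added client-side (not verified here), one option is to let a real browser render the page and then parse the rendered HTML. Below is a minimal sketch, assuming Selenium 4 with Firefox and its driver installed; render_page and first_result_cells are hypothetical names. Which column holds the journal name and which holds the publisher is not known here, so the helper returns every cell of the first data row.

```python
import lxml.html

TABLE_XPATH = '//table[@id="journal-search-results-table"]'


def render_page(url, timeout=15):
    """Load url in headless Firefox and return the page HTML once the
    results table contains at least one <td>. Requires Selenium 4 + Firefox."""
    # Imported here so first_result_cells works without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Block until a table cell exists (raises TimeoutException otherwise)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "#journal-search-results-table td")))
        return driver.page_source
    finally:
        driver.quit()


def first_result_cells(html):
    """Return the stripped text of each <td> in the first data row of the table."""
    tree = lxml.html.fromstring(html)
    rows = tree.xpath(TABLE_XPATH + '//tr[td]')  # skip header rows made of <th>
    if not rows:
        return []
    return [td.text_content().strip() for td in rows[0].xpath('./td')]


# Usage (needs network access and a browser):
# url = "https://www.journalguide.com/journals/search?type=journal-name&journal-name=0897-4756"
# print(first_result_cells(render_page(url)))
```

A browser is heavyweight for a BibTeX helper; if the browser's developer tools show the rows arriving through a separate request, fetching that request directly with urllib would avoid Selenium altogether, though that endpoint is not known here.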