I'm trying to parse financial information directly from the SEC and have a question about parsing XML documents using VBA in Excel. I understand that it's possible to walk the document by referencing child nodes and item numbers, but the document is huge and it would take forever to read through it and identify each item I need.
I saw online that XPath is commonly used to query XML documents more efficiently. I've tried many approaches but have had no success so far. I believe my issue is understanding which namespace the elements are in and how to properly reference elements under a specified namespace.
Below is a portion of my code, which tries to reference an arbitrary element:
Sub SecData()
    ' Requires a reference to "Microsoft XML, v6.0"
    Dim xml_obj As MSXML2.XMLHTTP60
    Dim xDoc As New MSXML2.DOMDocument60
    Dim xml_url As String
    Dim nodes As MSXML2.IXMLDOMNodeList
    Set xml_obj = New MSXML2.XMLHTTP60
    xml_url = "https://www.sec.gov/Archives/edgar/data/320193/000032019321000010/aapl-20201226_htm.xml"
    ' Synchronous request, so responseText is complete before it is read
    xml_obj.Open bstrMethod:="GET", bstrURL:=xml_url, varAsync:=False
    xml_obj.send
    xDoc.LoadXML xml_obj.responseText
    xDoc.SetProperty "SelectionLanguage", "XPath"
    xDoc.SetProperty "SelectionNamespaces", "xmlns:link='http://www.xbrl.org/2003/linkbase'"
    ' My attempt at selecting an arbitrary element (no success so far)
    Set nodes = xDoc.SelectNodes("//RevenueFromContractWithCustomerExcludingAssessedTax")
End Sub
The XML document is too large to include in the question, so I'll leave a link below:
https://www.sec.gov/Archives/edgar/data/320193/000032019321000010/aapl-20201226_htm.xml
Any help would be greatly appreciated!
Thanks
Searching for RevenueFromContractWithCustomerExcludingAssessedTax, I get 28 results in that page. Are you after all 28 results?
If so, try xDoc.getElementsByTagName("us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax").
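To make the suggestion above concrete, here is a minimal sketch of two ways to reach the element. It assumes a reference to "Microsoft XML, v6.0" and that the root element of the filing declares the us-gaap prefix. The procedure name SecDataSketch and the variable names are made up for illustration. The namespace URI is read from the document rather than hard-coded, because it differs between taxonomy years.

```vba
Sub SecDataSketch()
    ' Hypothetical example; requires a reference to "Microsoft XML, v6.0".
    Dim http As New MSXML2.XMLHTTP60
    Dim xDoc As New MSXML2.DOMDocument60
    Dim nodes As MSXML2.IXMLDOMNodeList
    Dim node As MSXML2.IXMLDOMNode
    Dim nsUri As Variant

    ' Synchronous GET so the response is complete before parsing
    http.Open "GET", "https://www.sec.gov/Archives/edgar/data/320193/000032019321000010/aapl-20201226_htm.xml", False
    http.send

    xDoc.async = False
    If Not xDoc.LoadXML(http.responseText) Then
        Debug.Print "Parse error: " & xDoc.parseError.reason
        Exit Sub
    End If

    ' Option 1: match on the qualified (prefixed) tag name, no XPath needed
    Set nodes = xDoc.getElementsByTagName("us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax")
    Debug.Print "getElementsByTagName matches: " & nodes.Length

    ' Option 2: XPath with the us-gaap prefix registered.
    ' Assumption: the root element carries an xmlns:us-gaap declaration.
    nsUri = xDoc.DocumentElement.getAttribute("xmlns:us-gaap")
    If IsNull(nsUri) Then
        Debug.Print "No xmlns:us-gaap declaration found on the root element"
        Exit Sub
    End If

    xDoc.SetProperty "SelectionLanguage", "XPath"
    xDoc.SetProperty "SelectionNamespaces", "xmlns:us-gaap='" & nsUri & "'"
    Set nodes = xDoc.SelectNodes("//us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax")

    For Each node In nodes
        Debug.Print node.nodeName, node.Text
    Next node
End Sub
```

In XPath 1.0 an unprefixed name in the query matches only elements in no namespace, so the prefix in the query has to be bound through SelectionNamespaces to the same URI the document uses. If the prefix turns out not to be declared on the root, a namespace-agnostic query such as //*[local-name()='RevenueFromContractWithCustomerExcludingAssessedTax'] may be a workable fallback.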