I'm trying to parse financial information directly from the SEC and have a question about parsing XML documents using VBA in Excel. I understand that it's possible to work through the document by referencing child nodes and item numbers, but the document is huge and it would take forever to read through and identify each item I need.

I saw online that it's common to use XPath to query XML documents more efficiently. I've tried many approaches but have had no success so far. I believe my issue is understanding which namespace the elements are in and how to properly reference the elements under a specified namespace.

Below is a portion of my code, which tries to reference an arbitrary element:

Sub SecData()

Dim xml_obj As MSXML2.XMLHTTP60
Dim xDoc As New MSXML2.DOMDocument60
Dim xml_url As String
Dim nodes As Variant

Set xml_obj = New MSXML2.XMLHTTP60

xml_url = "https://www.sec.gov/Archives/edgar/data/320193/000032019321000010/aapl-20201226_htm.xml"
xml_obj.Open bstrMethod:="GET", bstrURL:=xml_url
xml_obj.send

xDoc.LoadXML (xml_obj.responseText)

xDoc.SetProperty "SelectionLanguage", "XPath"
xDoc.SetProperty "SelectionNamespaces", "xmlns:link='http://www.xbrl.org/2003/linkbase'"

nodes = xDoc.SelectNodes("//RevenueFromContractWithCustomerExcludingAssessedTax")

The XML document is too large to include in the question, so I'll leave a link below:

https://www.sec.gov/Archives/edgar/data/320193/000032019321000010/aapl-20201226_htm.xml

Any help would be greatly appreciated!

Thanks

  • If I search for RevenueFromContractWithCustomerExcludingAssessedTax, I get 28 results in that page. Are you after 28 results? Commented Feb 20, 2021 at 19:23
  • To be honest, I haven't verified exactly which values I'm looking for yet. I'm looking to figure out the process to reference any specific element, and then apply that process once I figure out what I need. So to horribly answer your question, I might need the 28 results... Commented Feb 20, 2021 at 19:25
  • This works: xDoc.getElementsByTagName("us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax"). Commented Feb 20, 2021 at 19:28

1 Answer

The question "XML: do child nodes inherit parent's namespace prefix?" covers namespace inheritance.

If a namespace is declared without a prefix (a default namespace), such as this one:

 xmlns="http://www.xbrl.org/2003/instance"

then it applies to the declaring element and to every unprefixed element below it.

If it's declared with a prefix (here "xbrldi")

 xmlns:xbrldi="http://xbrl.org/2006/xbrldi"

then it only applies to elements that explicitly use that prefix, such as:

<xbrldi:explicitMember dimension="us-gaap:StatementClassOfStockAxis">
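To see these two rules in action without opening the large SEC file, here is a minimal sketch. The tiny XML string and its `urn:example:...` namespaces are made up for illustration; it assumes a reference to "Microsoft XML, v6.0" is set in the VBA project.

```vba
Sub NamespaceInheritanceDemo()
    Dim doc As New MSXML2.DOMDocument60
    Dim el As MSXML2.IXMLDOMNode

    doc.async = False

    'made-up document: one default namespace and one prefixed namespace on the root
    If Not doc.LoadXML("<root xmlns='urn:example:default' xmlns:p='urn:example:prefixed'>" & _
                       "<child><p:item/><plain/></child></root>") Then
        Debug.Print doc.parseError.reason
        Exit Sub
    End If

    'list every element with its qualified name and the namespace it resolved to
    For Each el In doc.SelectNodes("//*")
        Debug.Print el.nodeName, el.namespaceURI
    Next el
End Sub
```

The unprefixed elements (`root`, `child`, `plain`) should report `urn:example:default`, while `p:item` should report `urn:example:prefixed`, even though both namespaces are declared on the same root element.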

The element in your query has "us-gaap" as a namespace prefix, so you need to declare that prefix in the document's SelectionNamespaces property and include it in your XPath:

us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax

For example:

Dim xDoc As New MSXML2.DOMDocument60
Dim nodes As Variant

xDoc.async = False             'load synchronously so the document is complete before querying
xDoc.resolveExternals = False  'don't try to resolve external resources
xDoc.validateOnParse = True

xDoc.Load "C:\Temp\aapl-20201226_htm.xml"  'using local copy for testing...

Debug.Print xDoc.parseError.reason 'in case of problems

xDoc.SetProperty "SelectionLanguage", "XPath"

'add namespaces: the first one is the default namespace with a "dummy" prefix of "xxx"
xDoc.SetProperty "SelectionNamespaces", _
                "xmlns:xxx='http://www.xbrl.org/2003/instance' " & _
                "xmlns:link='http://www.xbrl.org/2003/linkbase' " & _
                "xmlns:us-gaap='http://fasb.org/us-gaap/2020-01-31'"
                '+ other namespaces as needed...

'element with no prefix: using the "dummy" `xxx` prefix we added for the default namespace
Set nodes = xDoc.SelectNodes("//xxx:context")
Debug.Print nodes.Length  ' 207

'these elements belong to a specific namespace so use that prefix...
Set nodes = xDoc.SelectNodes("//us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax")
Debug.Print nodes.Length  ' 28
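
If typing every namespace by hand gets tedious, one option is to build the SelectionNamespaces string from the declarations on the root element. The sketch below is only an illustration: `BuildSelectionNamespaces` and `PrintRevenueFacts` are hypothetical names, it assumes every namespace you need is declared on the root element, and it assumes each fact element carries a `contextRef` attribute (as XBRL facts normally do).

```vba
'Hypothetical helper: turns the xmlns declarations on the root element
'into a string suitable for the SelectionNamespaces property.
Function BuildSelectionNamespaces(doc As MSXML2.DOMDocument60, _
                                  Optional defaultPrefix As String = "xxx") As String
    Dim attr As MSXML2.IXMLDOMNode
    Dim result As String

    For Each attr In doc.documentElement.Attributes
        If attr.nodeName = "xmlns" Then
            'default namespace: XPath needs a prefix, so give it the dummy one
            result = result & "xmlns:" & defaultPrefix & "='" & attr.nodeValue & "' "
        ElseIf Left$(attr.nodeName, 6) = "xmlns:" Then
            'prefixed namespace: reuse the declaration as written
            result = result & attr.nodeName & "='" & attr.nodeValue & "' "
        End If
    Next attr

    BuildSelectionNamespaces = Trim$(result)
End Function

Sub PrintRevenueFacts()
    Dim xDoc As New MSXML2.DOMDocument60
    Dim node As MSXML2.IXMLDOMNode
    Dim ctx As MSXML2.IXMLDOMNode

    xDoc.async = False
    xDoc.resolveExternals = False

    If Not xDoc.Load("C:\Temp\aapl-20201226_htm.xml") Then  'local copy, as above
        Debug.Print xDoc.parseError.reason
        Exit Sub
    End If

    xDoc.SetProperty "SelectionLanguage", "XPath"
    xDoc.SetProperty "SelectionNamespaces", BuildSelectionNamespaces(xDoc)

    For Each node In xDoc.SelectNodes("//us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax")
        'contextRef (assumed present) identifies the period/segment the value belongs to
        Set ctx = node.Attributes.getNamedItem("contextRef")
        If ctx Is Nothing Then
            Debug.Print "(no contextRef)", node.Text
        Else
            Debug.Print ctx.Text, node.Text
        End If
    Next node
End Sub
```

The helper only looks at the root element, so a namespace declared deeper in the document would still need to be added manually.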

2 Comments

Thank you for your quick and thorough answer. The link was also helpful. As you can tell, XML isn't my strong point. I hate asking to be spoonfed an answer but how would I add the "us-gaap" namespace alias to the collection?
