Suppose I have an XML document describing files and directories, where the XML nodes of files that belong to a directory are nested inside that directory's XML node:

<list>
<signature>
hash...
</signature>
  <file id="44">
   <data>
    <length>1759</length>
    <offset>36491175</offset>
    <size>1018</size>
    <encoding style="application/x-gzip"/>
    <extracted-checksum style="sha1">91e4e9a81d6b8abb07fd6009afe72178997c512b</extracted-checksum>
    <archived-checksum style="sha1">f141cb115e21de3320c1afccd2dc115b78f83661</archived-checksum>
   </data>
   <type>file</type>
   <name>dist.dat</name>
  </file>
  <file id="6">
   <ctime>1970-01-01T00:00:00Z</ctime>
   <mtime>1970-01-01T00:00:00Z</mtime>
   <atime>1970-01-01T00:00:00Z</atime>
   <group>wheel</group>
   <gid>0</gid>
   <user>root</user>
   <uid>0</uid>
   <mode>0700</mode>
   <deviceno>0</deviceno>
   <inode>0</inode>
   <type>directory</type>
   <name>src</name>
   <file id="7">
    <ctime>1970-01-01T00:00:00Z</ctime>
    <mtime>1970-01-01T00:00:00Z</mtime>
    <atime>1970-01-01T00:00:00Z</atime>
    <group>wheel</group>
    <gid>0</gid>
    <user>root</user>
    <uid>0</uid>
    <mode>0700</mode>
    <deviceno>0</deviceno>
    <inode>0</inode>
    <type>directory</type>
    <name>Dede</name>          // ! Notice no closing </file> tag yet because next file is inside this directory
    <file id="8">
     <data>
      <length>514</length>
      <offset>36357051</offset>
      <size>859</size>
      <encoding style="application/x-gzip"/>
      <extracted-checksum style="sha1">b13b86984f1ceeb698e879ad4a0c4174804529c3</extracted-checksum>
      <archived-checksum style="sha1">414b16e24dbbbbf864d68bcde726d52d69a4dc04</archived-checksum>
     </data>
     <type>file</type>
     <name>Hello.txt</name>
    </file>
    <file id="9">
     <data>
      <length>776</length>
      <offset>36357665</offset>
      <size>1630</size>
      <encoding style="application/x-gzip"/>
      <extracted-checksum style="sha1">318eec584b12f2333133d8d07bf6b2d883fa7070</extracted-checksum>
      <archived-checksum style="sha1">c1d196e28e7a8ec0444f7d16f205bd67667f5eec</archived-checksum>
     </data>
     <type>file</type>
     <name>Local_st.txt</name>
    </file>
   </file>                         // ! Closing </file> tag for the 'Dede' directory
   ... More File Nodes
   </list>

What would be the best way to collect the offsets (or any desired properties for that matter) from all file nodes that have <type>file</type> and disregard "file" nodes that have <type>directory</type>?

Is there a native way or a library to do it? Or, if I manage to convert this XML into JSON first, how would I do it then? This has to work completely browser-side.

Thanks

1 Answer

You can use XPath, e.g. //file[type = 'file']/data/offset, either with browser APIs such as the DOM Level 3 XPath API's evaluate method on document nodes:

const xml = `<list>
<signature>
hash...
</signature>
  <file id="44">
   <data>
    <length>1759</length>
    <offset>36491175</offset>
    <size>1018</size>
    <encoding style="application/x-gzip"/>
    <extracted-checksum style="sha1">91e4e9a81d6b8abb07fd6009afe72178997c512b</extracted-checksum>
    <archived-checksum style="sha1">f141cb115e21de3320c1afccd2dc115b78f83661</archived-checksum>
   </data>
   <type>file</type>
   <name>dist.dat</name>
  </file>
  <file id="6">
   <ctime>1970-01-01T00:00:00Z</ctime>
   <mtime>1970-01-01T00:00:00Z</mtime>
   <atime>1970-01-01T00:00:00Z</atime>
   <group>wheel</group>
   <gid>0</gid>
   <user>root</user>
   <uid>0</uid>
   <mode>0700</mode>
   <deviceno>0</deviceno>
   <inode>0</inode>
   <type>directory</type>
   <name>src</name>
   <file id="7">
    <ctime>1970-01-01T00:00:00Z</ctime>
    <mtime>1970-01-01T00:00:00Z</mtime>
    <atime>1970-01-01T00:00:00Z</atime>
    <group>wheel</group>
    <gid>0</gid>
    <user>root</user>
    <uid>0</uid>
    <mode>0700</mode>
    <deviceno>0</deviceno>
    <inode>0</inode>
    <type>directory</type>
    <name>Dede</name>          
    <file id="8">
     <data>
      <length>514</length>
      <offset>36357051</offset>
      <size>859</size>
      <encoding style="application/x-gzip"/>
      <extracted-checksum style="sha1">b13b86984f1ceeb698e879ad4a0c4174804529c3</extracted-checksum>
      <archived-checksum style="sha1">414b16e24dbbbbf864d68bcde726d52d69a4dc04</archived-checksum>
     </data>
     <type>file</type>
     <name>Hello.txt</name>
    </file>
    <file id="9">
     <data>
      <length>776</length>
      <offset>36357665</offset>
      <size>1630</size>
      <encoding style="application/x-gzip"/>
      <extracted-checksum style="sha1">318eec584b12f2333133d8d07bf6b2d883fa7070</extracted-checksum>
      <archived-checksum style="sha1">c1d196e28e7a8ec0444f7d16f205bd67667f5eec</archived-checksum>
     </data>
     <type>file</type>
     <name>Local_st.txt</name>
    </file>
   </file>                         
   ... More File Nodes
  </file>
</list>`;
   
const domParser = new DOMParser();

const xmlDoc = domParser.parseFromString(xml, 'application/xml');

const result = xmlDoc.evaluate('//file[type = "file"]/data/offset', xmlDoc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);

console.log(result.snapshotLength);

const offsets = [];
for (let i = 0; i < result.snapshotLength; i++) {
  offsets.push(result.snapshotItem(i).textContent);
}

console.log(offsets);

// Alternative: have XPath convert each offset element to a number

const offsetNumbers = [];
for (let i = 0; i < result.snapshotLength; i++) {
  offsetNumbers.push(xmlDoc.evaluate('number()', result.snapshotItem(i), null, XPathResult.NUMBER_TYPE, null).numberValue);
}

console.log(offsetNumbers);

Alternatively, using Saxon-JS 2 as a library, you can evaluate the XPath 3.1 expression //file[type = 'file']/data/offset/number() to get the numeric values directly:

const xml = `<list>
<signature>
hash...
</signature>
  <file id="44">
   <data>
    <length>1759</length>
    <offset>36491175</offset>
    <size>1018</size>
    <encoding style="application/x-gzip"/>
    <extracted-checksum style="sha1">91e4e9a81d6b8abb07fd6009afe72178997c512b</extracted-checksum>
    <archived-checksum style="sha1">f141cb115e21de3320c1afccd2dc115b78f83661</archived-checksum>
   </data>
   <type>file</type>
   <name>dist.dat</name>
  </file>
  <file id="6">
   <ctime>1970-01-01T00:00:00Z</ctime>
   <mtime>1970-01-01T00:00:00Z</mtime>
   <atime>1970-01-01T00:00:00Z</atime>
   <group>wheel</group>
   <gid>0</gid>
   <user>root</user>
   <uid>0</uid>
   <mode>0700</mode>
   <deviceno>0</deviceno>
   <inode>0</inode>
   <type>directory</type>
   <name>src</name>
   <file id="7">
    <ctime>1970-01-01T00:00:00Z</ctime>
    <mtime>1970-01-01T00:00:00Z</mtime>
    <atime>1970-01-01T00:00:00Z</atime>
    <group>wheel</group>
    <gid>0</gid>
    <user>root</user>
    <uid>0</uid>
    <mode>0700</mode>
    <deviceno>0</deviceno>
    <inode>0</inode>
    <type>directory</type>
    <name>Dede</name>          
    <file id="8">
     <data>
      <length>514</length>
      <offset>36357051</offset>
      <size>859</size>
      <encoding style="application/x-gzip"/>
      <extracted-checksum style="sha1">b13b86984f1ceeb698e879ad4a0c4174804529c3</extracted-checksum>
      <archived-checksum style="sha1">414b16e24dbbbbf864d68bcde726d52d69a4dc04</archived-checksum>
     </data>
     <type>file</type>
     <name>Hello.txt</name>
    </file>
    <file id="9">
     <data>
      <length>776</length>
      <offset>36357665</offset>
      <size>1630</size>
      <encoding style="application/x-gzip"/>
      <extracted-checksum style="sha1">318eec584b12f2333133d8d07bf6b2d883fa7070</extracted-checksum>
      <archived-checksum style="sha1">c1d196e28e7a8ec0444f7d16f205bd67667f5eec</archived-checksum>
     </data>
     <type>file</type>
     <name>Local_st.txt</name>
    </file>
   </file>                         
   ... More File Nodes
  </file>
</list>`;
   
const offsets = SaxonJS.XPath.evaluate(`parse-xml($xml)//file[type = 'file']/data/offset/number()`, [], { params : { xml : xml }});

console.log(offsets);
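The page's HTML must load the Saxon-JS 2 runtime so that SaxonJS is defined before the code above runs:

<script src="https://www.saxonica.com/saxon-js/documentation/SaxonJS/SaxonJS2.rt.js"></script>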
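
On the JSON part of the question: if the XML were converted to JSON first, a plain recursive walk would be enough. The sketch below is only illustrative. The tree shape (a `children` array per directory, a `data.offset` string per file) and the name `collectFromFiles` are assumptions, because the real shape depends on whichever XML-to-JSON converter is used.

```javascript
// Hypothetical JSON shape: each node mirrors one <file> element, and nested
// entries live in a `children` array. This layout is an assumption, not the
// output of any particular converter.
const tree = {
  children: [
    { id: '44', type: 'file', name: 'dist.dat', data: { offset: '36491175' } },
    {
      id: '6', type: 'directory', name: 'src',
      children: [
        {
          id: '7', type: 'directory', name: 'Dede',
          children: [
            { id: '8', type: 'file', name: 'Hello.txt', data: { offset: '36357051' } },
            { id: '9', type: 'file', name: 'Local_st.txt', data: { offset: '36357665' } },
          ],
        },
      ],
    },
  ],
};

// Depth-first walk: returns pick(node) for every node whose type is 'file',
// skipping directories themselves but still descending into them.
function collectFromFiles(node, pick) {
  const found = node.type === 'file' ? [pick(node)] : [];
  for (const child of node.children || []) {
    found.push(...collectFromFiles(child, pick));
  }
  return found;
}

const jsonOffsets = collectFromFiles(tree, (file) => Number(file.data.offset));
console.log(jsonOffsets); // [ 36491175, 36357051, 36357665 ]
```

Passing a different `pick` callback, for example `(file) => file.name`, collects any other property of the file nodes in the same way.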


9 Comments

How do I access individual offset values? I read the docs for XPathResult, which you are using, and it's supposed to have a stringValue property, but I get an error when I call console.log(result.stringValue);
In XPath 1 you can select all the offset elements, but then you need to use the DOM APIs to read e.g. the textContent of each element node or, if you want to stick with XPath, evaluate number() with each offset element as the context node. XPath 1 has no way to return a list of numbers or strings; XPath 2 and 3 are much more powerful and expressive and give you all values as a sequence or array of numbers or strings.
"XPath 2 or 3 is much more powerful and expressive and gives you all values as a sequence or array of numbers or strings." How would I do that with XPath 2 or 3? Is there any reason I would have to stick with Xpath 1?
I.e., how do I get an array of offsets? Thank you!
The original answer already had the XPath 3 code using Saxon-JS 2 to get an array of numeric offset values; my edit to the XPath 1 version now adds the JavaScript code that converts the XPathResult into an array of primitive values.
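
As background to the stringValue error asked about in the comments: stringValue is only readable when the result was requested as XPathResult.STRING_TYPE, whereas the answer requests an ordered node snapshot. The conversion described in these comments could be wrapped in a small reusable helper, sketched below. The name `snapshotToArray` and its `convert` parameter are hypothetical and not part of any browser API.

```javascript
// Convert an XPathResult of a snapshot type into a plain array.
// `convert` maps each node to the value you want; by default it reads the
// node's textContent. Only snapshotLength and snapshotItem(i) are used.
function snapshotToArray(result, convert = (node) => node.textContent) {
  return Array.from({ length: result.snapshotLength }, (_, i) =>
    convert(result.snapshotItem(i))
  );
}

// Browser usage, assuming `xmlDoc` was produced by DOMParser as in the answer:
//
//   const result = xmlDoc.evaluate('//file[type = "file"]/data/offset', xmlDoc,
//     null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
//   const offsets = snapshotToArray(result);
//   const offsetNumbers = snapshotToArray(result, (n) => Number(n.textContent));
```

Because the helper takes the conversion as a callback, the same function works for any other property selected by the XPath expression, such as name or size elements.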