There are some similar topics on Stack Overflow but I can't find any which actually explain how to do it.

I have an API which returns data as XML. I want to convert this to JSON as I'm storing it in MongoDB.

<cj-api>
    <products total-matched="231746" records-returned="999" page-number="1">
        <product>
            <ad-id>10648829</ad-id>
            <advertiser-id>2618386</advertiser-id>
            <advertiser-name>Acne Jeans UK</advertiser-name>
            <advertiser-category>New Arrivals</advertiser-category>
            <buy-url>http://www.tkqlhce.com/click-7227532-10648829?url=http%3A%2F%2Fshop.acnestudios.com%2Fpop-cord-white.html&amp;cjsku=30X133-151</buy-url>
            <catalog-id>cjo:939</catalog-id>
            <currency>EUR</currency>
            <description>Pop Cord White.</description>
            <image-url>http://c893323.r23.cf3.rackcdn.com/catalog/product/cache/25/thumbnail/300x300/9df78eab33525d08d6e5fb8d27136e95/3/0/30X133-151_A_57419.jpg</image-url>
            <in-stock>true</in-stock>
            <isbn/>
            <manufacturer-name>Acne Studios</manufacturer-name>
            <manufacturer-sku/>
            <name>Pop Cord White</name>
            <price>200.0</price>
            <retail-price/>
            <sale-price/>
            <sku>30X133-151</sku>
            <upc/>
        </product>
        <product>
            <ad-id>10648829</ad-id>
            <advertiser-id>2618386</advertiser-id>
            <advertiser-name>Acne Jeans UK</advertiser-name>
            <advertiser-category>Jeans</advertiser-category>
            <buy-url>http://www.dpbolvw.net/click-7227532-10648829?url=http%3A%2F%2Fshop.acnestudios.com%2Fflex-black.html&amp;cjsku=30H126-129</buy-url>
            <catalog-id>cjo:939</catalog-id>
            <currency>EUR</currency>
            <description>Acne Flex Black jeans are narrow, tight fitting jeans with a comfortable mid rise waist.<ul>
                <li>Worn in black blue wash</li>
                <li>Classic five pocket styling</li>
                <li>Zippered fly front closure</li>
                <li>Secures at the waist with bachelor button</li>
                <li>Acne embossed rivets</li>
                </ul>.</description>
            <image-url>http://c893323.r23.cf3.rackcdn.com/catalog/product/cache/25/thumbnail/300x300/9df78eab33525d08d6e5fb8d27136e95/3/0/30H126-129_A_18.jpg</image-url>
            <in-stock>true</in-stock>
            <isbn/>
            <manufacturer-name>Acne Studios</manufacturer-name>
            <manufacturer-sku/>
            <name>Flex Black</name>
            <price>170.0</price>
            <retail-price/>
            <sale-price/>
            <sku>30H126-129</sku>
            <upc/>
        </product>
    </products>
</cj-api>

(let's pretend there are only 2 products - in actuality there are 231,746!)

In Node, I'm using request to gather this XML and store it in a variable, called body. I'm using libxmljs, like so:

    var xmlDoc = libxmljs.parseXmlString(body);
    var product = xmlDoc.get('//product'); // very unsure whether I'm using this correctly

The issue is that I have absolutely no idea how to work with this data, and the libxmljs docs are really unhelpful in this regard. I'd like to do something equivalent to jQuery's:

$('product').each(function(){
    var obj = {
        'advertiser-name': $(this).find('advertiser-name').text(),
        'buy-url': $(this).find('buy-url').text()
        // ... etc etc etc
    };
});

How can I do this using libxmljs / another library so that I can work with the data more easily?

1 Answer

About using libxmljs:

  • Use xmlDoc.find('//product') instead of get: get returns only the first element matching an XPath expression (//product is an XPath expression), while find returns all of them. The other methods of the document object are listed on the wiki.

  • find returns an array of elements, so the equivalent of your jQuery example is:

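    var libxmljs = require('libxmljs');

    // body holds the XML string fetched with request
    var xmlDoc = libxmljs.parseXmlString(body);

    // find returns an array of every element matching the XPath
    var products = xmlDoc.find('//product');
    for (var index = 0; index < products.length; index++) {
      var obj = {
        // get returns the first matching child; text() reads its content
        "advertiser-name": products[index].get('advertiser-name').text()
      };

      // ...
    }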

    In this example I call get on each product element (because I know only one child has this name) to fetch the child element, then read that element's text value with text().
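If you would rather not list every field by hand, one option is to loop over the child nodes of each product. This is only a sketch: it assumes libxmljs elements expose childNodes(), type(), name() and text() as described on the wiki, and elementToObject and productsFromXml are names made up for this example.

```javascript
// Copies every child element of a libxmljs element into a plain object,
// keyed by tag name, so the fields do not have to be listed by hand.
function elementToObject(element) {
  var obj = {};
  element.childNodes().forEach(function (child) {
    // Skip whitespace text nodes and comments between the child elements.
    if (child.type() !== 'element') return;
    obj[child.name()] = child.text().trim();
  });
  return obj;
}

// Hypothetical wrapper: parses the XML string and converts every <product>.
function productsFromXml(xml) {
  // Third-party module, required lazily so elementToObject works without it.
  var libxmljs = require('libxmljs');
  return libxmljs.parseXmlString(xml).find('//product').map(elementToObject);
}
```

Empty elements such as <isbn/> come out as empty strings, and for <description> the text of the nested <li> items is concatenated into one string, which may or may not be what you want to store.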

Second, I'd recommend the xml2js module (github), which converts XML to a JavaScript object. Call its parseString function with two parameters, the XML string and a callback function (err, result) { ... }, where result is the object representation of the XML.
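A rough sketch of how that could look for the XML in the question. It assumes xml2js's default options, where each child element is wrapped in an array, attributes go under "$", and the text of mixed content goes under "_". flattenProduct and xmlToProducts are hypothetical helper names, not part of xml2js.

```javascript
// Flattens one <product> object as produced by xml2js with default options
// into a plain { tagName: text } document that can be stored in MongoDB.
function flattenProduct(product) {
  var doc = {};
  Object.keys(product).forEach(function (key) {
    if (key === '$') return; // attributes object, not a child element
    var value = product[key][0];
    // Mixed content such as <description>text<ul>...</ul></description>
    // arrives as an object; keep only its text part in this sketch.
    if (value !== null && typeof value === 'object') value = value._ || '';
    doc[key] = value;
  });
  return doc;
}

// Hypothetical wrapper around parseString for the <cj-api> response.
function xmlToProducts(xml, callback) {
  // Third-party module, required lazily so flattenProduct works without it.
  var parseString = require('xml2js').parseString;
  parseString(xml, function (err, result) {
    if (err) return callback(err);
    var products = result['cj-api'].products[0].product || [];
    callback(null, products.map(flattenProduct));
  });
}
```

Note that this drops the <ul> list inside description; if you need it, you would have to handle the nested object yourself instead of keeping only the "_" text.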

Since you mentioned that each XML response can contain more than 200,000 products, you need to be aware of the performance of these libraries. As far as I can tell, both of them load the whole XML document into memory. If you only need to run this script once, or once a day (or hour), it is probably fine to use them as is. If you need to improve performance, take a look at the SaxPushParser interface of the libxmljs library.
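To give an idea of the streaming approach, here is a minimal sketch that builds one object per product from SAX events and hands it off as soon as the closing tag arrives, so the full document is never held in memory. It assumes SaxPushParser emits startElementNS, endElementNS and characters events with the element name (or text) as the first argument, and that push() accepts chunks of XML. createProductCollector and parseWithSax are names invented for this example.

```javascript
// Builds SAX-style handlers that assemble one plain object per <product>
// and pass it to onProduct as soon as </product> is seen.
function createProductCollector(onProduct) {
  var product = null; // object being built, or null outside <product>
  var field = null;   // name of the direct child currently open
  var depth = 0;      // nesting depth below <product>

  return {
    startElement: function (name) {
      if (product === null) {
        if (name === 'product') { product = {}; depth = 0; }
        return;
      }
      depth += 1;
      if (depth === 1) { field = name; product[field] = ''; }
    },
    characters: function (chars) {
      // Text between fields (indentation) is ignored because field is null.
      if (product !== null && field !== null) product[field] += chars;
    },
    endElement: function () {
      if (product === null) return;
      if (depth === 0) { // this is the closing </product>
        onProduct(product);
        product = null;
        return;
      }
      if (depth === 1) { product[field] = product[field].trim(); field = null; }
      depth -= 1;
    }
  };
}

// Hypothetical wiring into libxmljs; chunks is an array of XML string pieces.
function parseWithSax(chunks, onProduct) {
  // Third-party module, required lazily so the collector works without it.
  var libxmljs = require('libxmljs');
  var handlers = createProductCollector(onProduct);
  var parser = new libxmljs.SaxPushParser();
  parser.on('startElementNS', function (name) { handlers.startElement(name); });
  parser.on('characters', function (chars) { handlers.characters(chars); });
  parser.on('endElementNS', function (name) { handlers.endElement(name); });
  chunks.forEach(function (chunk) { parser.push(chunk); });
}
```

In onProduct you could insert each object into MongoDB directly (or in small batches) instead of collecting all 200,000+ products in an array first.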
