
I have a bunch of reports from VirusTotal and thought to myself: "in order to create the statistics I need, why not put the data into a MongoDB and simply query it. Can't be too hard, now, can it?"

Well, it can. Here's the basic data format.

[image: data format]

I'm mostly interested in the scans field. Unfortunately, the scanner name is a key of an object rather than a value, and since I'm not even a MongoDB novice, I have no clue how to approach this. Hell, I don't even know what to search for on Google.

What I'd like to do:

  • Get a count of how many scanners have detected:true (and false), grouped by the name of the scanner. For example something like this (for the true search):

    Bkav: 20000
    TotalDefense: 19238
    BitDefender: 39132
    ...
    
  • Another interesting bit would involve the result field. It contains the name of the malware, and I'd like to create a statistic of how many scanners use the same malware-family name, both for a specific file and for the whole collection.

I'd really appreciate some examples or pointers. I'm on the verge of giving up on MongoDB and writing a little Python script that scans all the JSON files and does what I need.

  • Try to post a sample of your collection and the expected output. Images don't work here. Commented Aug 5, 2018 at 14:58

1 Answer


To turn the object into an array, you can use $objectToArray (MongoDB 3.4.4 and newer):

db.getCollection('collection').aggregate([
    {$project: {scans: {$objectToArray: '$scans'}}},   // object -> array
    {$unwind: '$scans'},                               // array -> multiple docs
    {$match: {'scans.v.detected': true /*or false*/}}, // filter
    {$group: {_id: '$scans.k', count: {$sum: 1}}}      // group
])

It will result in something like this:

[{
    "_id" : "TotalDefense",
    "count" : 1.0
},
{
    "_id" : "Bkav",
    "count" : 3.0
}]
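
If the true and false counts are wanted in one pass, a possible variation is to drop the $match stage and count conditionally inside $group with $cond. This is only a sketch: the output field names detectedTrue and detectedFalse are made up for illustration, and a scanner entry with a missing detected value would be counted as false here.

```javascript
// Sketch: count detected:true and detected:false per scanner in one pass.
// 'detectedTrue' / 'detectedFalse' are illustrative names, not part of the data.
const bothCountsPipeline = [
    {$project: {scans: {$objectToArray: '$scans'}}},   // object -> array of {k, v}
    {$unwind: '$scans'},                               // one doc per scanner entry
    {$group: {
        _id: '$scans.k',                               // scanner name
        // $cond adds 1 when detected is truthy, otherwise 0
        detectedTrue: {$sum: {$cond: ['$scans.v.detected', 1, 0]}},
        // the inverse: a missing or null 'detected' also lands here
        detectedFalse: {$sum: {$cond: ['$scans.v.detected', 0, 1]}}
    }}
];

// In the mongo shell ('collection' is the placeholder name used above):
// db.getCollection('collection').aggregate(bothCountsPipeline)
```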

As for the second question: $group also accepts an object as its _id, so you can group by a compound key such as {scanner: '$scans.k', result: '$scans.v.result'}.
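
A minimal sketch of that compound-key grouping, assuming the same placeholder collection name as above. The $sort stage and the restriction to detected:true are my additions for readability, not something the question requires:

```javascript
// Sketch: count how often each scanner reports each result name.
const scannerResultPipeline = [
    {$project: {scans: {$objectToArray: '$scans'}}},   // object -> array of {k, v}
    {$unwind: '$scans'},                               // one doc per scanner entry
    {$match: {'scans.v.detected': true}},              // assumption: only detections matter
    {$group: {
        _id: {scanner: '$scans.k', result: '$scans.v.result'},  // compound key
        count: {$sum: 1}
    }},
    {$sort: {count: -1}}                               // most frequent pairs first
];

// In the mongo shell:
// db.getCollection('collection').aggregate(scannerResultPipeline)
```

Grouping by '$scans.v.result' alone, instead of the compound key, would give totals per malware name across all scanners.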


3 Comments

That did help a lot, thank you. Follow-up question: why is it $scans.k in the $group stage and $scans.v in $match? It's the k and the v that make me wonder.
Just look at the $objectToArray documentation. Basically, it produces an array of {k, v} objects, where k is a key and v is the corresponding value. We want to group by the scanner, which is the key of this object, but filter by a field inside the value.
Ah, gotcha. Now it makes sense. Thank you very much.
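
The {k, v} shape discussed in these comments can be illustrated in plain JavaScript, where Object.entries plays roughly the role of $objectToArray. The sample document below is hypothetical and its values are made up; only the field names detected and result come from the question.

```javascript
// Hypothetical sample document; the values are invented for illustration.
const doc = {
    scans: {
        Bkav: {detected: true, result: 'W32.Example'},
        TotalDefense: {detected: false, result: null}
    }
};

// Plain-JavaScript analogue of {$objectToArray: '$scans'}:
// each key becomes k, each value becomes v.
const scansArray = Object.entries(doc.scans).map(([k, v]) => ({k, v}));

// Analogue of filtering on 'scans.v.detected' and then reading 'scans.k'.
const detectedBy = scansArray
    .filter(entry => entry.v.detected === true)
    .map(entry => entry.k);

console.log(scansArray);
console.log(detectedBy);
```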
