I have a bunch of reports from VirusTotal and thought to myself: "in order to create the statistics I need, why not put the data into a MongoDB and simply query it. Can't be too hard, now, can it?"
Well, it can. Here's the basic data format.
I'm mostly interested in the scans array. Unfortunately, the scanner name is a key of an object, and since I'm not even a MongoDB novice, I have no clue how to approach this. Hell, I don't even know what to search for on Google.
What I'd like to do:
- Get a count of how many scanners have `detected: true` (and `false`), grouped by the name of the scanner. For example something like this (for the `true` search):

      Bkav: 20000
      TotalDefense: 19238
      BitDefender: 39132
      ...

- Another interesting bit would involve the `result` field. It contains the name of the malware and I'd like to create a statistic how many scanners use the same malware-family name for a specific file and for the whole collection.
I'd really appreciate some examples or pointers. I'm on the verge of writing a little Python script that scans all the JSON files and does what I need instead of using MongoDB.
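
For reference, the fallback script would look roughly like the sketch below. It assumes each report is one JSON file with a top-level `scans` object that maps the scanner name to an object containing `detected` and `result`. The function names are made up for illustration, and it only compares `result` strings literally, so it does not yet normalise vendor-specific names into malware families.

```python
import json
from collections import Counter
from pathlib import Path


def load_reports(directory):
    """Yield one parsed report per *.json file in `directory`."""
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            yield json.load(handle)


def count_detections(reports):
    """Return (detected, not_detected) Counters keyed by scanner name."""
    detected, not_detected = Counter(), Counter()
    for report in reports:
        # Assumption: "scans" is an object keyed by scanner name.
        for scanner, scan in report.get("scans", {}).items():
            if scan.get("detected"):
                detected[scanner] += 1
            else:
                not_detected[scanner] += 1
    return detected, not_detected


def result_name_counts(report):
    """Count how many scanners reported each `result` string for one file."""
    return Counter(
        scan["result"]
        for scan in report.get("scans", {}).values()
        if scan.get("detected") and scan.get("result")
    )
```

Calling `count_detections(load_reports("reports"))` (with `reports` standing in for wherever the JSON files live) and printing `detected.most_common()` would give the per-scanner list above. Summing the per-file `result_name_counts` Counters would give the collection-wide numbers.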
