I am trying to read a nested MongoDB result into a pandas DataFrame.

The data looks like this.

{
"_id" : ObjectId("5911b9cebb56c016794d45a4"),
"crawlat" : "2017-05-09 14:45",
"traffic" : [ 
    {
        "timestamp" : "1494338401",
        "organic" : 53
    }, 
    {
        "timestamp" : "1494342001",
        "organic" : 64
    }, 
    {
        "timestamp" : "1494345601",
        "organic" : 74
    }, 
    {
        "timestamp" : "1494349201",
        "organic" : 78
    }, 
    {
        "timestamp" : "1494352801",
        "organic" : 80
    }, 
    {
        "timestamp" : "1494356401",
        "organic" : 88
    }, 
    {
        "timestamp" : "1494360001",
        "organic" : 91
    }, 
    {
        "timestamp" : "1494363601",
        "organic" : 92
    }, 
    {
        "timestamp" : "1494367201",
        "organic" : 94
    }
]

}

The traffic array contains 48 entries for every result.

I'm only interested in the values of "organic", in the order in which they appear in the array.

I start with

import pymongo

con = pymongo.MongoClient(['...:27017'])
collsitemap = con.sitemap.newssitemap
# Only fetch documents whose traffic array has exactly 48 entries
sitemapsdata = collsitemap.find({'traffic':{'$size':48}})

I did some cleanup, used json_normalize, and then ran:

dfsitemap = dfsitemap['traffic'].apply(pd.Series)

Now the result looks like this:

[image: screenshot of the resulting data frame]

But I need a table with just the organic values. How can I clean this up?

  • What are the two dimensions in your data frame? Commented May 19, 2017 at 9:42

1 Answer

You could create your data frame with the from_records constructor, which allows you to specify columns to include or exclude:

pd.DataFrame.from_records(sitemapsdata['traffic'], exclude=['timestamp'])

which gives:

[image: resulting data frame]
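
As a self-contained illustration of the call above, here is a minimal sketch that applies from_records to a single traffic list. The traffic variable is an assumption: it holds the first three entries of the sample document from the question, whereas in practice it would be the "traffic" value of one document returned by the query.

```python
import pandas as pd

# First three entries of the "traffic" array from the sample document.
traffic = [
    {"timestamp": "1494338401", "organic": 53},
    {"timestamp": "1494342001", "organic": 64},
    {"timestamp": "1494345601", "organic": 74},
]

# from_records accepts a list of dicts; exclude drops the named columns.
df = pd.DataFrame.from_records(traffic, exclude=["timestamp"])
print(df)
```

The resulting frame has a single organic column, with rows in the same order as the array.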


3 Comments

OK, thanks. I get an error: TypeError: Argument 'rows' has incorrect type (expected list, got Series). Traffic is an array, as you can see above.
Sorry, that's a variable typo. The argument to from_records is a list of dicts (in your case, the value of traffic). I've updated the answer.
Now I get: TypeError: index 'traffic' cannot be applied to Cursor instances. The find will return multiple rows.
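
The error in the last comment arises because find() returns a cursor over many documents, so it cannot be indexed with 'traffic' directly. One possible way around this, sketched under the assumption that every document has the shape shown in the question, is to iterate over the cursor and build one row of organic values per document. The helper name organic_table is hypothetical, and the in-memory docs list only stands in for the cursor; its second document is made up for illustration.

```python
import pandas as pd

def organic_table(docs):
    """Return one row per document, one column per position in 'traffic'."""
    rows, index = [], []
    # A single pass, so this works for a pymongo cursor or any iterable of dicts.
    for doc in docs:
        index.append(doc["crawlat"])
        rows.append([entry["organic"] for entry in doc["traffic"]])
    return pd.DataFrame(rows, index=index)

# Stand-in for the cursor: shortened documents in the shape from the question.
docs = [
    {
        "crawlat": "2017-05-09 14:45",
        "traffic": [
            {"timestamp": "1494338401", "organic": 53},
            {"timestamp": "1494342001", "organic": 64},
            {"timestamp": "1494345601", "organic": 74},
        ],
    },
    {
        # Hypothetical second document, values invented for the example.
        "crawlat": "2017-05-09 15:45",
        "traffic": [
            {"timestamp": "1494342001", "organic": 10},
            {"timestamp": "1494345601", "organic": 20},
            {"timestamp": "1494349201", "organic": 30},
        ],
    },
]

table = organic_table(docs)
print(table)
```

With the real collection, the same helper could presumably be called as organic_table(sitemapsdata), which would give 48 columns per row because the query only matches documents with 48 traffic entries.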
