
I'm making a mini Twitter clone in Flask + MongoDB (with pymongo) as a learning exercise, and I need some help joining data from two collections. I know and understand that joins can't be done in MongoDB, which is why I'm asking how to do it in Python.

I have a collection to store user information. Documents look like this:

{
    "_id" : ObjectId("51a6c4e3eedc89e34ee46e32"),
    "email" : "[email protected]",
    "message" : [
        ObjectId("51a6c5e1eedc89e34ee46e36")
    ],
    "pw_hash" : "alexhash",
    "username" : "alex",
    "who_id" : [
        ObjectId("51a6c530eedc89e34ee46e33"),
        ObjectId("51a6c54beedc89e34ee46e34")
    ],
    "whom_id" : [ ]
}

and another collection to store messages (tweets):

{
    "_id" : ObjectId("51a6c5e1eedc89e34ee46e36"),
    "author_id" : ObjectId("51a6c4e3eedc89e34ee46e32"),
    "text" : "alex first twit",
    "pub_date" : ISODate("2013-05-30T03:22:09.462Z")
}

As you can see, each message document references its author's "_id" in "author_id", and each user document references the "_id" of its messages in "message".

Basically, what I want to do is take every message's "author_id", get the corresponding username from the user collection and make a new dictionary containing the "username" + "text" + "pub_date". With that, I could easily render the data in my Jinja2 template.

I have the following code that sort of does what I want:

def getMessageAuthor():
    author_id = []
    # get a list of author_ids for every message
    for author in coll_message.find():
        author_id.append(author['author_id'])
    # iterate through every author_ids to find the corresponding username
    for item in author_id:
        message = coll_message.find_one({"author_id": item}, {"text": 1, "pub_date": 1})
        author = coll_user.find_one({"_id": item}, {"username": 1})
        merged = dict(chain((message.items() + author.items())))

Output looks like this:

{u'username': u'alex', u'text': u'alex first twit', u'_id': ObjectId('51a6c4e3eedc89e34ee46e32'), u'pub_date': datetime.datetime(2013, 5, 30, 3, 22, 9, 462000)}

Which is exactly what I want.

The code doesn't work though because I'm doing .find_one() so I always get the first message even if a user has two or more. Doing .find() might resolve this issue, but .find() returns a cursor and not a dictionary like .find_one(). I haven't figured out how to convert cursors to the same dictionary format as the output from .find_one() and merge them to get the same output as above.

This is where I'm stuck. I don't know how I should proceed to fix this. Any help is appreciated.

Thank you.

  • Have you considered embedding the user name in the message? It makes reads easier, and you might not allow the user to change his name anyway. Even if you do, you can update all documents on name change. Commented May 31, 2013 at 5:48
  • Yes, I did. I actually tried several different schemas before settling on this one. One of the reasons I settled on it was that if this were more than a learning exercise, I would want users to be able to change their username, so I figured I might as well learn how to do it now. Also, I chose this because ObjectIds are quite small (only 12 bytes) compared to storing strings. More info about that here: stackoverflow.com/a/7767394/1234135 Cheers! Commented May 31, 2013 at 15:39

1 Answer


Append an ("_id", "author_id") tuple for each message, so that the message's own "_id" is used to retrieve the corresponding message as expected, and "author_id" is used to get the username.

You just need a unique key to do that:

from itertools import chain

def getMessageAuthor():
    ids = []
    # get a list of (message _id, author_id) pairs for every message
    for message in coll_message.find():
        ids.append((message['_id'], message['author_id']))
    merged_messages = []
    # look up each message by its unique _id and its author by author_id
    for message_id, author_id in ids:
        message = coll_message.find_one({"_id": message_id}, {"text": 1, "pub_date": 1})
        author = coll_user.find_one({"_id": author_id}, {"username": 1})
        # the author's "_id" overwrites the message's "_id" in the merged dict
        merged = dict(chain(message.items(), author.items()))
        merged_messages.append(merged)
    return merged_messages
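
The function above issues two queries per message. As a possible alternative, here is a minimal sketch that does the join in memory: it builds a user "_id" to username lookup once, then walks the messages. The function name join_messages_with_authors and the sample data are hypothetical, and the commented-out pymongo wiring is an assumption based on the collections in the question. It relies on the fact that iterating a pymongo cursor yields dicts of the same shape that .find_one() returns.

```python
def join_messages_with_authors(messages, users):
    """Merge each message with its author's username.

    `messages` and `users` can be any iterables of dicts, which is
    what iterating a pymongo cursor yields.
    """
    # Build an in-memory lookup table: user _id -> username.
    usernames = {user["_id"]: user["username"] for user in users}
    merged = []
    for message in messages:
        merged.append({
            # .get() leaves username as None if the author is missing.
            "username": usernames.get(message["author_id"]),
            "text": message["text"],
            "pub_date": message["pub_date"],
        })
    return merged


# Hypothetical wiring with the collections from the question (assumed):
#   messages = list(coll_message.find({}, {"author_id": 1, "text": 1, "pub_date": 1}))
#   author_ids = list({m["author_id"] for m in messages})
#   users = coll_user.find({"_id": {"$in": author_ids}}, {"username": 1})
#   timeline = join_messages_with_authors(messages, users)

# Stand-in data so the sketch runs without a database.
sample_users = [{"_id": 1, "username": "alex"}, {"_id": 2, "username": "bob"}]
sample_messages = [
    {"_id": 10, "author_id": 1, "text": "alex first twit", "pub_date": "2013-05-30"},
    {"_id": 11, "author_id": 1, "text": "alex second twit", "pub_date": "2013-05-31"},
    {"_id": 12, "author_id": 2, "text": "bob first twit", "pub_date": "2013-06-01"},
]
timeline = join_messages_with_authors(sample_messages, sample_users)
```

With this shape, the number of queries stays at two regardless of how many messages exist, and the resulting list of dicts can be passed straight to a Jinja2 template. The trade-off is that all usernames for the fetched messages are held in memory at once.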

1 Comment

Wow! I confirm this works. I can't believe the fix was so simple. Thank you very much.
