
I have a very large CSV that I'm trying to search through. I've decided to use MongoDB and load the whole CSV into it, so that I can quickly search it later with Python and PyMongo instead of having to load all 80 MB each time I run a search. What I can't figure out is how to search the collection for a given user agent and match it against the regex stored in each item's '_id' field. I first convert each browscap entry to a Python regex, then insert the item into the Mongo collection.
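A minimal sketch of the load step described above. The conversion rules are an assumption (browscap '*' means "any run of characters" and '?' means "exactly one character"), and the pattern column name PropertyName and the helper names browscap_to_regex and csv_to_documents are hypothetical. It also assumes a single header row.

```python
import csv
import io
import re


def browscap_to_regex(pattern):
    """Translate a browscap wildcard pattern into an anchored Python regex.

    Assumption: '*' matches any run of characters, '?' matches exactly one
    character, and every other character is literal.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*?")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # escape literal characters
    return "^" + "".join(parts) + "$"


def csv_to_documents(csv_file, pattern_column="PropertyName"):
    """Yield one document per CSV row, using the converted regex as _id.

    pattern_column is a hypothetical default; adjust it to your CSV header.
    """
    for row in csv.DictReader(csv_file):
        doc = dict(row)
        doc["_id"] = browscap_to_regex(row[pattern_column])
        yield doc


if __name__ == "__main__":
    sample = io.StringIO(
        "PropertyName,Browser\n"
        "AppleCoreMedia/1.0* (iPhone*CPU OS 8* like Mac OS X*)*,CoreMedia\n"
    )
    for doc in csv_to_documents(sample):
        print(doc["_id"])
```

With PyMongo the generator could then be passed to collection.insert_many(...), which accepts any iterable of documents.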


How do I use PyMongo to find the document whose regex _id matches a given user agent?


Example User Agent:

AppleCoreMedia/1.0.0.12B440 (iPad; U; CPU OS 8_1_2 like Mac OS X; en_us)

Example Browscap Entry:

AppleCoreMedia/1.0* (iPhone*CPU OS 8* like Mac OS X*)*

Example Regex Mongo ID:

^AppleCoreMedia\/1\.0.*?\ \(iPhone.*?CPU\ OS\ 8.*?\ like\ Mac\ OS\ X.*?\).*?$
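A quick way to check an _id regex against a user agent locally before involving MongoDB at all. Note that, as written, the example user agent says "iPad" while the example entry requires "iPhone", so the two do not match; the iPhone variant below is an illustrative string, not taken from real data.

```python
import re

# The example _id regex, as a raw string so the backslashes reach re unchanged.
REGEX_ID = r"^AppleCoreMedia\/1\.0.*?\ \(iPhone.*?CPU\ OS\ 8.*?\ like\ Mac\ OS\ X.*?\).*?$"


def matches(regex_id, user_agent):
    """Return True if the stored regex _id matches the whole user agent."""
    return re.match(regex_id, user_agent) is not None


ipad_ua = "AppleCoreMedia/1.0.0.12B440 (iPad; U; CPU OS 8_1_2 like Mac OS X; en_us)"
iphone_ua = "AppleCoreMedia/1.0.0.12B440 (iPhone; U; CPU OS 8_1_2 like Mac OS X; en_us)"

print(matches(REGEX_ID, ipad_ua))    # False: the entry requires "iPhone"
print(matches(REGEX_ID, iphone_ua))  # True
```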

  • What are you saying here? Is the "regex" itself the string stored in the _id? Or is just the user agent string stored in the _id, and you want to search with a regex? In the former case you need the $where evaluation mentioned in MongoDB reverse regex. Though it's notably not a good performer, so you might want to rethink what you are doing here if that is the case. Commented Feb 16, 2016 at 23:44
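A minimal sketch of the $where "reverse regex" approach the comment refers to, assuming each _id is a regex string and that server-side JavaScript is enabled (it can be turned off with security.javascriptEnabled and may be deprecated on newer servers). The helper names are hypothetical. Because the JavaScript runs once per document, this is a full collection scan.

```python
import json


def reverse_regex_filter(user_agent):
    """Build a $where filter that tests each document's regex _id
    against the given user agent on the server side."""
    # json.dumps yields a correctly quoted and escaped JavaScript string literal.
    js = "function() { return new RegExp(this._id).test(%s); }" % json.dumps(user_agent)
    return {"$where": js}


def find_browser(collection, user_agent):
    """Return the first document whose regex _id matches, or None.

    collection is assumed to be a pymongo Collection (anything with find_one).
    """
    return collection.find_one(reverse_regex_filter(user_agent))
```

The Python-style escapes in the stored regexes (\/, \ , \() are also accepted by JavaScript's RegExp, which is why the same _id strings can be reused here.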

1 Answer


A user agent would be a poor choice for the _id field due to its low entropy.

Aside from that, because _id is indexed, MongoDB should end up keeping most or all of that data in memory anyway.


1 Comment

But the regex'd _id is what is indexed. I don't know what the actual user agent is until I have a search to do.
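Since the index is on the regex strings rather than on user agents, it cannot narrow down a reverse match. One alternative, sketched here under the assumption of a long-running process, is to read the _id values once (for example via collection.find({}, {"_id": 1}) in PyMongo), compile them, and match in memory. RegexIdMatcher is a hypothetical name, and it returns the first match in load order, which may not be the most specific browscap entry.

```python
import re


class RegexIdMatcher:
    """Compile every regex _id once, then match user agents in memory."""

    def __init__(self, docs):
        # docs: any iterable of dicts with a string "_id", such as a pymongo
        # cursor from collection.find({}, {"_id": 1}).
        self._compiled = [(re.compile(d["_id"]), d["_id"]) for d in docs]

    def first_match(self, user_agent):
        """Return the first regex _id that matches, or None if none do."""
        for rx, regex_id in self._compiled:
            if rx.match(user_agent):
                return regex_id
        return None


matcher = RegexIdMatcher([{"_id": r"^foo.*?$"}, {"_id": r"^bar$"}])
print(matcher.first_match("food"))
```

Compiling once moves the cost to startup, so each later lookup avoids both reloading the CSV and a server-side $where scan.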
