
I am pulling JSON data from Salesforce. I can have roughly 10,000 records, but never more. To avoid API limits and having to hit Salesforce for every request, I thought I could query the data every hour and then store it in memory. Obviously this would be much quicker, and much less error prone.

A JSON object would have about 10 properties and maybe one other nested JSON object with two or three properties.

I am using methods similar to the one below to query the records.

getUniqueProperty: function (data, property) {
    return _.chain(data)
        .sortBy(function(item) { return item[property]; })
        .pluck(property)
        .uniq()
        .value();
}

My questions are

  • What would the ramifications be of storing the data in memory and working with it there? I obviously don't want to block the server by running heavy filtering on the data.

  • I have never used Redis before, but would something like a caching DB help?

  • Would it be best to query the data every hour and store the JSON response in something like Mongo? I would then do all my querying against Mongo as opposed to in memory. Each time I query Salesforce, I would just flush the database and reinsert the data.
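For the in-memory variant, the hourly refresh can be reduced to a single reference swap so in-flight reads never see partially written data. This is only a sketch; `fetchFromSalesforce` is a hypothetical stand-in for the real API call:

```javascript
// Hypothetical sketch: refresh an in-memory cache periodically, swapping the
// reference in one assignment so readers see either the old snapshot or the
// new one, never a mix.
let cache = [];

function refreshCache(fetchFromSalesforce) {
    return fetchFromSalesforce().then(function (records) {
        cache = records; // atomic swap of the reference
        return cache;
    });
}

function getCache() {
    return cache;
}

// Scheduling, e.g.:
// setInterval(function () { refreshCache(realFetch); }, 60 * 60 * 1000);
```

The same swap idea applies if the snapshot lives in an external store: build the new data first, then switch over.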

  • Assuming that your Salesforce data is being updated during that hour, all your requests will be out of date until the next update. Commented Feb 8, 2014 at 16:55
  • Not at all worried about the data being out of date. It can be out of date for that timeframe. It's probably only going to be updated and need to be pulled through every couple of hours anyway. Commented Feb 8, 2014 at 17:19

1 Answer


Storing your data in memory has a couple of disadvantages:

  • non-scalable: when you decide to use more processes, each process will need to make the same API request;
  • fragile: if your process crashes, you will lose the data.

Also, working with a large amount of data can block the process for longer than you would like.

Solution: use external storage!

  • It can be Redis, MongoDB, or an RDBMS;
  • update the data in a separate process, triggered by cron;
  • don't drop the whole database: there is a chance that someone will make a request right after that (if your storage doesn't support transactions, of course). Update records instead.
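The "update records instead of dropping everything" point can be sketched against a plain `Map` to keep it self-contained; with a real store you would issue upserts instead (for example MongoDB's `bulkWrite` with `upsert: true`). The `keyField` argument and the sample records are assumptions for illustration:

```javascript
// Sketch: merge a fresh Salesforce snapshot into the existing store by key
// instead of flushing it first, so readers never see an empty store.
function mergeSnapshot(store, freshRecords, keyField) {
    const seen = new Set();
    freshRecords.forEach(function (record) {
        seen.add(record[keyField]);
        store.set(record[keyField], record); // insert or update in place
    });
    // Remove records that no longer exist upstream.
    Array.from(store.keys()).forEach(function (key) {
        if (!seen.has(key)) store.delete(key);
    });
}

const store = new Map([['a', { id: 'a', vehicleMake: 'Ford' }]]);
mergeSnapshot(store, [
    { id: 'a', vehicleMake: 'Toyota' },
    { id: 'b', vehicleMake: 'Honda' }
], 'id');
// store now holds the updated 'a' and the new 'b'; stale keys are gone.
```

At no point during the merge is the store empty, which is the property the flush-and-reinsert approach lacks.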


2 Comments

I have briefly looked at Redis. Isn't it impossible to do rich querying on the data, since it is a key-value store? So, for example, I won't be able to query the JSON data where, let's say, vehicleMake is Toyota? I thought about updating records, but that's where things get quite complex. I just need the data relevant to the application, since all the data is stored in Salesforce anyway. If I lose the data, I just query Salesforce to get the relevant data again and work with that. Can I not spawn a child process for complex querying?
@TyroneMichael If you need complex queries, MongoDB or an RDBMS is a good choice. If you spawn a child for each query, you'll have to deal with the overhead of passing your data via IPC every time, or requesting it from Salesforce. If you have a daemon query process, it will basically be reinventing a DBMS.
