
I have 4.5 million records in my DynamoDB table.

I want to read the id of each record in batches.

I am expecting something like offset and limit, the way we can read in MongoDB.

Is there any way to do this in Node.js without the scan method? Any suggestions?

I have done enough research, and I can only find the scan method, which buffers the complete records from DynamoDB and then starts scanning the records. That is not efficient performance-wise.

Please give me some suggestions.

  • Try this: docs.aws.amazon.com/amazondynamodb/latest/APIReference/… Commented Feb 15, 2018 at 7:44
  • If you give the schema for your table, we can maybe figure out a Query instead of a Scan. Commented Feb 19, 2018 at 17:34
  • And since you want the entire table, Scan is a valid option. The only reason Scan is bad performance-wise is that it reads the whole table, which is exactly what we want here. Commented Feb 19, 2018 at 17:36
  • You can also use AWS Data Pipeline to transfer the data to S3 or Redshift. Commented Feb 19, 2018 at 17:37

2 Answers


From my point of view, there's no problem doing scans because (according to the Scan doc):

  • DynamoDB paginates the results from Scan operations

  • You can use the ProjectionExpression parameter so that Scan only returns some of the attributes, rather than all of them

Each page is at most 1 MB by default, but you can also cap the number of items evaluated per page with the Limit parameter.

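So it's just basic pagination, comparable to offset and limit in MongoDB, except that instead of a numeric offset you pass the LastEvaluatedKey returned by the previous page as ExclusiveStartKey.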

Here is an example from the docs of how to perform a Scan with the Node.js SDK.

Now, if you want to get all the IDs in batches, you could wrap the whole thing in a Promise and resolve it when there's no LastEvaluatedKey.

Below is a sketch of what you could do:

const AWS = require("aws-sdk"); // AWS SDK for JavaScript v2

const performScan = () => new Promise((resolve, reject) => {
    const docClient = new AWS.DynamoDB.DocumentClient();
    const params = {
        TableName: "YOUR_TABLE_NAME",
        ProjectionExpression: "id", // return only the id attribute
        Limit: 100 // optional: evaluate at most 100 items per page instead of the default 1 MB page
    };
    let items = [];

    // Scan one page; if DynamoDB reports more data, scan again from where it stopped
    const scanExecute = () => {
        docClient.scan(params, (err, result) => {
            if (err) return reject(err);

            items = items.concat(result.Items);
            if (result.LastEvaluatedKey) {
                // Continue after the last key of this page
                params.ExclusiveStartKey = result.LastEvaluatedKey;
                scanExecute();
            } else {
                // No LastEvaluatedKey: this was the last page
                resolve(items);
            }
        });
    };
    scanExecute();
});

performScan()
    .then(items => {
        // items looks like [{ id: ... }, { id: ... }, ...]
    })
    .catch(err => console.error(err));
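If holding all 4.5 million IDs in one array is a concern, the same LastEvaluatedKey loop can hand out one page at a time instead. Below is a minimal sketch as an async generator, assuming the AWS SDK for JavaScript v2 DocumentClient used above (its `scan(params).promise()` form) and a key attribute named `id`; `scanIdBatches` is a hypothetical helper name, not part of the SDK. The client is passed in rather than created inside, so the function can be exercised with a stub.

```javascript
// Sketch: yield the IDs one Scan page at a time instead of buffering the whole table.
// Assumes an AWS SDK v2 DocumentClient (scan(params).promise()) and a key attribute "id".
// scanIdBatches is a hypothetical helper name.
async function* scanIdBatches(docClient, tableName, pageSize = 100) {
  const params = {
    TableName: tableName,
    ProjectionExpression: "id", // only the id attribute is returned
    Limit: pageSize,            // at most pageSize items evaluated per Scan call
  };
  do {
    const result = await docClient.scan(params).promise();
    // Hand this page's IDs to the caller before requesting the next page
    if (result.Items.length > 0) {
      yield result.Items.map((item) => item.id);
    }
    // Resume after this page's last key; undefined means the table is exhausted
    params.ExclusiveStartKey = result.LastEvaluatedKey;
  } while (params.ExclusiveStartKey);
}
```

Inside an async function you would consume it with `for await (const ids of scanIdBatches(new AWS.DynamoDB.DocumentClient(), "YOUR_TABLE_NAME")) { ... }`, so only the current page of IDs is held in memory at a time. Pages that come back with no items are skipped rather than yielded.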

The first thing to know about DynamoDB is that it is a key-value store with support for secondary indexes.

DynamoDB is a bad choice if the application often has to iterate over the entire data set without using indexes (primary or secondary), because the only way to do that is to use the Scan API.

DynamoDB table scans are (a few things I can think of):

  1. Expensive (I mean $$$)
  2. Slow for big data sets
  3. Might use up the provisioned throughput
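To put rough numbers on points 1 and 3: a Scan is billed on the size of every item it reads, not on what ProjectionExpression returns, so asking only for `id` does not make it cheaper. Read capacity is charged per 4 KB, and eventually consistent reads (Scan's default) cost half a unit per 4 KB. The sketch below is a back-of-the-envelope estimate only; `estimateScanRcu` and `scanSeconds` are hypothetical helpers, and the 1 KB average item size and 100 RCU/s provisioned throughput used with them are made-up assumptions, not figures from the question.

```javascript
// Back-of-the-envelope read cost of one full-table Scan.
// Assumptions: a uniform average item size, and the 4 KB rounding applied to the
// table total rather than per Scan page (so the result is a slight underestimate).
const READ_UNIT_BYTES = 4 * 1024; // one read capacity unit covers 4 KB

function estimateScanRcu(itemCount, avgItemBytes, consistentRead = false) {
  // Scan pays for every item it reads, whatever ProjectionExpression returns
  const totalBytes = itemCount * avgItemBytes;
  const units = Math.ceil(totalBytes / READ_UNIT_BYTES);
  // Eventually consistent reads (Scan's default) cost half as much
  return consistentRead ? units : units / 2;
}

// Seconds the Scan would take if it consumed all of a provisioned read rate
function scanSeconds(totalRcu, provisionedRcuPerSecond) {
  return totalRcu / provisionedRcuPerSecond;
}
```

With the assumed 1 KB items, `estimateScanRcu(4500000, 1024)` gives 562,500 read capacity units, and `scanSeconds(562500, 100)` gives 5,625 seconds (about 94 minutes) during which a table provisioned at 100 RCU/s would have no read capacity left for anything else.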

If you know the primary keys of all the items in DynamoDB (from some external knowledge, e.g. the primary key is an auto-incremented value or is referenced in another database), then you can use BatchGetItem or Query.
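When the keys are known, BatchGetItem fetches up to 100 items (and at most 16 MB) per request, and it may return part of a batch in UnprocessedKeys, which the caller has to resubmit. A minimal sketch, assuming the AWS SDK v2 DocumentClient's `batchGet(params).promise()` and a table whose primary key is a single partition key named `id`; `chunk` and `batchGetByIds` are hypothetical helpers.

```javascript
// Sketch: fetch items by known primary key with BatchGetItem instead of a Scan.
// Assumes an AWS SDK v2 DocumentClient and a partition key "id" (no sort key).
const BATCH_GET_MAX_KEYS = 100; // BatchGetItem accepts at most 100 keys per request

// Split an array into consecutive groups of at most `size` elements
function chunk(array, size) {
  const chunks = [];
  for (let i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size));
  }
  return chunks;
}

async function batchGetByIds(docClient, tableName, ids) {
  const items = [];
  for (const idChunk of chunk(ids, BATCH_GET_MAX_KEYS)) {
    let requestItems = {
      [tableName]: { Keys: idChunk.map((id) => ({ id })) },
    };
    // BatchGetItem can leave part of a batch unprocessed; resubmit until none remain
    // (a production client should also back off between these retries)
    while (requestItems && Object.keys(requestItems).length > 0) {
      const result = await docClient.batchGet({ RequestItems: requestItems }).promise();
      items.push(...((result.Responses && result.Responses[tableName]) || []));
      requestItems = result.UnprocessedKeys;
    }
  }
  return items;
}
```

For 4.5 million known IDs this still means about 45,000 requests, so it only pays off over a Scan when you need a subset of the table or the keys are already at hand.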

So if this is a one-off task, Scan is your only option; otherwise, you should look into refactoring your application to remove this scenario.
