I have a Node.js function that scans a DynamoDB table (without primary sort key) and returns the number of items whose sync attribute is null. My table:

var params = {
    AttributeDefinitions: [
        {
        AttributeName: "barname",
        AttributeType: "S"
        },
        {
        AttributeName: "timestamp",
        AttributeType: "S"
        }
    ],
    KeySchema: [
        {
        AttributeName: "barname",
        KeyType: "HASH"
        },
        {
        AttributeName: "timestamp",
        KeyType: "RANGE"
        }
    ],
    ProvisionedThroughput: {
        ReadCapacityUnits: 1,
        WriteCapacityUnits: 1
    },
    TableName: tableName
}; 

The function that counts the items where sync == false:

var dynamodb = new AWS.DynamoDB({apiVersion: '2012-08-10'});
async function getCountNoSync(type){
    console.log(type)
    var params = {
        TableName: tableName,
        FilterExpression: 'sync = :sync and billing = :billing',
        ExpressionAttributeValues: {
            ':billing' : {S: type},
            ':sync' : {BOOL: false}
          },
    };
    
    var count = 0;
    await dynamodb.scan(params).promise()
        .then(function(data){
            count = data.Count;
        })
        .catch(function(err) {
            count = 0;
            console.log(err);
        });

    return count;
}

The function works fine if I have few items in my table (e.g. fewer than 150). If the number of items is higher, the count variable is always 0. It looks like the scan does not find all items.

Any idea? Best regards

1 Answer

The reason that you do not find all the items where attribute sync == null is that the scan operation is only reading part of your table.

As the documentation states:

If the total number of scanned items exceeds the maximum dataset size limit of 1 MB, the scan stops and results are returned to the user as a LastEvaluatedKey value to continue the scan in a subsequent operation.

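So if your table is larger than 1 MB, you need to call scan() multiple times, passing the LastEvaluatedKey from each response as the ExclusiveStartKey of the next request to read the next "page" of your table. This process is called "pagination". Note that the FilterExpression is applied only after a page has been read, so a page can come back with a Count of 0 even though matching items exist further along in the table, which would explain the 0 you see once the table grows.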
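
A minimal sketch of such a paginated count, assuming `dynamodb` is an AWS SDK v2 `AWS.DynamoDB` client like the one in the question. The function name `countAllMatches` is made up for illustration; `Select: 'COUNT'` is added so that each page returns only its count rather than the items themselves.

```javascript
// Sums Count over every page of a Scan.
// Assumption: `dynamodb` is an AWS.DynamoDB (SDK v2) client, and `params`
// holds TableName, FilterExpression and ExpressionAttributeValues as in the question.
async function countAllMatches(dynamodb, params) {
    let count = 0;
    let lastKey; // undefined on the first request

    do {
        // Copy the caller's params so they are not mutated between pages.
        const pageParams = { ...params, Select: 'COUNT' };
        if (lastKey) {
            // Resume where the previous page stopped.
            pageParams.ExclusiveStartKey = lastKey;
        }

        const page = await dynamodb.scan(pageParams).promise();
        count += page.Count;                // matches found in this page only
        lastKey = page.LastEvaluatedKey;    // absent once the table is exhausted
    } while (lastKey);

    return count;
}
```

Inside getCountNoSync this could be called as `count = await countAllMatches(dynamodb, params);`. Errors are not caught here, so the caller decides how to handle them.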

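But this takes a lot of time, and the time it needs grows with the size of your table. The proper way of doing this is to create an index on the sync field and then do a query() on that index. Keep in mind that index key attributes must be of type String, Number, or Binary, so a boolean sync would have to be stored as one of those types to be indexed.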
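
A rough sketch of what the query could look like, under several assumptions that are not in the question: a global secondary index named `syncStatus-index` exists, its partition key is a string attribute `syncStatus`, and unsynced items carry the value `'PENDING'`. All three names are hypothetical. Query results are paginated in the same way as scan results, so the loop is still needed.

```javascript
// Counts unsynced items through a (hypothetical) global secondary index.
// Assumptions: `dynamodb` is an AWS.DynamoDB (SDK v2) client; the index name,
// the `syncStatus` attribute and the 'PENDING' value are made up for illustration.
async function countUnsyncedViaIndex(dynamodb, tableName, billingType) {
    let count = 0;
    let lastKey;

    do {
        const params = {
            TableName: tableName,
            IndexName: 'syncStatus-index',             // hypothetical index
            KeyConditionExpression: 'syncStatus = :s', // reads only matching index entries
            FilterExpression: 'billing = :billing',
            ExpressionAttributeValues: {
                ':s': { S: 'PENDING' },                // hypothetical marker value
                ':billing': { S: billingType }
            },
            Select: 'COUNT'
        };
        if (lastKey) {
            params.ExclusiveStartKey = lastKey;
        }

        const page = await dynamodb.query(params).promise();
        count += page.Count;
        lastKey = page.LastEvaluatedKey;
    } while (lastKey);

    return count;
}
```

Unlike the scan, this reads only the index entries with the given key value, so its cost depends on the number of unsynced items rather than on the size of the whole table.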

You can read more about that in the AWS documentation:

  1. Querying and Scanning a DynamoDB Table
  2. Reference documentation for scan()
  3. Paginating the Results