Query for empty or null values in DynamoDB with large data

Question

I have to get the items from a table similar to what I have provided below where the Result and Accuracy don't exist or Null. The table is having a lot of data and it is growing daily. I tried to get the items from the dynamo DB using the scan. But it is taking a lot of time to show the result. Can anyone suggest me how to run the query or scan which will provide me the result faster?

Dynamo DB Table(Sample):

|ID(primary)|date(String)|Result(number)|Accuracy(Number)|
|   edfca   |   20180101 |    12        |     59.99      |
|    yrdfa  |  20180109  |    72        |     91.49      |
| arfead    | 20180101   |    122       |     32.12      |
|  readssda | 20180112   |              |                |
| geeddar   | 20180123   |    3         |     81.11      | 
|  rogeasa  |  20180203  |              |                |
| swiwads   | 20180205   |     32       |      13.12     |

Currently, I'm using the following

java:

ScanSpec scanSpec = new ScanSpec();
scanSpec.withFilterExpression("attribute_not_exists(accuracy)");
scanSpec.setMaxResultSize(100);
table.scan(scanSpec);

The above scan operation is very slow.

I'm new to dynamo DB and I googled it and found it's better to use a query than using the scan operation.

Can I use the query operation for getting the data I need? If yes please provide an example

Possible duplicate of How do you query for a non-existent (null) attribute in DynamoDB — Srini V
– Srini V, Commented Feb 9, 2018 at 11:44
Not a duplicate question. I want to solve it using the query operation instead of the Scan operation as scan operation is very slow. I'm looking for other possible ways to get the items faster from the table — Siva
– Siva, Commented Feb 9, 2018 at 11:48

Costin · Accepted Answer · 2018-02-09 15:56:45Z

When you do a query you have to specify the Partition Key. Given your table, assuming you have not defined any secondary index, you can only query for one partition key. Useless.

Should you really want to improve the performance I can imagine a solution. When you write items in your table, define a column hasNull: string (for example) and set it to 'y' for all rows with fills the condition. Let it undefined for all other items.

Later, define a Global Secondary Index named MY_GSI_NAME:

Partition Key: hasNull
Sorting Key: ID
Projection type (?): KEYS_ONLY (or ALL, see dynamodb docs)

In node.js you may query your table with:

const params = {
  TableName: 'MY_TABLE_NAME',
  IndexName: 'MY_GSI_NAME',
  KeyConditionExpression: '#pk = :pk',
  ExpressionAttributeNames: {
    '#pk': 'hasNull',
  },
  ExpressionAttributeValues: {
    ':pk': 'y',
  },
};

return MY_DOCUMENT_CLIENT.query(params).promise();

Doing that you'll get all the items with hasNull === 'y'.

That query has the advantage that only the items you are interested in are scanned since the index is sparse (hasNull is missing for the items you do not need).

Hope this give you a starting point to solve your problem. Ask more to know more :)

Bonne chance!

Collectives™ on Stack Overflow

Query for empty or null values in DynamoDB with large data

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related