If I understood correctly, you have a table with the following definition of Primary Keys:
Hash Key : date.random_number
Range Key : timestamp
One thing that you have to keep in mind is that , whether you are using GetItem or Query, you have to be able to calculate the Hash Key in your application in order to successfully retrieve one or more items from your table.
It makes sense to use the random numbers as part of your Hash Key so your records can be evenly distributed across the DynamoDB partitions, however, you have to do it in a way that your application can still calculate those numbers when you need to retrieve the records.
With that in mind, let's create the query needed for the specified requirements. The native AWS DynamoDB operations that you have available to obtain several items from your table are:
Query, BatchGetItem and Scan
In order to use BatchGetItem you would need to know beforehand the entire primary key (Hash Key and Range Key), which is not the case.
The Scan operation will literally go through every record of your table, something that in my opinion is unnecessary for your requirements.
Lastly, the Query operation allows you to retrieve one or more items from a table applying the EQ (equality) operator to the Hash Key and a number of other operators that you can use when you don't have the entire Range Key or would like to match more than one.
The operator options for the Range Key condition are: EQ | LE | LT | GE | GT | BEGINS_WITH | BETWEEN
It seems to me that the most suitable for your requirements is the BETWEEN operator, that being said, let's see how you could build the query with the chosen SDK:
Table table = dynamoDB.getTable(tableName);
String hashKey = "<YOUR_COMPUTED_HASH_KEY>";
String timestampX = "<YOUR_TIMESTAMP_X_VALUE>";
String timestampY = "<YOUR_TIMESTAMP_Y_VALUE>";
RangeKeyCondition rangeKeyCondition = new RangeKeyCondition("RangeKeyAttributeName").between(timestampX, timestampY);
ItemCollection<QueryOutcome> items = table.query("HashKeyAttributeName", hashKey,
rangeKeyCondition,
null, //FilterExpression - not used in this example
null, //ProjectionExpression - not used in this example
null, //ExpressionAttributeNames - not used in this example
null); //ExpressionAttributeValues - not used in this example
You might want to look at the following post to get more information about DynamoDB Primary Keys:
DynamoDB: When to use what PK type?
QUESTION: My concern is querying multiple times because of random_number attached to it. Is there a way to combine these queries and hit dynamoDB once ?
Your concern is completely understandable, however, the only way to fetch all the records via BatchGetItem is by knowing the entire primary key (HASH + RANGE) of all records you intend to get. Although minimizing the HTTP roundtrips to the server might seem to be the best solution at first sight, the documentation actually suggests to do exactly what you are doing to avoid hot partitions and uneven use of your provisioned throughput:
Design For Uniform Data Access Across Items In Your Tables
"Because you are randomizing the hash key, the writes to the table on
each day are spread evenly across all of the hash key values; this
will yield better parallelism and higher overall throughput. [...] To
read all of the items for a given day, you would still need to Query
each of the 2014-07-09.N keys (where N is 1 to 200), and your
application would need to merge all of the results. However, you will
avoid having a single "hot" hash key taking all of the workload."
Source: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html
Here there is another interesting point suggesting the moderate use of reads in a single partition... if you remove the random number from the hash key to be able to get all records in one shot, you are likely to fall on this issue, regardless if you are using Scan, Query or BatchGetItem:
Guidelines for Query and Scan - Avoid Sudden Bursts of Read Activity
"Note that it is not just the burst of capacity units the Scan uses
that is a problem. It is also because the scan is likely to consume
all of its capacity units from the same partition because the scan
requests read items that are next to each other on the partition. This
means that the request is hitting the same partition, causing all of
its capacity units to be consumed, and throttling other requests to
that partition. If the request to read data had been spread across
multiple partitions, then the operation would not have throttled a
specific partition."
And lastly, because you are working with time series data, it might be helpful to look into some best practices suggested by the documentation as well:
Understand Access Patterns for Time Series Data
For each table that you create, you specify the throughput
requirements. DynamoDB allocates and reserves resources to handle your
throughput requirements with sustained low latency. When you design
your application and tables, you should consider your application's
access pattern to make the most efficient use of your table's
resources.
Suppose you design a table to track customer behavior on your site,
such as URLs that they click. You might design the table with hash and
range type primary key with Customer ID as the hash attribute and
date/time as the range attribute. In this application, customer data
grows indefinitely over time; however, the applications might show
uneven access pattern across all the items in the table where the
latest customer data is more relevant and your application might
access the latest items more frequently and as time passes these items
are less accessed, eventually the older items are rarely accessed. If
this is a known access pattern, you could take it into consideration
when designing your table schema. Instead of storing all items in a
single table, you could use multiple tables to store these items. For
example, you could create tables to store monthly or weekly data. For
the table storing data from the latest month or week, where data
access rate is high, request higher throughput and for tables storing
older data, you could dial down the throughput and save on resources.
You can save on resources by storing "hot" items in one table with
higher throughput settings, and "cold" items in another table with
lower throughput settings. You can remove old items by simply deleting
the tables. You can optionally backup these tables to other storage
options such as Amazon Simple Storage Service (Amazon S3). Deleting an
entire table is significantly more efficient than removing items
one-by-one, which essentially doubles the write throughput as you do
as many delete operations as put operations.
Source: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html