
I have key-value pairs like the following example:

KEY VALUE
key1    1
key2    2
key3    3
.       .
.       .
keyN    N

Each of my keys needs to map to a unique number, so I am mapping my keys to auto-incremented numbers, inserting them into Redis via Redis mass insertion (which works very well), and then using the GET command for the internal processing of all the key-value mappings.

But I have more than 1 billion keys, so I was wondering: is there an even more efficient way (mainly with lower memory usage) to use Redis for this scenario?
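For context, here is a minimal sketch of the kind of payload generator used for the mass insertion (the key names and the count are placeholders, not my real data). It emits the Redis protocol, which is then piped into redis-cli --pipe:

# gen_payload.py - emit Redis protocol (RESP) for mass insertion.
# Usage: python gen_payload.py | redis-cli --pipe
import sys

def gen_redis_proto(*args):
    # Encode one command in the Redis wire protocol.
    proto = f"*{len(args)}\r\n"
    for arg in args:
        arg = str(arg)
        proto += f"${len(arg.encode())}\r\n{arg}\r\n"
    return proto

if __name__ == "__main__":
    for i in range(1, 1001):  # 1,000 keys as a small example
        sys.stdout.write(gen_redis_proto("SET", f"key{i}", i))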

  • Your condition is just that each key has a unique value, right? Commented Jan 25, 2018 at 16:37
  • I have unique string keys and I need to map them to integers. Then I want to use this key-value mapping with the standard GET command. Commented Jan 25, 2018 at 16:45
  • What do you mean by "efficient"? Commented Jan 25, 2018 at 18:19
  • Mainly consuming less memory. Commented Jan 25, 2018 at 18:34
  • Also, by efficient I mean: since the values belonging to my keys are incremented numbers, I thought maybe there is an alternative way/usage in Redis so I don't need to set these auto-incremented values myself. Commented Jan 25, 2018 at 18:49

5 Answers


You can pipeline commands into Redis to avoid the round-trip times like this:

{ for ((i=0;i<10000000;i++)) ; do printf "set key$i $i\r\n"; done ; sleep 1; } | nc localhost 6379

That takes 80 seconds to set 10,000,000 keys.


Or, if you want to avoid creating all those processes for printf, generate the data in a single awk process:

{ awk 'BEGIN{for(i=0;i<10000000;i++){printf("set key%d %d\r\n",i,i)}}'; sleep 1; } | nc localhost 6379

That now takes 17 seconds to set 10,000,000 keys.


3 Comments

Thanks, I am using pipelining and it works well; I just wonder if there is any alternative for my string-to-number mapping usage.
@Tamer Maybe show your code then to make that clear to others, since you don't mention pipelining in your question.
Sure, I have updated the question and mentioned Redis mass insertion, which is pipelining in Redis.

An auto-increment key allows a unique number to be generated when a new record is inserted into a table or into Redis.

There is another way, using UUIDs.

But I think auto-increment is far better, for reasons such as: a UUID needs four times more space, ordering cannot be done based on the key, etc.
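A rough sketch of the two options in Python with redis-py (the counter key id:counter is just an illustrative name); the UUID variant stores a 36-character string instead of a small integer, which is where the extra space comes from:

import uuid
import redis

r = redis.Redis()

def set_with_autoincrement(key):
    # INCR on a counter key returns a unique, monotonically increasing number.
    new_id = r.incr('id:counter')     # 1, 2, 3, ...
    r.set(key, new_id)
    return new_id

def set_with_uuid(key):
    # A UUID is unique too, but it is a 36-character string,
    # so it needs much more space than a small integer.
    new_id = str(uuid.uuid4())
    r.set(key, new_id)
    return new_id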

1 Comment

Which table do you mean in Redis?

I'm doing exactly the same thing. Here is a simple example; if you have a better one, you are welcome to discuss it :)

1. Connect to Redis:

import redis

pool = redis.ConnectionPool(host=your_host, port=your_port)
r = redis.Redis(connection_pool=pool)

2. Define a function that computes the next value and sets it; it will be run inside a pipeline:

def my_incr(pipe):
    # Before multi() the pipeline executes commands immediately,
    # so this returns the current number of fields in the hash.
    next_value = pipe.hlen('myhash')
    # Everything after multi() is queued and executed atomically.
    pipe.multi()
    # Only set the field if it does not exist yet.
    pipe.hsetnx(
        name='myhash',
        key=newkey, value=next_value
    )

3. Run the function as a transaction (redis-py wraps it in WATCH/MULTI/EXEC and retries on conflicts):

newkey = 'key1'   # my_incr() reads this global
r.transaction(my_incr, 'myhash')
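A possible way to use it for several keys and read the mapping back (the key names are placeholders, and this assumes myhash starts out empty):

# Assign numbers to several keys, then read one back.
for k in ['key1', 'key2', 'key3']:
    newkey = k                        # my_incr() reads this global
    r.transaction(my_incr, 'myhash')

print(r.hget('myhash', 'key2'))       # b'1' (values are assigned in insertion order, starting at 0)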

1 Comment

This is not the place for discussion. You can add comments if you are eligible to comment. If you want to discuss, use the chat feature of Stack Overflow, but that also requires 20 rep. I suggest you answer only if you are fairly sure that your answer satisfies the OP's requirement.

In order to be more memory efficient, you can use a HASH to store these key-value pairs. Redis has a special encoding for small HASHes, and it can save you lots of memory.

In your case, you can shard your keys into many small HASHes, each with fewer than hash-max-ziplist-entries entries. See the doc for details.

By the way, with the INCR command you can use Redis to create auto-incremented numbers.
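A minimal sketch of that sharding idea (the bucket count, the bucket:{n} key pattern, and crc32 as the hash function are arbitrary choices for illustration); since the bucket is derived from the key itself, HGET always knows where to look:

import zlib
import redis

r = redis.Redis()

# Choose the bucket count so each small hash stays under
# hash-max-ziplist-entries; e.g. ~8M buckets for ~1 billion keys.
NUM_BUCKETS = 2 ** 23

def bucket_of(key):
    # Deterministically map a key to one of the small hashes.
    return f"bucket:{zlib.crc32(key.encode()) % NUM_BUCKETS}"

def put(key, value):
    r.hset(bucket_of(key), key, value)

def get(key):
    return r.hget(bucket_of(key), key)

put('key1', 1)
print(get('key1'))   # b'1'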

6 Comments

Thanks, I have used HSET with one key which contained all my key-values, but I didn't see a difference in memory usage. Maybe I should use each of my keys as a separate HSET key?
No, you should have many small HASHes, each with fewer than hash-max-ziplist-entries entries. If there are too many elements in a HASH, Redis won't use the special encoding.
With the special encoding, you can use up to 10 times less memory (with 5 times less memory used being the average saving).
Oh, got you, I need to shard my keys. But in that case, when I do an HGET I need to know which key is in which shard; I will think about how I can shard my keys. And what happens if I increase the hash-max-ziplist-entries value?
My keys have no specific pattern to shard by, but if you have any suggestion about sharding, let me know, thanks.

I would like to answer my own question.

If you have sorted key-values, the most efficient way to bulk insert and then read them is to use a B-Tree-based database.

For instance, with MapDB I am able to insert them very quickly, and it takes up less memory.
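I stayed with MapDB, but for anyone who wants to try the same idea from Python, an embedded B-Tree-backed store such as SQLite (whose tables and indexes are B-Trees) gives a comparable pattern. This is only an illustrative sketch, not MapDB itself:

import sqlite3

# In-memory for the demo; use a file path for real data.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE kv (key TEXT PRIMARY KEY, value INTEGER)')

# Bulk insert already-sorted key-value pairs.
con.executemany('INSERT INTO kv VALUES (?, ?)',
                ((f'key{i}', i) for i in range(1, 1001)))
con.commit()

print(con.execute('SELECT value FROM kv WHERE key = ?', ('key42',)).fetchone())  # (42,)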

