
I'm working on a high-load service that receives batches of messages from Kafka (about 500 at once). These messages represent entities in the DB. To simplify, let's imagine an entity A like this:

create table A (
    id varchar references B(id),
    serial_num varchar,
    is_actual boolean
);

The combination of id and serial_num is unique.
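For `on conflict` to have anything to conflict on, PostgreSQL needs a unique index or constraint that matches the conflict target exactly. The `create table` above does not declare one, and without it an `ON CONFLICT (id, serial_num)` clause is rejected with an error. A minimal sketch of the DDL, assuming PostgreSQL; the constraint name `a_id_serial_num_key` is hypothetical, and the `not null` columns are my addition, because a unique constraint treats NULLs as distinct by default, so a NULL serial_num would never conflict. It is kept as a Java constant only so all examples here share one language; in practice it would live in a migration script.

```java
class SchemaSql {
    // Hypothetical DDL for table A. The UNIQUE constraint on (id, serial_num)
    // is what "on conflict (id, serial_num)" uses as its arbiter.
    // NOT NULL is an assumption: NULLs never collide in a default unique constraint.
    static final String CREATE_TABLE_A =
            "create table a (\n"
          + "    id         varchar not null references b(id),\n"
          + "    serial_num varchar not null,\n"
          + "    is_actual  boolean not null default false,\n"
          + "    constraint a_id_serial_num_key unique (id, serial_num)\n"
          + ");";

    public static void main(String[] args) {
        System.out.println(CREATE_TABLE_A);
    }
}
```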

So when a batch of messages is received from Kafka, I parse those messages into entities, and then for each one I need to check whether such an entity already exists in table A by id and serial_num. If it does not, I need to update all entities with that id and set is_actual to false, and then insert the new entity; otherwise I do nothing. For updating and inserting I am using Spring JdbcTemplate's batchUpdate() method.

I've already asked here about how to check data existence efficiently, but my DBA said that we should not use any select queries, because there is no guarantee that another thread will not write the same value while the check is running, so the select query may not return up-to-date data. So my DBA recommended using an insert into ... on conflict do update query, but I don't really understand how it can help. Let's imagine that we have these entries in the DB, where the first number is id, the second is serial_num and the third is is_actual:
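To pin down the rule described above before turning it into SQL, here is a minimal in-memory reference model. It is single-threaded and deliberately ignores concurrency, which is exactly the part the DBA is worried about; `ActualityModel`, `seed` and `apply` are hypothetical names. What it illustrates is that the existence check and the insert are one operation (`Map.putIfAbsent`, the in-memory analogue of `on conflict do nothing`), and the other rows of the same id are demoted only when that operation actually inserted something. A model like this can also be unit-tested without any database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ActualityModel {
    // Key mirrors the unique (id, serial_num) pair of table A.
    record Key(String id, String serialNum) {}

    // Value is is_actual; LinkedHashMap keeps insertion order.
    private final Map<Key, Boolean> rows = new LinkedHashMap<>();

    void seed(String id, String serialNum, boolean isActual) {
        rows.put(new Key(id, serialNum), isActual);
    }

    // Inserts (id, serialNum) as actual only if the pair is new, and only then
    // demotes the other rows with the same id. Returns true if a row was inserted.
    boolean apply(String id, String serialNum) {
        Key key = new Key(id, serialNum);
        // One call both checks and inserts: no separate lookup step.
        if (rows.putIfAbsent(key, Boolean.TRUE) != null) {
            return false; // pair already known: nothing changes
        }
        for (Map.Entry<Key, Boolean> e : rows.entrySet()) {
            if (e.getKey().id().equals(id) && !e.getKey().equals(key)) {
                e.setValue(Boolean.FALSE);
            }
        }
        return true;
    }

    List<String> actualSerials(String id) {
        List<String> result = new ArrayList<>();
        rows.forEach((k, v) -> {
            if (v && k.id().equals(id)) result.add(k.serialNum());
        });
        return result;
    }

    public static void main(String[] args) {
        ActualityModel a = new ActualityModel();
        // Existing rows from the example in the question.
        a.seed("1", "1", false);
        a.seed("1", "2", true);
        a.seed("2", "3", false);
        a.seed("2", "4", true);
        // Messages from Kafka, in arrival order.
        a.apply("1", "1"); // already exists: ignored
        a.apply("1", "3"); // new pair: becomes actual, [1, 2] is demoted
        a.apply("1", "2"); // already exists: ignored, [1, 3] stays actual
        System.out.println(a.actualSerials("1")); // [3]
        System.out.println(a.actualSerials("2")); // [4]
    }
}
```

With the rows and messages from the example, [1, 1] and [1, 2] are ignored because they already exist, and only the genuinely new [1, 3] changes which row is actual.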

[1, 1, false]
[1, 2, true]
[2, 3, false]
[2, 4, true]

Then come the messages from Kafka, where the first number is id and the second is serial_num:

[1, 1] - this one can't be actual anymore, but it can still arrive from Kafka
[1, 3] - this one should become the last actual serial_num for id=1, but it will not override any
         entries and, moreover, it will be overridden by the next one
[1, 2] - this one is already actual and should be overridden by the previous one, but instead
         it will override the second one, because:

I was thinking of using insert into ... on conflict do update to set is_actual to false for all entities with the received id and then mark the entry with the received id and serial_num as is_actual = true, but there are three problems:

  1. This will not work when we insert a completely new entry ([1, 3]): there is no conflict on insert, so it is simply inserted with is_actual = true, but the other actual entries for that id are not unmarked, so we can end up with multiple actual values.
  2. I'm not even sure it is possible to do multiple SETs in the update part of insert into ... on conflict do update.
  3. If we don't receive any new unique pairs, we should keep our last actual entry ([1, 2]) marked as is_actual = true, but if [1, 1] is received after [1, 2], it will become actual when it should not.
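`on conflict do update` only ever touches the single row that conflicted (its `set` list can name several columns, which answers item 2, but still only for that one row), so it cannot demote the other rows with the same id. One way around all three problems, sketched below assuming PostgreSQL and the unique constraint on (id, serial_num), is to invert the idea: insert with `on conflict do nothing` and let a data-modifying CTE demote the other rows only when the insert actually returned a row. A pair that already exists, such as [1, 1] or [1, 2], produces no `returning` row, so the outer update joins against nothing and changes nothing. `ActualUpsert`, `Message` and `toBatchArgs` are hypothetical names; the statement is meant for Spring's `JdbcTemplate.batchUpdate(String sql, List<Object[]> batchArgs)`, which is only referenced in a comment so the sketch compiles without Spring.

```java
import java.util.ArrayList;
import java.util.List;

class ActualUpsert {
    // One statement per message. The CTE inserts the pair as actual unless it
    // already exists; the outer UPDATE joins to the CTE, so it demotes the other
    // actual rows of the same id only when a row was really inserted.
    // (The outer UPDATE runs on the pre-statement snapshot, so it cannot see the
    // new row anyway; "serial_num <>" is just defensive.)
    static final String SQL =
            "with ins as ("
          + " insert into a (id, serial_num, is_actual) values (?, ?, true)"
          + " on conflict (id, serial_num) do nothing"
          + " returning id, serial_num"
          + ") "
          + "update a set is_actual = false "
          + "from ins "
          + "where a.id = ins.id and a.serial_num <> ins.serial_num and a.is_actual";

    // Hypothetical message type: the parsed Kafka payload.
    record Message(String id, String serialNum) {}

    // One parameter array per message, kept in arrival order.
    static List<Object[]> toBatchArgs(List<Message> messages) {
        List<Object[]> batchArgs = new ArrayList<>(messages.size());
        for (Message m : messages) {
            batchArgs.add(new Object[] {m.id(), m.serialNum()});
        }
        return batchArgs;
    }

    public static void main(String[] args) {
        List<Message> batch = List.of(
                new Message("1", "1"), new Message("1", "3"), new Message("1", "2"));
        List<Object[]> batchArgs = toBatchArgs(batch);
        // With Spring on the classpath this would be:
        // int[] counts = jdbcTemplate.batchUpdate(ActualUpsert.SQL, batchArgs);
        System.out.println(SQL);
        System.out.println(batchArgs.size() + " statements queued");
    }
}
```

Because the batch runs its statements in order inside one transaction, a later message for the same id sees the effect of earlier ones. As far as I know, the counts returned per statement describe the outer update (rows demoted), not whether the pair was inserted. This sketch does not by itself stop two concurrent transactions that insert *different* new serial numbers for the same id from both staying actual, since each update only sees rows committed before it started; if that can happen, writes for one id would still need to be serialized somehow (for example, by having all messages for an id handled by one consumer).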

So the main problem is that I cannot use any select queries to check whether data exists, so how else can I implement this? Maybe there are some tricks with insert into ... on conflict do update that I don't know about?

  • Your DBA is right and you should write a simple test to see how it works. Commented Oct 29, 2024 at 16:30
  • And here is another problem: I can't really test anything, because I can't install anything on my remote desktop (DB, Docker, etc.). Moreover, there are no tests at all in my company; only one service has Testcontainers, and only because our team lead configured CI/CD, and there is no way to run such tests locally. So I would appreciate any hints on how to deal with my problem. Commented Oct 29, 2024 at 16:33
