Newest 'distributed-system' Questions

1 vote

1 answer

42 views

How do I redesign a broken multi-service system where the entry point and child services are out of sync?

I recently joined a startup that has a pretty messy backend setup, and I’ve been assigned to sort it out. Here’s the situation: There’s one main entry point (a federation/onboarding service) that’s ...

Adithya Srikar

1

asked Nov 8 at 18:34

0 votes

1 answer

49 views

Should I store Stripe subscription details in a local DB or query Stripe's API to determine if a user is subscribed?

I have the following requirements: a website (let's call it Website A) where I sell subscription plans for my SaaS; payments are handled with Stripe; I am using an authentication service (Auth0) so users ...

kmylonas

21

asked Oct 23 at 14:06

0 votes

0 answers

63 views

ADKG-based threshold ECDSA signature recovers different address per transaction—how to compute aggregate `r` and signature parameters?

Background: I’m implementing Asynchronous Distributed Key Generation (ADKG) over secp256k1 so that N nodes collectively hold a threshold private key. After DKG each node has a secret share. To sign an ...

Shubham Gupta

121

asked Jul 28 at 9:12

0 votes

0 answers

34 views

gRPC: HTTP-level resilience handler (retries)?

Should gRPC clients implement an HTTP-level resilience handler? Or only rely on the gRPC-level RetryPolicy? Why/why not? For example, if the server responds with a 5xx status code (unexpected but ...

Lindeberg

121

asked Jul 11 at 7:58

0 votes

0 answers

43 views

Is there a problem having min.insync.replicas < half the RF?

Many examples of Kafka topic configuration have RF = 3, min.insync.replicas = 2. In the case of a cluster of 5 brokers, if we use RF = 5, should min.insync.replicas = 3? That seems "natural" ...

alabaster

130

asked Jun 19 at 3:59

0 votes

1 answer

75 views

How do I limit the count of rows in a result set in Postgres without unnecessary locking?

This question is inspired by a 'general admission' variant of the common 'event ticketing' System Design interview question. Critically in this version, the user does not select a seat - they only say ...

asantas93

33

asked Apr 28 at 4:32

0 votes

0 answers

26 views

How can I ensure this UDF works reliably across nodes if OceanBase scales horizontally?

OceanBase version: V4.2. I’m using OceanBase in MySQL mode, and I noticed that functions like NOW() and CURRENT_TIMESTAMP only provide microsecond (6-digit) precision. So I’m trying to create a custom ...

user30195804

asked Apr 27 at 13:05

0 votes

0 answers

37 views

Proving the Log Matching property in Raft consensus protocol

I have been trying to understand the Raft protocol for quite some time now. One thing that has always stumped me is the proof of the Log Matching property. One of my concerns is that the proof in the ...

arl

118

asked Apr 14 at 19:07

0 votes

0 answers

39 views

How to use Socket.IO and IPC together in Python

I am trying to build a server that communicates with the client using the Socket.IO protocol. The server starts multiple applications as subprocesses, and the server communicates with this ...

Souvik De

70

asked Apr 12 at 5:00

0 votes

1 answer

63 views

Is last write wins redundant for immutable keys?

In the book Designing Data-Intensive Applications > chapter-5 > Leaderless replication > Detecting Concurrent Writes, below is what the author says while talking about last write wins (LWW): The ...

wenn32

1,394

asked Mar 16 at 3:14

0 votes

1 answer

70 views

Apache JMeter Master Slave Setup on Azure VM

I am new to JMeter distributed environment setup. I don't have any knowledge of how to set up the master-slave configuration. I just have the information that we have a Master VM and we can spin up ...

Farhan Meer

1

asked Mar 7 at 14:53

0 votes

0 answers

32 views

Handling Database Failures in a Distributed System with RabbitMQ Workers

I have a worker that processes tasks from RabbitMQ and inserts data into a database. The system operates at high scale, handling thousands of messages per second, which makes proper failure handling ...

Yakir

19

asked Mar 7 at 3:28

0 votes

0 answers

44 views

Why does Lamport's Distributed Mutual Exclusion Algorithm require the reply's timestamp to be greater than the request's timestamp?

In Lamport's Distributed Mutual Exclusion algorithm, a process can enter the critical section if two conditions are met: (1) its request is at the head of its own queue; (2) it has received a reply from all ...

Benjamin

51

asked Feb 24 at 8:10

1 vote

0 answers

21 views

In which cases can't Multi-Paxos support the primary order required in a primary-backup replication system?

In the paper "Zab: High-performance broadcast for primary-backup systems", Figure 1 shows that Paxos could violate the primary order of requests. I understand the result will be like that if each proposer ...

user1532146

313

asked Feb 3 at 22:06

2 votes

1 answer

56 views

How can I sync JWT blacklists in PHP microservices without a bottleneck or SPOF?

I’m working on a microservices-based project where each service is a separate PHP application. They all rely on JWT for authentication and authorization. The tricky part is revoking (or blacklisting) ...

Kamyar Safari

665

asked Jan 23 at 22:37

0 votes

0 answers

37 views

Why is Ceph and its CRUSH algorithm less used for big data analytics?

As Ceph and its CRUSH algorithm ruled out the issue of metadata server contention, and surely decreased object fetching latency by removing the RPC to query object location, why is it less adopted ...

Coulson Liang

39

asked Jan 1 at 2:38

0 votes

1 answer

37 views

Is a lost update possible with Raft?

Let's say we have a cluster of 5 nodes and A is the leader. The following sequence of events takes place: A sends the replicate change request in parallel to all the followers. Only B could receive the ...

Tarun

3,175

asked Dec 26, 2024 at 4:22

-1 votes

1 answer

181 views

Does a Raft follower store the leader ID? If not, how does it redirect requests to the leader?

Referring to this table depicted in the Raft paper, I could not find where followers remember the leader in any form, such as an identifier, physical address, etc. Instead, I only find the leader ID in ...

PkDrew

2,281

asked Dec 25, 2024 at 1:51

1 vote

1 answer

154 views

Kafka and hotspots in a partition

I am new to Kafka and I understand that message order is only guaranteed within one partition and not across partitions. What I am not sure about is whether this can create scalability issues, e.g. in ...

smith

311

asked Dec 23, 2024 at 22:07

-3 votes

1 answer

35 views

Apache Kafka Problems with offset

I'm using the confluent kafka library to create a distributed system, but I'm failing to understand some principles of Kafka itself. Let's say right now I'm working with a Central, that has to listen to ...

keykey13

1

asked Dec 10, 2024 at 19:00

0 votes

1 answer

22 views

NServiceBus command handlers and multi-region

We have an NServiceBus application running in two Azure regions: North Europe and West Europe. We are using SQL transport, and both applications in these regions connect to a shared database. ...

Rick Neeft

145

asked Dec 3, 2024 at 14:20

0 votes

1 answer

36 views

How to maintain synchronization between distributed python processes?

I have a number of workstations that run long processes containing sequences like this: `x = wait_while_current_is_set`, `y = read_voltage`, `z = z + y`. The workstations must maintain synchronization with a ...

david

2,697

asked Dec 1, 2024 at 6:40

0 votes

0 answers

83 views

Is MassTransit with Azure Service Bus a good fit for tenant-isolated on-prem agents?

We are using MassTransit with Azure Service Bus in our backend system for messaging. Now, we need to extend our solution to communicate with on-premises agents that will be installed for each of our ...

SOK

595

asked Nov 22, 2024 at 6:55

0 votes

0 answers

20 views

Using Helix for managing load elastically, something like Kafka Consumer Group

I am trying to build an application. I have to run a job indefinitely, i.e. in while(true). To increase the throughput of the job, it is split across partitions. We can compare this to a Kafka Consumer, ...

Kumar Shrey

1

asked Nov 22, 2024 at 6:06

2 votes

1 answer

125 views

What data is stored in the log compaction snapshot of a Raft-based distributed file system?

I'm working on a Raft implementation as part of my distributed file system and I've run into a problem with the log compaction process. According to the official Raft paper, when a log reaches a ...

Dror Chen

21

asked Nov 21, 2024 at 15:46

0 votes

1 answer

177 views

Microservices inbox-outbox pattern

I have a question I'm curious about. Let's say we are developing a microservice social media application (I chose this topic for practical purposes :)). I'm using the inbox-outbox pattern to ensure ...

OnurcanOgul

11

asked Oct 27, 2024 at 13:50

1 vote

4 answers

179 views

How does a client handle failures in Raft-replicated datastores? [closed]

Consider a database like CockroachDB that uses the Raft protocol for replicating data to a replica group owning a partition of the data. How does a client handle a request that fails in such DBs? Because, ...

Dumb_Pegasus

129

asked Oct 23, 2024 at 5:28

1 vote

1 answer

157 views

Handling Correlation ID Changes in Event Sourcing When an Entity Switches Context

I'm working on an event-sourced application that crawls sports betting games from different bookmakers. I have two primary aggregates in my system: Game: Represents a sports betting event for a ...

Ari Seyhun

12.8k

asked Oct 8, 2024 at 14:32

0 votes

1 answer

429 views

Multiple DbContext in .NET Aspire

I am trying to add multiple DbContext instances in an app that is launched with .NET Aspire. I also want those separate contexts to have configuration available (in this case to have migration history ...

Vytenis Kajackas

41

asked Sep 30, 2024 at 12:45

0 votes

1 answer

228 views

Log viewer for binary logs

I have a clustered real-time system that produces a very large amount of binary logs. I get a bunch of binary logs from each node in the system and I want to view the logs in a convenient way. Mostly, ...

shaharhoch

115

asked Sep 25, 2024 at 18:24

0 votes

1 answer

70 views

Fault-tolerant queue-worker architecture in Kafka?

I am new to using queue-worker architectures and I'm interested in how to make them resilient to a worker failing. For example: we have a pool of workers, Alpha, that put entries onto queue A. Then the ...

Lubed Up Slug

178

asked Sep 24, 2024 at 21:08

1 vote

1 answer

225 views

Redis backed rate limiter -- Inconsistent?

I was following this blog on implementing a rate limiter using Redis (link to the blog). Here they have used MULTI to pack all the atomic commands. This ensures that we're not concurrently writing wrongly ...

Gitesh Khanna

142

asked Sep 20, 2024 at 17:27

2 votes

0 answers

68 views

Tomcat Web Application Not Loading Correctly in Docker Container: HTTP Status 404

I am running a TomEE server inside a Docker container, but my web application is not loading as expected. Here is the setup I'm using: Docker Image and Container: Image: interesting_picture:latest (...

Sushi

23

asked Sep 13, 2024 at 14:27

1 vote

1 answer

677 views

Is my understanding of a Distributed Lock correct?

I'm having some trouble understanding the need for a distributed lock. I did think of an example where it may be required but I'm not completely sure. I would appreciate some comments if I'm thinking ...

Laksh Chauhan

13

asked Sep 2, 2024 at 11:09

1 vote

0 answers

52 views

How to Aggregate Asynchronous Messages from a Queue into a Final Result at Scale?

I'm working on a system where: A producer sends approximately 100 million messages daily to a message queue. The consumer processes each message from the queue and produces multiple parts as output. ...

Pouya Rezaei

222

asked Aug 13, 2024 at 17:45

1 vote

1 answer

98 views

OpenTelemetry trace ID for Couchbase Database Change Protocol

I wanted to address an important aspect of our microservices architecture, specifically regarding our tracing implementation with OpenTelemetry. We have multiple microservices operating seamlessly ...

Raushan

347

asked Aug 9, 2024 at 17:53

0 votes

0 answers

45 views

An interesting scheduling problem: how to serve a multi-stage microservice chain that shares resources

Recently, I encountered a scheduling problem in a distributed system and I hope to get some help: for a multi-stage microservice that has two stages calling the same instance, such as A-->B-->A, ...

user26585062

1

asked Jul 31, 2024 at 11:41

0 votes

1 answer

201 views

Replicated & Distributed ClickHouse - Keeper Replication

I need to set up replication in ClickHouse on two different machines that are on the same network. I have tried to configure it but I have the following error: SQL Error [999] [07000]: Code: ...

Vilma Zorina Camacho Cagal

1

asked Jul 23, 2024 at 13:36

3 votes

2 answers

358 views

Why is read repair not sufficient to make Dynamo-style databases linearizable?

I am reading DDIA. It says "possible to make Dynamo-style quorums linearizable at the cost of reduced performance: a reader must perform read repair (see “Read repair and antientropy” on page 178)...

Zack Light

362

asked Jul 10, 2024 at 4:19

0 votes

1 answer

60 views

Table primary key uniqueness across different / multi-region Amazon RDS Postgres

I have an application that uses a Postgres database in one region (US West) containing several tables, one of which contains several hundred thousand records (let's call it "events" table with ...

ct101

1

asked Jul 9, 2024 at 18:54

0 votes

0 answers

28 views

How can a distributed system satisfy CP in CAP theorem?

If a system is partition tolerant, it's impossible for it to be consistent since there's no way for one node to update another. How is it possible to be both consistent and partition tolerant?

JobHunter69

2,366

asked Jun 27, 2024 at 22:24

0 votes

0 answers

51 views

Deduplication and grouping for an events table at scale

I'm working with an events table where different source tables trigger writes into this table with columns: entity_id and payload. These events are then published to a Kafka topic using a message ...

Forece85

518

asked Jun 19, 2024 at 5:04

1 vote

1 answer

664 views

Message ordering in event driven architecture

Consider there are 3 microservices - s1, s2 and s3. s1 sends message m1. s2 consumes message m1, applies some business logic and then sends message m2. The problem is that s3 receives message m2 ...

Yash

27

asked Jun 18, 2024 at 10:36

0 votes

1 answer

146 views

Citus Colocation Behavior

I am using Citus as a managed service in the cloud with Azure Cosmos DB for PostgreSQL. I have 1 coordinator and 2 worker nodes set up. There are distributed tables and reference tables created. ...

Rohith K

1

asked Jun 14, 2024 at 7:37

2 votes

0 answers

78 views

(Re)attaching to an App Insights Operation from another machine/process (not using HTTP)

I have a .NET 8 distributed system in AKS where work is divided among workers using a Manager/Worker pattern, with work shared out on a Redis list. I'm aiming to get unified logging via Application ...

Andrew Matthews

3,174

asked Jun 11, 2024 at 7:43

1 vote

0 answers

465 views

How to efficiently paginate and sort data from multiple services?

I need to fetch and filter data from three different services: ProductService, PriceService, and StockService. My goal is to get products that belong to Category = 54, have stock available, and are ...

mehmtee10

11

asked Jun 1, 2024 at 16:08

2 votes

2 answers

153 views

Handling Race Condition in distributed system

Hi, I have order creation functionality in my project and I give the client an order_id, which is an auto-increment ID: `order = serializer.save(user=user, created_by=user, platform=platform)`. Now how ...

Ramprasad Thakur

21

asked May 26, 2024 at 23:37

1 vote

3 answers

854 views

Can Sloppy Quorum guarantee strong read consistency?

In the book "Designing Data-Intensive Applications. The Big Ideas Behind Reliable, Scalable and Maintainable Systems", we can read regarding Sloppy Quorum: However, this means that even ...

Yas

63

asked May 22, 2024 at 15:20

1 vote

1 answer

56 views

EDA Choreography - keep overall state

In an event-driven architecture using the choreography model, how do we keep the current, global state of the process? Let's say we have a process where many services p1,...,pn transition many states s1,...,...

tlt

15.5k

asked May 21, 2024 at 10:37

2 votes

0 answers

27 views

MPI_Iprobe does not receive new events after START_LELECT_ST tag is processed

I am building a Satellite (ST) to Ground Stations (GS) distributed communication system, and I am at the point where I am implementing leader election among satellites (in a ring) and among ground ...

Stelios Papamichail

1,333

asked May 12, 2024 at 9:21
