
I'm trying to RELIABLY implement the fan-out write pattern. For practical purposes, assume we have something similar to a Twitter clone (Cassandra and Node.js).

So, user A has 500k followers. When user A posts a tweet, we need to write to 500k feeds/timelines.

Conceptually this is easy: fetch the followers of user A and, for each one, write the tweet to their timeline. But this is not "atomic" (by atomic I mean that, at some point, either all of the writes will have succeeded or none will).

async function updateFeeds(userId, tweet) {
  const followers = await fetchFollowersFor(userId)
  // Sequential: one write at a time, so 500k followers means 500k awaited round trips
  for (const f of followers) {
    await insertIntoFeed(f, tweet)
  }
}


Doing it all in parallel, on the other hand, seems like a DoS attack on our own database:


async function updateFeeds(userId, tweet) {
  const followers = await fetchFollowersFor(userId)
  // Fires all 500k inserts at once, and rejects on the first failure
  // without reporting anything about the remaining writes
  await Promise.all(followers.map(f => insertIntoFeed(f, tweet)))
}


How do I keep track of the process? How do I resume in case of failure? I'm not asking for a tutorial or anything like that; just point me in the right direction (keywords to search for) if you can, please.

  • There is no atomic way to write to 500k feeds, so it's unclear why you're mentioning that. As for errors, just add any feed that returns an error to a list of feeds to retry when you finish with the others. Then retry them after a short delay, a max of N times (where N is some small number), with each retry delayed longer than the previous one. Commented May 27, 2024 at 16:50
  • Promise.all() will never be what you want if you want to keep track of which feeds succeeded and which failed, since it short-circuits on the first rejection and reports nothing about the other requests (a batched alternative is sketched after these comments). And yes, it will look like at least a rate-limiting violation, if not a DoS attack. Commented May 27, 2024 at 16:50
  • @jfriend00 Of course, I edited the "atomic" part. Commented May 27, 2024 at 17:01
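
A minimal sketch of what these comments describe, batching the writes with Promise.allSettled() and collecting failures for backed-off retries (fetchFollowersFor and insertIntoFeed are the placeholders from the question; the batch size and retry limits are arbitrary):

async function updateFeeds(userId, tweet, { batchSize = 100, maxRetries = 3 } = {}) {
  let pending = await fetchFollowersFor(userId)
  let delay = 1000

  for (let attempt = 0; attempt <= maxRetries && pending.length > 0; attempt++) {
    const failed = []
    // Write in limited batches instead of firing all 500k inserts at once
    for (let i = 0; i < pending.length; i += batchSize) {
      const batch = pending.slice(i, i + batchSize)
      // Unlike Promise.all(), allSettled() reports the outcome of every write
      const results = await Promise.allSettled(batch.map(f => insertIntoFeed(f, tweet)))
      results.forEach((r, j) => {
        if (r.status === 'rejected') failed.push(batch[j])
      })
    }
    pending = failed
    if (pending.length > 0 && attempt < maxRetries) {
      await new Promise(resolve => setTimeout(resolve, delay))
      delay *= 2 // each retry waits longer than the previous one
    }
  }
  return pending // feeds that still failed after all retries
}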

2 Answers


In addition to @christophe-quintard's answer, there is another trick to consider, which is to... not use the fan-out write pattern here at all.

Basically, instead of writing a tweet into 500k timelines, you create a separate abstraction for "popular"/"hot" accounts (this can be decided based on the number of followers, for example; maybe the number of tweets per day can be a consideration too) and build the timeline for their subscribers on the fly. When a user requests their feed, you fetch the "ordinary" precomputed timeline and merge it with the recent tweets of all the "popular" accounts they follow. This way you drastically reduce the amount of data stored and processed.
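
For illustration, a sketch of that read-time merge (every function name here, fetchPrecomputedTimeline, fetchHotAccountsFollowedBy and fetchRecentTweetsBy, is a hypothetical placeholder):

async function buildTimeline(userId, limit = 50) {
  // The "ordinary" timeline, materialized at write time by the fan-out
  const precomputed = await fetchPrecomputedTimeline(userId, limit)

  // Tweets from "hot" accounts are not fanned out; fetch them on the fly
  const hotAccounts = await fetchHotAccountsFollowedBy(userId)
  const hotTweets = await Promise.all(hotAccounts.map(a => fetchRecentTweetsBy(a, limit)))

  // Merge both sources and keep only the newest entries
  return [...precomputed, ...hotTweets.flat()]
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, limit)
}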

For "non-hot" accounts you just do some batching plus eventually consistent processing, i.e. you post a message to some background processor that will do some kind of batching processing (there are several options/things to consider here).


3 Comments

Right. Could you expand on that last paragraph? At least 2 or 3 of the options to consider here.
@InglouriousBastard Expanding it would basically repeat the answer by Christophe: you move the processing to some async pipeline which handles reading the list of subs and sending the fan-out messages. There is some variance in how you can do it, but it hugely depends on how the data is stored and what technologies you use. I have not worked with Cassandra, so it does not tell me much about how the data is stored in your case.
@GuruStron Hi, SQL wizard. Do you think you could help out on this? stackoverflow.com/questions/78552841/…

I would start by setting up a message broker (like Kafka) and writing all the tweets into a topic.
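
For example, a producer sketch with the kafkajs client (the topic name, broker address and message shape here are arbitrary choices):

import { Kafka } from 'kafkajs'

const kafka = new Kafka({ clientId: 'twitter-clone', brokers: ['localhost:9092'] })
const producer = kafka.producer()
await producer.connect() // once, at startup

async function publishTweet(userId, tweet) {
  // Keying by author keeps one user's tweets in a single partition, hence in order
  await producer.send({
    topic: 'tweets',
    messages: [{ key: userId, value: JSON.stringify({ userId, tweet }) }],
  })
}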

Then develop an agent that consumes the messages. For each message, the agent fetches a batch of followers that do not yet have the tweet in their feed, and inserts the tweet into the feed of each of them. When there are no more followers without the tweet, the agent commits the message and processes the following messages. The reason for proceeding this way is resilience: if for any reason the agent is restarted, it will resume from where it left off.
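
A sketch of such an agent with kafkajs, committing offsets manually so a message is only acknowledged once its fan-out is complete (fetchUnprocessedFollowers and insertIntoFeed are hypothetical placeholders for the actual Cassandra queries):

import { Kafka } from 'kafkajs'

const kafka = new Kafka({ clientId: 'fanout-agent', brokers: ['localhost:9092'] })
const consumer = kafka.consumer({ groupId: 'fanout' })

await consumer.connect()
await consumer.subscribe({ topic: 'tweets', fromBeginning: false })

await consumer.run({
  autoCommit: false, // commit manually, only after the fan-out is complete
  eachMessage: async ({ topic, partition, message }) => {
    const { userId, tweet } = JSON.parse(message.value.toString())

    // Keep fetching followers that do not have the tweet yet, in batches
    let batch
    while ((batch = await fetchUnprocessedFollowers(userId, tweet, 100)).length > 0) {
      for (const follower of batch) {
        await insertIntoFeed(follower, tweet)
      }
    }

    // Only now is the message acknowledged; a restart before this point
    // makes the agent reprocess the message and resume where it left off
    await consumer.commitOffsets([
      { topic, partition, offset: (Number(message.offset) + 1).toString() },
    ])
  },
})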

Configure the topic with a lot of partitions, in order to be able to scale up the processing of the messages. If you have ONE partition, you can have ONE agent to process the messages. If you have N partitions, you can have up to N agents to process the messages in parallel.
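
With kafkajs, the partition count is fixed when the topic is created (the 32 here is just an example):

import { Kafka } from 'kafkajs'

const admin = new Kafka({ brokers: ['localhost:9092'] }).admin()
await admin.connect()

// Up to 32 agents in the same consumer group can then work in parallel
await admin.createTopics({
  topics: [{ topic: 'tweets', numPartitions: 32 }],
})

await admin.disconnect()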

To keep track of the overall processing, you can watch the "lag" in the message broker, which is the number of messages yet to be processed in the topic. If it is too high for too long, then you have to scale up the number of agents.
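
A sketch of computing that lag with the kafkajs admin API, reusing the group and topic names from the earlier sketches (this assumes the fetchOffsets shape of kafkajs v2):

import { Kafka } from 'kafkajs'

const admin = new Kafka({ brokers: ['localhost:9092'] }).admin()
await admin.connect()

// Latest (high-watermark) offsets per partition vs. the group's committed offsets
const latest = await admin.fetchTopicOffsets('tweets')
const [committed] = await admin.fetchOffsets({ groupId: 'fanout', topics: ['tweets'] })

const lag = latest.reduce((total, p) => {
  const c = committed.partitions.find(x => x.partition === p.partition)
  const consumed = c && c.offset !== '-1' ? Number(c.offset) : 0
  return total + (Number(p.high) - consumed)
}, 0)

console.log(`messages still to be processed: ${lag}`)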

If you want to keep track of the processing of a given message, the agent can query how many users are still to be processed before handling each batch. Then the agent can log this number, expose it through its API, or expose it as a Prometheus metric...
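
For the Prometheus option, a sketch with the prom-client package (the metric name is arbitrary and countUnprocessedFollowers is a hypothetical helper):

import client from 'prom-client'

const remainingFollowers = new client.Gauge({
  name: 'fanout_followers_remaining',
  help: 'Followers still waiting for the tweet currently being fanned out',
  labelNames: ['tweetId'],
})

// Inside the agent, before processing each batch:
remainingFollowers.set({ tweetId: tweet.id }, await countUnprocessedFollowers(userId, tweet))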

2 Comments

I don't know if you are familiar with Cassandra, but what would you consider a batch of followers in this specific case?
A batch is a limited number of elements. You do not want to query ALL the elements at once if there are hundreds or thousands of them (you will get an OutOfMemory error). So just query a hundred elements, process them, query a hundred more, process them, and so on until none are returned, as sketched below.
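
With the Node.js cassandra-driver, this maps directly onto its paging support (a sketch; the table and column names are made up):

import cassandra from 'cassandra-driver'

const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'twitter',
})

async function fanOut(userId, tweet) {
  let pageState
  do {
    // fetchSize limits how many rows are pulled into memory per round trip
    const result = await client.execute(
      'SELECT follower_id FROM followers WHERE user_id = ?',
      [userId],
      { prepare: true, fetchSize: 100, pageState },
    )
    for (const row of result.rows) {
      await insertIntoFeed(row.follower_id, tweet)
    }
    pageState = result.pageState // undefined once the last page is reached
  } while (pageState)
}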
