
I'm working with big data and trying to parallelize my processing functions. I can use several threads and process every user in a different thread (I have 200k users).

Every thread should append the first n lines of the file it produces to an output file that is shared between all the threads.

I wrote a Java program that executes head -n 256 thread_processed.txt >> output (every thread will do this).

I need the output file to be written atomically.

If thread A writes lines 0 to 9 and thread B writes lines 10 to 19, the output should be [0...9 10...19]. Blocks of lines can't interleave: the output can't be something like [0 1 2 17 18 3 4 ...].

How can I manage concurrent write access to the output file in a bash script?

  • Your Java code needs to write the output of each thread to a separate file, so that another thread can concatenate them in the correct order. You don't need all the threads to complete to concatenate the output from the first k threads, but you do need the first k to complete. Commented Feb 6, 2017 at 19:52
  • Do a mega hack and use sed to write to a specific line 😈 But seriously, if you know the order, do as chepner suggested, or prefix the lines with a number and sort them. Commented Feb 6, 2017 at 23:10
  • P.S. Or make the lines the same size and you'll be able to put them in the correct positions easily from Java. Commented Feb 6, 2017 at 23:18
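
The first comment's suggestion (one file per thread, then a single writer that concatenates them in order) could look roughly like the sketch below. The parts/ directory, the numeric ids, and the loop standing in for the Java workers are all hypothetical; only head and >> come from the question.

```shell
#!/usr/bin/env bash
# Sketch only: the directory layout and worker ids are made up for illustration.
set -euo pipefail

workdir=$(mktemp -d)
mkdir "$workdir/parts"

# Stand-in for the Java workers: three background jobs,
# each writing 5 lines to its own private file.
for id in 1 2 3; do
  ( for n in 1 2 3 4 5; do echo "user$id line$n"; done > "$workdir/parts/$id.txt" ) &
done
wait  # every worker has finished writing its own file

# A single writer appends the first 2 lines of each part in id order,
# so there is no concurrent access to the output file at all.
: > "$workdir/output"
for id in 1 2 3; do
  head -n 2 "$workdir/parts/$id.txt" >> "$workdir/output"
done

cat "$workdir/output"
```

Because only one process ever writes to output, no locking is needed; the cost is one temporary file per worker.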

1 Answer


sem from GNU Parallel should be able to do it:

sem --id mylock "head -n 256 thread_processed.txt >> output"

This runs the command under a mutex named mylock, so only one head appends to output at a time.

If you are concerned that someone might read output while the head is running:

sem --id mylock "cp output o2; head -n 256 thread_processed.txt >> o2; mv o2 output"
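
To try the mutex idea without GNU Parallel installed, here is a minimal sketch that swaps in flock(1) from util-linux for sem. This is a different tool than the one recommended above, and the lock file name, the worker count, and the generated input files are assumptions made for the demo.

```shell
#!/usr/bin/env bash
# Sketch only: uses flock(1) from util-linux as a stand-in for sem.
set -euo pipefail

workdir=$(mktemp -d)
out="$workdir/output"
lock="$workdir/output.lock"   # hypothetical lock file name
: > "$out"

append_block() {
  local id=$1
  local src="$workdir/thread_$id.txt"
  # Fake per-thread result file with 1000 tagged lines.
  seq 1 1000 | sed "s/^/worker$id /" > "$src"
  # flock holds an exclusive lock on $lock while the command runs,
  # so only one append to $out is in progress at any time.
  flock "$lock" sh -c 'head -n 256 "$1" >> "$2"' sh "$src" "$out"
}

for id in 1 2 3 4; do
  append_block "$id" &
done
wait

# Count contiguous runs of worker tags; this equals the number of
# workers (4 here) when no two blocks have interleaved.
awk '{print $1}' "$out" | uniq | wc -l
```

The order of the four blocks in output depends on which worker gets the lock first; the lock only guarantees that each block stays contiguous.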

2 Comments

I have a simple bash script that uses parallel (version 20240522) and reports "parallel: This should not happen. You have found a bug." What am I doing wrong, or is it indeed a bug?
