I have a file named sample.txt that is about 2 GB (as an example). I want to split the file into four parts, read all four parts simultaneously, and write them simultaneously to another file, Sample1.txt.

Please help me.

  • You might want to have a look at Apache Hadoop. The framework implements MapReduce, which seems to be exactly what you need. Commented May 14, 2011 at 18:41
  • Maybe this question is related to Multithreaded access to file. Commented May 14, 2011 at 18:47

3 Answers

You might want to have a look at Apache Hadoop. The framework implements MapReduce, which seems to be exactly what you need.

I assume you know it's not possible to insert extra data in the middle of a file. So you'd need to know in advance how large Sample1.txt will be (in bytes) and at what position each of the 4 blocks will start. You would then create the file at that final size.
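
A minimal Java sketch of that preparation step, assuming (hypothetically) that Sample1.txt ends up the same size as the input and that the bytes are divided as evenly as possible among the blocks. `BlockLayout`, `blockStarts` and `preallocate` are illustrative names; `RandomAccessFile.setLength` is the standard call that fixes the file's size.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class BlockLayout {

    /** Start offset of each of `parts` blocks that together cover `totalBytes` bytes. */
    static long[] blockStarts(long totalBytes, int parts) {
        long[] starts = new long[parts];
        long base = totalBytes / parts;      // bytes every block gets
        long remainder = totalBytes % parts; // the first `remainder` blocks get one extra byte
        long offset = 0;
        for (int i = 0; i < parts; i++) {
            starts[i] = offset;
            offset += base + (i < remainder ? 1 : 0);
        }
        return starts;
    }

    /** Creates the output file (if needed) and sets it to its final size up front. */
    static void preallocate(String path, long finalSize) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(path, "rw")) {
            out.setLength(finalSize);
        }
    }

    public static void main(String[] args) throws IOException {
        // 10 bytes in 4 blocks: sizes 3, 3, 2, 2
        System.out.println(Arrays.toString(blockStarts(10, 4))); // [0, 3, 6, 8]

        Path out = Files.createTempFile("Sample1", ".txt");
        preallocate(out.toString(), 10);
        System.out.println(Files.size(out)); // 10
    }
}
```

If the output size differs from the input (for example because each line is transformed), the same offsets have to be computed from the known output record sizes instead.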

You could then use a RandomAccessFile for each of the writers, each initialized with a seek() to the position (in bytes) where that block will start. The same goes for reading: each reader seeks to the position where its block starts.
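
A sketch of the seek-based approach, assuming a straight byte-for-byte copy so that each block starts at the same offset in both files. `ParallelBlockCopy`, `copyBlock` and `copy` are hypothetical names. The key points are that each thread opens its own `RandomAccessFile` objects (so each has its own file pointer) and calls `seek()` before reading or writing.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ParallelBlockCopy {

    /** Copies bytes [start, start + length) of src to the same offset in dst. */
    static void copyBlock(String src, String dst, long start, long length) throws IOException {
        // Each thread opens its own RandomAccessFile pair, so each has its own file pointers.
        try (RandomAccessFile in = new RandomAccessFile(src, "r");
             RandomAccessFile out = new RandomAccessFile(dst, "rw")) {
            in.seek(start);
            out.seek(start);
            byte[] buf = new byte[64 * 1024];
            long remaining = length;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) throw new IOException("unexpected end of " + src);
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
    }

    /** Splits src into `parts` byte ranges and copies them to dst in parallel threads. */
    static void copy(String src, String dst, int parts) throws IOException, InterruptedException {
        long total = Files.size(Path.of(src));
        try (RandomAccessFile out = new RandomAccessFile(dst, "rw")) {
            out.setLength(total); // create the output at its final size first
        }
        long chunk = (total + parts - 1) / parts; // ceiling division
        List<Thread> threads = new ArrayList<>();
        List<IOException> errors = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < parts; i++) {
            long start = Math.min(total, i * chunk);
            long length = Math.min(chunk, total - start);
            Thread t = new Thread(() -> {
                try {
                    copyBlock(src, dst, start, length);
                } catch (IOException e) {
                    errors.add(e);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();
        if (!errors.isEmpty()) throw errors.get(0);
    }

    public static void main(String[] args) throws Exception {
        // Demo on temporary files; the question's case would be copy("sample.txt", "Sample1.txt", 4).
        Path src = Files.createTempFile("sample", ".txt");
        Path dst = Files.createTempFile("Sample1", ".txt");
        Files.write(src, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
        copy(src.toString(), dst.toString(), 4);
        System.out.println(new String(Files.readAllBytes(dst), StandardCharsets.US_ASCII)); // 0123456789abcdef
    }
}
```

Whether four threads actually beat one sequential copy depends on the storage device and on how much work is done per block.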

Note that this is byte-oriented, not line-oriented. You almost have to assume fixed-size lines in the input, and certainly in the output.
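
To illustrate why fixed-size lines matter: if every line has the same byte length, the start of line k is simply k × length, so a thread can seek() straight to it. The sketch below assumes a hypothetical 16-byte record (15 ASCII characters plus '\n'); `FixedRecords`, `offsetOf` and `readLine` are illustrative names. With variable-length lines there is no way to compute that offset without scanning the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FixedRecords {
    // Hypothetical layout: every line is exactly 16 bytes, i.e. 15 ASCII characters plus '\n'.
    static final int RECORD_LEN = 16;

    /** Byte offset of line `index` (0-based); only computable because all lines have equal length. */
    static long offsetOf(long index) {
        return index * RECORD_LEN;
    }

    /** Reads line `index` directly, without scanning the lines before it. */
    static String readLine(RandomAccessFile file, long index) throws IOException {
        byte[] record = new byte[RECORD_LEN];
        file.seek(offsetOf(index));
        file.readFully(record);
        // Drop the trailing '\n'; assumes single-byte (ASCII) characters.
        return new String(record, 0, RECORD_LEN - 1, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("records", ".txt");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            sb.append(String.format("record-%08d\n", i)); // 7 + 8 chars + '\n' = 16 bytes
        }
        Files.write(tmp, sb.toString().getBytes(StandardCharsets.US_ASCII));
        try (RandomAccessFile f = new RandomAccessFile(tmp.toFile(), "r")) {
            System.out.println(readLine(f, 5)); // record-00000005
        }
    }
}
```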

Also note that having multiple threads read from and write to the same file only increases speed if the processing overhead is significant. Otherwise you'll just lose speed, because the hard disk head has to keep moving to new positions.

I would probably use a single reader thread, a single writer thread, and multiple processing threads using the producer-consumer pattern.

The reader would read each line and put it into a BlockingQueue. The processing threads take() from that queue and put their results into a second BlockingQueue. The writer thread would take() from that second queue and write to disk. (The output lines will generally not be in the same order as the input lines, though.)

The BlockingQueue javadoc also describes the producer-consumer pattern.
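
A minimal sketch of that pipeline, assuming line-by-line processing of UTF-8 text. `LinePipeline`, `run` and the `process` function are hypothetical names. The end of each stream is signalled with a "poison pill" sentinel, which is one common way to shut down producer-consumer threads but not the only one. Error handling is kept minimal: an I/O error in the reader is reported, but an exception thrown by `process` or a failing writer can leave the other threads blocked.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

public class LinePipeline {
    // "Poison pill" marking end of stream; compared by reference, so a real line "EOF" is not confused with it.
    private static final String EOF = new String("EOF");

    static void run(Path input, Path output, int workers, UnaryOperator<String> process)
            throws Exception {
        BlockingQueue<String> toProcess = new ArrayBlockingQueue<>(1024);
        BlockingQueue<String> toWrite = new ArrayBlockingQueue<>(1024);
        AtomicReference<Exception> failure = new AtomicReference<>();
        List<Thread> threads = new ArrayList<>();

        // Single reader thread: sequential I/O on the input file.
        threads.add(new Thread(() -> {
            try {
                try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = in.readLine()) != null) toProcess.put(line);
                } catch (IOException e) {
                    failure.compareAndSet(null, e);
                }
                for (int i = 0; i < workers; i++) toProcess.put(EOF); // one pill per worker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));

        // Processing threads: the CPU-bound work runs here in parallel.
        for (int w = 0; w < workers; w++) {
            threads.add(new Thread(() -> {
                try {
                    String line;
                    while ((line = toProcess.take()) != EOF) toWrite.put(process.apply(line));
                    toWrite.put(EOF); // tell the writer this worker is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }

        // Single writer thread: stops after receiving one pill from every worker.
        threads.add(new Thread(() -> {
            try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
                int finished = 0;
                while (finished < workers) {
                    String line = toWrite.take();
                    if (line == EOF) {
                        finished++;
                    } else {
                        out.write(line);
                        out.newLine();
                    }
                }
            } catch (IOException | InterruptedException e) {
                failure.compareAndSet(null, e);
            }
        }));

        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        if (failure.get() != null) throw failure.get();
    }

    public static void main(String[] args) throws Exception {
        // Demo on temporary files; the question's case would be sample.txt -> Sample1.txt.
        Path in = Files.createTempFile("sample", ".txt");
        Path out = Files.createTempFile("Sample1", ".txt");
        Files.write(in, List.of("alpha", "beta", "gamma"), StandardCharsets.UTF_8);
        run(in, out, 4, String::toUpperCase);
        System.out.println(Files.readAllLines(out)); // ALPHA, BETA, GAMMA in some order
    }
}
```

Because several workers take from the same queue, lines can come out in a different order than they went in. The bounded queues keep the reader from racing far ahead of the writer and filling memory.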

That way your slow I/O is single-threaded (or actually dual-threaded) and the fast CPU does lots of processing in multiple threads.

If you don't need a lot of processing per line, forget about multiple threads. Your speed is then limited by I/O, and that will only get slower the more threads you use.

You need to create four threads. Each thread opens the file at its own position: you can calculate the position for each thread and pass it to the constructor, and you may also want to pass the size of the data. Then each thread reads data from the file into a buffer in a loop and writes that data into the other file.
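
One way to write the thread class this answer describes, with the start position and the size passed to the constructor. This variant uses `java.nio` `FileChannel` positional reads and writes (`read(ByteBuffer, long)` and `write(ByteBuffer, long)`) instead of `RandomAccessFile.seek()`; they take the offset as an argument rather than moving a file pointer. `BlockCopier` and `copy` are hypothetical names, and the sketch assumes a straight copy where each block lands at the same offset in the output.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** One thread per block: copies `size` bytes starting at `position` to the same offset in dst. */
public class BlockCopier extends Thread {
    private final Path src;
    private final Path dst;
    private final long position;
    private final long size;
    private volatile IOException error; // set if run() fails

    BlockCopier(Path src, Path dst, long position, long size) {
        this.src = src;
        this.dst = dst;
        this.position = position;
        this.size = size;
    }

    IOException error() {
        return error;
    }

    @Override
    public void run() {
        // Positional read/write takes the offset as an argument instead of moving a file pointer.
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
            long done = 0;
            while (done < size) {
                buf.clear();
                buf.limit((int) Math.min(buf.capacity(), size - done));
                int n = in.read(buf, position + done);
                if (n < 0) throw new IOException("unexpected end of " + src);
                buf.flip();
                long writePos = position + done;
                while (buf.hasRemaining()) {
                    writePos += out.write(buf, writePos); // write() may be partial
                }
                done += n;
            }
        } catch (IOException e) {
            error = e;
        }
    }

    /** Empties dst, then copies src into it with one BlockCopier per block. */
    static void copy(Path src, Path dst, int parts) throws IOException, InterruptedException {
        long total = Files.size(src);
        Files.write(dst, new byte[0]); // create or truncate the output file
        long chunk = (total + parts - 1) / parts; // ceiling division
        BlockCopier[] copiers = new BlockCopier[parts];
        for (int i = 0; i < parts; i++) {
            long start = Math.min(total, i * chunk);
            copiers[i] = new BlockCopier(src, dst, start, Math.min(chunk, total - start));
            copiers[i].start();
        }
        for (BlockCopier c : copiers) c.join();
        for (BlockCopier c : copiers) {
            if (c.error() != null) throw c.error();
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo on temporary files; the question's case would use sample.txt and Sample1.txt with 4 parts.
        Path src = Files.createTempFile("sample", ".txt");
        Path dst = Files.createTempFile("Sample1", ".txt");
        Files.write(src, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
        copy(src, dst, 4);
        System.out.println(new String(Files.readAllBytes(dst), StandardCharsets.US_ASCII)); // 0123456789abcdef
    }
}
```

Writing past the current end of a `FileChannel` grows the file, and the gap bytes are unspecified until another block fills them, so the output is only complete once every thread has been joined.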

1 Comment

@amit It depends on the task at hand. There's not enough context in the question… It may be a homework task where you would have to re-invent the wheel. The answer by extraneon looks better.
