0

I have a file which contains number of records of varying length. What would be the efficient algorithm to sort these records.

Record sample:

000000000000dc01 t error_handling 44

0000000dfa01a000 t fun 44

Total record = >5000 Programming language c

I would like to know which algorithm is suitable to sort this file based on address and what would be the efficient way to read these records?

4
  • You say each record is more than 5000 bytes? Or you have more than 5000 records, each of 20-100 bytes length? Commented Feb 22, 2011 at 15:01
  • Sorry, Files will have minimum 5000 records and maximum is undefined. Commented Feb 22, 2011 at 15:46
  • Do you really need to sort this file yourself? Or is having a utility to do this enough? If you are running on Windows, you could use the SORT command. I successfully used this to sort files with a size of hundreds of megabytes. Commented Feb 22, 2011 at 16:08
  • @Patrick I really need to sort this file myself. This is one of the module in my program. Commented Feb 23, 2011 at 4:37

3 Answers 3

8

If the file is too large to fit into memory, then your only reasonable choice is a file-based merge sort, which involves two passes.

In the first pass, read blocks of N records (where N is defined as the number of records that will fit into memory), sort them, and write them to a temporary file. When this pass is done, you either have a number (call it M) of temporary files, each with some varying number of records that are sorted, or you have a single temporary file that contains blocks of sorted records.

The second pass is an M-way merge.

I wrote an article some time back about how to do this with a text file. See Sorting a Large Text File. It's fairly straightforward to extend that so that it will sort other types of records that you define.

For more information, see External sorting.

Sign up to request clarification or add additional context in comments.

Comments

1

Since the records are of varying length, an efficent method would be:

  1. Read and parse file into array of pointer to records
  2. Sort array of pointers
  3. Write the results

Random accessing the file will be slow as the newlines have to counted to find a specific record.

If you've got a really big file, adapt the process to:

for each n records
   read and parse
   sort
   write to temporary file

mergesort temporary files

Comments

0

In-place Quicksort is one of the best generic sorting algorithm. Faster sorting is possible (such as bucketsort) but it depends on some properties of the data you're sorting.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.