I am working on a program that has 4 MapReduce steps. The output of my first step is:

id      value
 1        20
 2         3
 3         9
 4        36

I have about 1,000,000 IDs, and in the second step I must sort the records by value. The output of this step should be:

 id      value
 4        36
 1        20
 3         9
 2         3

How can I sort my data in MapReduce? Do I need to use TeraSort? If yes, how do I use TeraSort in the second step of my program? Thanks.

  • What do you mean by 4 MapReduce steps? Are you running the map step and the reduce step 4 times? If you are writing a MapReduce program, then you have control over the map step and the reduce step. Commented May 6, 2013 at 15:59
  • @prashantsunkari No, I have 4 steps, and each of them has its own map and reduce function. Each step does different work. The second step must sort the output of the first step. Commented May 6, 2013 at 17:22
  • One of the biggest advantages of MapReduce is that it sorts your data by key. What do you want to sort by? Commented May 6, 2013 at 18:44
  • @smttsp By the values. Commented May 6, 2013 at 19:29

1 Answer

If you want to sort by value, make the value the key in the map function. For example, given this input:

id      value
1        20
2         3
3         9
4        36
5         3

In the map function, emit the value as the key and the id as the value. The framework sorts map output by key, so the output will be:

key      value
3         5
3         2
9         3
20        1
36        4

map output key/value: <value, id>  
reduce input key/value: <value, id>
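
To make the swap concrete, here is a minimal plain-Java sketch that only simulates the idea, without Hadoop. The class name `SortByValueSketch` and the `TreeMap` standing in for the shuffle are my own assumptions for illustration; in a real job the framework does the sorting between map and reduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Hadoop.
class SortByValueSketch {
    // Takes {id, value} records and returns {value, id} pairs ordered by value.
    static List<int[]> sortByValue(int[][] records) {
        // "Map" phase: emit the value as the key and the id as the value.
        // The TreeMap plays the role of the shuffle, which sorts keys ascending.
        TreeMap<Integer, List<Integer>> shuffled = new TreeMap<>();
        for (int[] record : records) {
            int id = record[0];
            int value = record[1];
            shuffled.computeIfAbsent(value, k -> new ArrayList<>()).add(id);
        }
        // "Reduce" phase: one group per distinct key, holding every id with that value.
        List<int[]> output = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> entry : shuffled.entrySet()) {
            for (int id : entry.getValue()) {
                output.add(new int[] {entry.getKey(), id});
            }
        }
        return output;
    }

    public static void main(String[] args) {
        int[][] records = {{1, 20}, {2, 3}, {3, 9}, {4, 36}, {5, 3}};
        for (int[] pair : sortByValue(records)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

In this simulation, ids that share a value come out in input order. In a real job, the order of ids within one key is not something I would rely on.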

If you want the id in the first column, write the pair in swapped order in the reducer:

context.write(value, key);

Note that the ids are not going to be sorted.
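
The question asks for the largest value first, while the default key order is ascending. Below is a rough plain-Java sketch of the same simulation with a reversed comparator and the id written back into the first column. The class name `DescendingByValueSketch` is hypothetical. In Hadoop itself, I believe the corresponding hook is a custom comparator registered with `Job.setSortComparatorClass`, and a globally sorted result across the whole data set needs either a single reducer or a total-order partitioner (the approach TeraSort relies on). Treat that as a pointer to check against the Hadoop docs, not a tested recipe.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Hadoop.
class DescendingByValueSketch {
    // Takes {id, value} records and returns {id, value} pairs, largest value first.
    static List<int[]> sortDescending(int[][] records) {
        // The reversed comparator stands in for a custom sort comparator on the key.
        Comparator<Integer> descending = Comparator.reverseOrder();
        TreeMap<Integer, List<Integer>> shuffled = new TreeMap<>(descending);
        for (int[] record : records) {
            // "Map" phase: key = value, value = id.
            shuffled.computeIfAbsent(record[1], k -> new ArrayList<>()).add(record[0]);
        }
        List<int[]> output = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> entry : shuffled.entrySet()) {
            for (int id : entry.getValue()) {
                // "Reduce" phase: swap back so the id is in the first column.
                output.add(new int[] {id, entry.getKey()});
            }
        }
        return output;
    }

    public static void main(String[] args) {
        int[][] records = {{1, 20}, {2, 3}, {3, 9}, {4, 36}};
        for (int[] pair : sortDescending(records)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```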
