How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?

asked Aug 10, 2016 in CCD 410 Cloudera Certified Developer for Apache Hadoop (CCDH) by John Hayes (470 points)
retagged Aug 14, 2016 by admin

A. Keys are presented to reducer in sorted order; values for a given key are not sorted.
B. Keys are presented to reducer in sorted order; values for a given key are sorted in ascending order.
C. Keys are presented to a reducer in random order; values for a given key are not sorted.
D. Keys are presented to a reducer in random order; values for a given key are sorted in ascending order.

1 Answer

answered Aug 10, 2016 by Sandra Reeds (1,040 points)

Answer: A

Explanation:

A Reducer has three primary phases:

1. Shuffle
The Reducer copies the sorted output from each Mapper using HTTP across the network.
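To make the shuffle concrete, here is a minimal in-memory sketch in Java (16 or later, for records). It is not Hadoop code: the `Pair` record and `ShuffleSim` class are hypothetical, the HTTP copy is replaced by a list copy, and each map task's output is assumed to be partitioned and sorted by key before the reducer fetches it. The partition formula is the one used by Hadoop's default `HashPartitioner`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical key/value pair (Java 16+ record); not a Hadoop type.
record Pair(String key, int value) {}

// In-memory simulation of the shuffle. Nothing here is Hadoop API: the HTTP
// copy from each map task is modeled as a plain list copy.
final class ShuffleSim {
    private ShuffleSim() {}

    // Same formula as Hadoop's default HashPartitioner: mask the sign bit of
    // the key's hash so the reduce-task index is never negative.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Map side (assumed done before the shuffle): split one map task's output
    // into per-reducer partitions and sort each partition by key.
    static List<List<Pair>> partitionAndSort(List<Pair> mapOutput, int numReducers) {
        List<List<Pair>> parts = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            parts.add(new ArrayList<>());
        }
        for (Pair p : mapOutput) {
            parts.get(partition(p.key(), numReducers)).add(p);
        }
        for (List<Pair> part : parts) {
            part.sort(Comparator.comparing(Pair::key)); // compares keys only
        }
        return parts;
    }

    // Reduce side: reducer r copies its own partition from every map task.
    // Each copy stays a separate sorted run; merging the runs is the sort phase.
    static List<List<Pair>> fetch(List<List<List<Pair>>> allMapOutputs, int r) {
        List<List<Pair>> runs = new ArrayList<>();
        for (List<List<Pair>> mapOutput : allMapOutputs) {
            runs.add(new ArrayList<>(mapOutput.get(r)));
        }
        return runs;
    }
}
```

Every run a reducer fetches contains only keys that hash to that reducer, and each run is already sorted by key.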

2. Sort
The framework merge-sorts the Reducer's inputs by key (different Mappers may have output the same key). The shuffle and sort phases occur simultaneously: map outputs are merged as they are fetched.
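The merge in the sort phase can be sketched as a k-way merge over the sorted runs fetched from each map task. This is a simplified, hypothetical model (`KeyValue`, `MergeSim`), not Hadoop's internal merge code. It compares keys only, so the values for one key come out in whatever order the runs supply them, which is why answer A says values are not sorted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical key/value pair (Java 16+ record); not a Hadoop type.
record KeyValue(String key, int value) {}

// Simplified model of the reduce-side merge, not Hadoop's own merge code.
final class MergeSim {
    private MergeSim() {}

    // Position inside one sorted run (one map task's output for this reducer).
    private record Cursor(List<KeyValue> run, int runIndex, int pos) {
        KeyValue head() {
            return run.get(pos);
        }
    }

    // k-way merge of runs that are each sorted by key. Only keys are compared;
    // ties go to the lower run index just to make this sketch deterministic.
    // Values are never compared, so the values for one key are not sorted.
    static List<KeyValue> merge(List<List<KeyValue>> runs) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
                Comparator.comparing((Cursor c) -> c.head().key())
                        .thenComparingInt(Cursor::runIndex));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new Cursor(runs.get(i), i, 0));
            }
        }
        List<KeyValue> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            merged.add(c.head());
            if (c.pos() + 1 < c.run().size()) {
                heap.add(new Cursor(c.run(), c.runIndex(), c.pos() + 1));
            }
        }
        return merged;
    }
}
```

For example, merging the runs `[a=5, a=1, c=2]` and `[a=3, b=9]` yields keys `a, a, a, b, c` in sorted order, while the values for `a` arrive as `5, 1, 3`.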

SecondarySort

The framework sorts only by key, so the values for a given key are not sorted. To achieve a secondary sort on the values returned by the value iterator, the application should extend the key with the secondary key and define a grouping comparator. The keys are then sorted using the entire key but grouped using the grouping comparator, which decides which keys and values are sent in the same call to reduce.
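A secondary sort can be simulated with two comparators over a composite key. In Hadoop these two roles are configured with `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`; the sketch below does not use Hadoop at all, and the `CompositeKey`, `Rec` and `SecondarySortSim` names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical composite key: natural key plus the secondary sort field.
record CompositeKey(String natural, int secondary) {}
record Rec(CompositeKey key, String value) {}

// Plain-Java simulation of secondary sort; not the Hadoop API.
final class SecondarySortSim {
    private SecondarySortSim() {}

    // Sort comparator: uses the entire composite key.
    static final Comparator<CompositeKey> SORT =
            Comparator.comparing(CompositeKey::natural).thenComparingInt(CompositeKey::secondary);

    // Grouping comparator: only the natural key decides which records share a reduce call.
    static final Comparator<CompositeKey> GROUP = Comparator.comparing(CompositeKey::natural);

    // Returns natural key -> values, in the order one reduce call would see them.
    static Map<String, List<String>> run(List<Rec> input) {
        List<Rec> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing(Rec::key, SORT));
        Map<String, List<String>> calls = new LinkedHashMap<>();
        CompositeKey groupStart = null;
        List<String> values = null;
        for (Rec r : sorted) {
            if (groupStart == null || GROUP.compare(groupStart, r.key()) != 0) {
                groupStart = r.key(); // a new reduce call begins
                values = new ArrayList<>();
                calls.put(groupStart.natural(), values);
            }
            values.add(r.value());
        }
        return calls;
    }
}
```

Because records are sorted on the full key but grouped on the natural key alone, each reduce call receives its values already ordered by the secondary field.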

3. Reduce

In this phase, the reduce(Object, Iterable, Context) method is called once for each <key, (collection of values)> pair in the sorted inputs.

The output of the reduce task is typically written to a RecordWriter via TaskInputOutputContext.write(Object, Object). The output of the Reducer is not re-sorted.
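The reduce phase can be modeled as a walk over the key-sorted stream that calls the reduce function once per key. This is a hypothetical sketch (`KV`, `ReduceSim`, and a `BiFunction` standing in for `Reducer.reduce`), not Hadoop's own code; a word-count style sum is assumed as the example reduce function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical key/value pair (Java 16+ record); not a Hadoop type.
record KV(String key, int value) {}

// Simulated reduce phase: group consecutive equal keys in the sorted input
// and invoke the reduce function once per key with all of its values.
final class ReduceSim {
    private ReduceSim() {}

    static List<KV> reduceAll(List<KV> sortedInput,
                              BiFunction<String, List<Integer>, Integer> reduceFn) {
        List<KV> output = new ArrayList<>();
        int i = 0;
        while (i < sortedInput.size()) {
            String key = sortedInput.get(i).key();
            List<Integer> values = new ArrayList<>();
            // Gather every consecutive value for this key.
            while (i < sortedInput.size() && sortedInput.get(i).key().equals(key)) {
                values.add(sortedInput.get(i).value());
                i++;
            }
            // Output is appended as it is produced; it is not re-sorted.
            output.add(new KV(key, reduceFn.apply(key, values)));
        }
        return output;
    }
}
```

With a summing reduce function, the sorted input `a=1, a=2, b=5, c=1, c=1, c=1` produces `a=3, b=5, c=3`.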

Reference:
org.apache.hadoop.mapreduce, Class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
