Using a Best of Breed MapReduce Job
- Create an instance of AdvanceMatchFactory using its static method getInstance().
- Provide the input and output details for the Best of Breed job by creating an instance of BestofBreedDetail, specifying the ProcessType. The instance must use the type MRProcessType.
  - Specify the column by which the records are to be grouped by creating an instance of GroupbyOption. Use an instance of GroupbyMROption to specify the group-by column and the number of reducers required.
  - Generate the consolidation and template rules for the job by creating an instance of BestOfBreedConfiguration. Within this instance:
    - Define the template record for the consolidation using an instance of ConsolidationCondition, which comprises ConsolidationRule instances.
    - Define the consolidation conditions using instances of ConsolidationCondition, connecting the conditions with logical operators. Each instance of ConsolidationCondition is defined using a ConsolidationRule instance and its corresponding ConsolidationAction instance.
      Note: Each instance of ConsolidationRule can be defined either using a single instance of SimpleRule, or using a hierarchy of child SimpleRule instances and nested ConjoinedRule instances joined using logical operators. See Enum JoinType and Enum Operation.
  - Create an instance of BestofBreedDetail by passing an instance of type JobConfig, the GroupbyOption instance, and the BestOfBreedConfiguration instance created above as the arguments to its constructor. The JobConfig parameter must be an instance of type MRJobConfig.
  - Set the details of the input file using the inputPath field of the BestofBreedDetail instance. For a text input file, create an instance of FilePath with the relevant details of the input file by invoking the appropriate constructor. For an ORC input file, create an instance of OrcFilePath with the path of the ORC input file as the argument.
  - Set the details of the output file using the outputPath field of the BestofBreedDetail instance. For a text output file, create an instance of FilePath with the relevant details of the output file by invoking the appropriate constructor. For an ORC output file, create an instance of OrcFilePath with the path of the ORC output file as the argument.
  - Set the name of the job using the jobName field of the BestofBreedDetail instance.
  - Set the compressOutput flag of the BestofBreedDetail instance to true to compress the output of the job.
- To create a MapReduce job, use the previously created instance of AdvanceMatchFactory to invoke its method createJob(), passing the above instance of BestofBreedDetail as an argument. The createJob() method creates the job and returns a List of instances of ControlledJob.
- Run the created job using an instance of JobControl.
- To display the reporting counters after a successful MapReduce job run, use the previously created instance of AdvanceMatchFactory to invoke its method getCounters(), passing the created job as an argument.
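
The note on ConsolidationRule describes a rule as either a single simple rule or a hierarchy of simple and nested conjoined rules joined by logical operators. The sketch below illustrates only that tree structure, using stand-in types written for this example. RuleTreeSketch, Join, Rule, Simple, and Conjoined are hypothetical names, not the SDK's SimpleRule, ConjoinedRule, or JoinType classes. The SDK constructors are not shown in this procedure, and the equality check is an assumed comparison, so treat this as a conceptual model rather than SDK usage.

```java
import java.util.List;
import java.util.Map;

/** Conceptual model only: none of these types belong to the SDK. */
final class RuleTreeSketch {

    /** Hypothetical stand-in for the logical operators that join rules. */
    enum Join { AND, OR }

    /** A rule decides whether a row (column name -> value) qualifies. */
    interface Rule {
        boolean matches(Map<String, String> row);
    }

    /** Leaf rule: compares one column with an expected value (assumed operation). */
    static final class Simple implements Rule {
        private final String column;
        private final String expected;

        Simple(String column, String expected) {
            this.column = column;
            this.expected = expected;
        }

        @Override
        public boolean matches(Map<String, String> row) {
            return expected.equals(row.get(column));
        }
    }

    /** Composite rule: joins child rules, simple or nested, with AND or OR. */
    static final class Conjoined implements Rule {
        private final Join join;
        private final List<Rule> children;

        Conjoined(Join join, List<Rule> children) {
            this.join = join;
            this.children = children;
        }

        @Override
        public boolean matches(Map<String, String> row) {
            return join == Join.AND
                    ? children.stream().allMatch(r -> r.matches(row))
                    : children.stream().anyMatch(r -> r.matches(row));
        }
    }

    /** Builds: source == "CRM" AND (status == "ACTIVE" OR verified == "Y"). */
    static Rule exampleRule() {
        return new Conjoined(Join.AND, List.of(
                new Simple("source", "CRM"),
                new Conjoined(Join.OR, List.of(
                        new Simple("status", "ACTIVE"),
                        new Simple("verified", "Y")))));
    }

    public static void main(String[] args) {
        Rule rule = exampleRule();
        // Satisfies the outer AND through the nested OR's second child.
        System.out.println(rule.matches(
                Map.of("source", "CRM", "status", "INACTIVE", "verified", "Y"))); // true
        // Fails the first child of the outer AND.
        System.out.println(rule.matches(
                Map.of("source", "WEB", "status", "ACTIVE", "verified", "Y")));   // false
    }
}
```

In the procedure above, a rule of this shape would be paired with a ConsolidationAction to form one ConsolidationCondition. The column names and values here (source, status, verified) are invented for illustration.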