Using a Best of Breed MapReduce Job

  1. Create an instance of AdvanceMatchFactory, using its static method getInstance().
  2. Provide the input and output details for the Best of Breed job by creating an instance of BestofBreedDetail that specifies the ProcessType. The instance must use the type MRProcessType.
    1. Specify the column by which the records are to be grouped by creating an instance of GroupbyOption.
      Use an instance of GroupbyMROption to specify the group-by column and the number of reducers required.
    2. Generate the consolidation and template rules for the job by creating an instance of BestOfBreedConfiguration. Within this instance:
      1. Define the template record for the consolidation using an instance of ConsolidationCondition, which comprises ConsolidationRule instances.
      2. Define the consolidation conditions using instances of ConsolidationCondition, and connect the conditions using logical operators.

        Each instance of ConsolidationCondition is defined using a ConsolidationRule instance and its corresponding ConsolidationAction instance.

      Note: Each instance of ConsolidationRule can be defined using either a single instance of SimpleRule, or a hierarchy of child SimpleRule instances and nested ConjoinedRule instances joined using logical operators. See Enum JoinType and Enum Operation.
    3. Create an instance of BestofBreedDetail by passing an instance of type JobConfig, the GroupbyOption instance, and the BestOfBreedConfiguration instance created above as arguments to its constructor.
      The JobConfig parameter must be an instance of type MRJobConfig.
    4. Set the details of the input file using the inputPath field of the BestofBreedDetail instance.
      • For a text input file, create an instance of FilePath with the relevant details of the input file by invoking the appropriate constructor.
      • For an ORC input file, create an instance of OrcFilePath with the path of the ORC input file as the argument.
      • For a Parquet input file, create an instance of ParquetFilePath with the path of the Parquet input file as the argument.
    5. Set the details of the output file using the outputPath field of the BestofBreedDetail instance.
      • For a text output file, create an instance of FilePath with the relevant details of the output file by invoking the appropriate constructor.
      • For an ORC output file, create an instance of OrcFilePath with the path of the ORC output file as the argument.
      • For a Parquet output file, create an instance of ParquetFilePath with the path of the Parquet output file as the argument.
    6. Set the name of the job using the jobName field of the BestofBreedDetail instance.
    7. Set the compressOutput flag of the BestofBreedDetail instance to true to compress the output of the job.
  3. To create a MapReduce job, invoke the createJob() method of the previously created AdvanceMatchFactory instance, passing the BestofBreedDetail instance created above as the argument.
    The createJob() method creates the job and returns a List of instances of ControlledJob.
  4. Run the created job using an instance of JobControl.
  5. To display the reporting counters after a successful MapReduce job run, invoke the getCounters() method of the previously created AdvanceMatchFactory instance, passing the created job as the argument.
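
The order of steps 1 to 3 can be sketched in Java as follows. This is a minimal illustration, not the SDK's actual API. Every type declared under the "HYPOTHETICAL stand-ins" comment is a placeholder written so that the sketch compiles without the SDK or Hadoop on the classpath. The stand-ins mirror only the class, field, and method names mentioned in the steps. Constructor signatures such as GroupbyMROption(column, reducers), the subclass relationship between FilePath and OrcFilePath, the direct field assignment, and all paths and values are assumptions. The consolidation rules of step 2.2 are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Walks through steps 1 to 3 using the stand-in types declared further down.
final class BestOfBreedSketch {

    // Step 2: assemble the job details.
    static BestofBreedDetail buildDetail() {
        // Step 2.1: group-by column and reducer count (both placeholder values).
        GroupbyOption groupBy = new GroupbyMROption("GroupColumn", 2);
        // Step 2.2: template record and consolidation conditions are omitted here.
        BestOfBreedConfiguration configuration = new BestOfBreedConfiguration();
        // Step 2.3: the JobConfig argument is the MapReduce flavor.
        BestofBreedDetail detail =
                new BestofBreedDetail(new MRJobConfig(), groupBy, configuration);
        // Steps 2.4 to 2.7: input, output, job name, and output compression.
        detail.inputPath = new OrcFilePath("/placeholder/input.orc");
        detail.outputPath = new OrcFilePath("/placeholder/output");
        detail.jobName = "BestOfBreedSketch";
        detail.compressOutput = true;
        return detail;
    }

    // Steps 1 and 3: obtain the factory and create the job list.
    static List<ControlledJob> createJobs() {
        AdvanceMatchFactory factory = AdvanceMatchFactory.getInstance();
        return factory.createJob(buildDetail());
    }

    public static void main(String[] args) {
        for (ControlledJob job : createJobs()) {
            System.out.println("Created job: " + job.name);
        }
    }
}

// ---- HYPOTHETICAL stand-ins below: not the SDK or Hadoop classes. ----
interface JobConfig {}
final class MRJobConfig implements JobConfig {}

interface GroupbyOption {}
final class GroupbyMROption implements GroupbyOption {
    final String column;
    final int reducers;
    GroupbyMROption(String column, int reducers) {
        this.column = column;
        this.reducers = reducers;
    }
}

final class BestOfBreedConfiguration {}

// Assumption: the ORC path type is usable wherever a FilePath is expected.
class FilePath {
    final String path;
    FilePath(String path) { this.path = path; }
}
final class OrcFilePath extends FilePath {
    OrcFilePath(String path) { super(path); }
}

final class BestofBreedDetail {
    final JobConfig jobConfig;
    final GroupbyOption groupBy;
    final BestOfBreedConfiguration configuration;
    FilePath inputPath;
    FilePath outputPath;
    String jobName;
    boolean compressOutput;
    BestofBreedDetail(JobConfig jobConfig, GroupbyOption groupBy,
                      BestOfBreedConfiguration configuration) {
        this.jobConfig = jobConfig;
        this.groupBy = groupBy;
        this.configuration = configuration;
    }
}

final class ControlledJob {
    final String name;
    ControlledJob(String name) { this.name = name; }
}

final class AdvanceMatchFactory {
    private static final AdvanceMatchFactory INSTANCE = new AdvanceMatchFactory();
    private AdvanceMatchFactory() {}
    static AdvanceMatchFactory getInstance() { return INSTANCE; }
    // Stand-in behavior: returns one placeholder job named after the detail.
    List<ControlledJob> createJob(BestofBreedDetail detail) {
        List<ControlledJob> jobs = new ArrayList<>();
        jobs.add(new ControlledJob(detail.jobName));
        return jobs;
    }
}
```

With the real SDK and Hadoop on the classpath, the stand-ins would be replaced by the corresponding imports. For step 4, the returned ControlledJob instances would typically be added to a Hadoop JobControl (for example with addJobCollection()), which is then run on its own thread until allFinished() returns true.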