Using a Match Key Generator MapReduce Job

  1. Get an instance of AdvanceMatchFactory by calling its static method getInstance().
  2. Provide the input and output details for the Match Key Generator job by creating an instance of MatchKeyGeneratorDetail that specifies the ProcessType. For a MapReduce job, the instance must use the type MRProcessType.
    1. Specify the match key settings to use for matching by creating and configuring an instance of MatchKeySettings. For more information, see the relevant code sample.
    2. Create an instance of MatchKeyGeneratorDetail by passing an instance of type JobConfig and the MatchKeySettings instance created in the previous step as arguments to its constructor.
      The JobConfig parameter must be an instance of type MRJobConfig.
    3. Set the details of the input file using the inputPath field of the MatchKeyGeneratorDetail instance.
      • For a text input file, create an instance of FilePath with the relevant details of the input file by invoking the appropriate constructor.
      • For an ORC input file, create an instance of OrcFilePath with the path of the ORC input file as the argument.
      • For a Parquet input file, create an instance of ParquetFilePath with the path of the Parquet input file as the argument.
    4. Set the details of the output file using the outputPath field of the MatchKeyGeneratorDetail instance.
      • For a text output file, create an instance of FilePath with the relevant details of the output file by invoking the appropriate constructor.
      • For an ORC output file, create an instance of OrcFilePath with the path of the ORC output file as the argument.
      • For a Parquet output file, create an instance of ParquetFilePath with the path of the Parquet output file as the argument.
    5. Set the name of the job using the jobName field of the MatchKeyGeneratorDetail instance.
  3. To create a MapReduce job, call the createJob() method on the previously created instance of AdvanceMatchFactory, passing the MatchKeyGeneratorDetail instance configured above as the argument.
    The createJob() method creates the job and returns a List of ControlledJob instances.
  4. Run the created job using an instance of JobControl.
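
The steps above can be sketched in Java roughly as follows. This is a minimal illustration, not the SDK's own sample. So that it compiles without the SDK or Hadoop on the classpath, every SDK and Hadoop type is replaced by a small local stand-in that keeps only the names mentioned in the steps. The stand-in bodies, the PathStub marker interface, the file paths, the job and group names, and the MatchKeyGeneratorSketch class are all assumptions made for illustration. The JobControl stand-in borrows the method names of Hadoop's JobControl (addJobCollection, allFinished, stop) but finishes immediately instead of submitting anything.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// ---- Local stand-ins (assumptions). Replace with the real SDK and Hadoop classes. ----
interface ProcessType {}
final class MRProcessType implements ProcessType {}
interface JobConfig {}
final class MRJobConfig implements JobConfig {}
final class MatchKeySettings {}   // the real settings are configured as in the SDK code sample

interface PathStub {}             // hypothetical common type so one field can hold any path class
final class OrcFilePath implements PathStub {
    final String path;
    OrcFilePath(String path) { this.path = path; }
}
final class ParquetFilePath implements PathStub {
    final String path;
    ParquetFilePath(String path) { this.path = path; }
}

final class MatchKeyGeneratorDetail<T extends ProcessType> {
    final JobConfig jobConfig;
    final MatchKeySettings settings;
    PathStub inputPath;
    PathStub outputPath;
    String jobName;
    MatchKeyGeneratorDetail(JobConfig jobConfig, MatchKeySettings settings) {
        this.jobConfig = jobConfig;
        this.settings = settings;
    }
}

final class ControlledJob {
    final String jobName;
    ControlledJob(String jobName) { this.jobName = jobName; }
}

final class AdvanceMatchFactory {
    private static final AdvanceMatchFactory INSTANCE = new AdvanceMatchFactory();
    private AdvanceMatchFactory() {}
    static AdvanceMatchFactory getInstance() { return INSTANCE; }
    // Stand-in behavior: wraps the job name in a single ControlledJob.
    List<ControlledJob> createJob(MatchKeyGeneratorDetail<MRProcessType> detail) {
        return Collections.singletonList(new ControlledJob(detail.jobName));
    }
}

final class JobControl implements Runnable {
    private final List<ControlledJob> jobs = new ArrayList<>();
    private volatile boolean finished;
    JobControl(String groupName) {}
    void addJobCollection(Collection<ControlledJob> added) { jobs.addAll(added); }
    @Override public void run() { finished = true; }   // stand-in: "completes" at once
    boolean allFinished() { return finished; }
    void stop() {}
}

// ---- The procedure itself ----
class MatchKeyGeneratorSketch {
    static MatchKeyGeneratorDetail<MRProcessType> buildDetail() {
        MatchKeySettings settings = new MatchKeySettings();                  // step 2.1
        MatchKeyGeneratorDetail<MRProcessType> detail =
                new MatchKeyGeneratorDetail<>(new MRJobConfig(), settings);  // step 2.2
        detail.inputPath = new OrcFilePath("/data/in/customers.orc");        // step 2.3 (placeholder path)
        detail.outputPath = new ParquetFilePath("/data/out/match-keys");     // step 2.4 (placeholder path)
        detail.jobName = "MatchKeyGenerator";                                // step 2.5
        return detail;
    }

    static List<ControlledJob> createJobs() {
        AdvanceMatchFactory factory = AdvanceMatchFactory.getInstance();     // step 1
        return factory.createJob(buildDetail());                             // step 3
    }

    static boolean runJobs(List<ControlledJob> jobs) {
        JobControl control = new JobControl("match-key-generator");         // step 4
        control.addJobCollection(jobs);
        new Thread(control).start();          // run the job control off the calling thread
        try {
            while (!control.allFinished()) {  // poll until every job has finished
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        control.stop();
        return control.allFinished();
    }

    public static void main(String[] args) {
        List<ControlledJob> jobs = createJobs();
        System.out.println("Created " + jobs.size() + " job(s); finished = " + runJobs(jobs));
    }
}
```

When adapting this, delete the stand-in section and import the corresponding SDK and Hadoop classes instead. Check the real constructors and field accessors against the SDK documentation, since the stand-ins only mirror what the steps describe. Starting JobControl on its own thread and polling allFinished() before calling stop() is a common way to drive Hadoop's JobControl, and the sketch follows that pattern.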