Using a Global Address Validation MapReduce Job

  1. Create an instance of AddressValidationFactory, using its static method getInstance().
  2. Provide the input and output details for the Global Address Validation job by creating an instance of AddressValidationDetail that specifies the ProcessType. For a MapReduce job, the instance must use the type MRProcessType. To do this:
    1. Create an instance of ProductDatabaseInfo, and set these details:
      1. ReferenceDataPath: Use Enum ReferenceDataPathLocation
      2. CountryCode: Use Enum CountryCodes
      3. ProcessType: Use Enum AddressValidationProcessType
    2. Create an array list, ProductDatabaseInfoList, and use its add() method to insert the ProductDatabaseInfo instance.
    3. Create an instance of AddressValidationEngineConfiguration, and in this instance, set the ProductDatabaseInfoList.
    4. Create an instance of AddressValidationInputOption, and set these details on the new instance:
      • Casing
      • MatchMode
      • DefaultCountry
      • MaximumResults
      • ReturnInputAddress
      • ReturnParsedAddress
      • ReturnPrecisionCode
      • ReturnMatchScore
      • MustMatchAddressNumber
      • MustMatchStreet
      • MustMatchCity
      • MustMatchLocality
      • MustMatchState
      • MustMatchStateProvince
      • MustMatchPostCode
      • KeepMultiMatch
      • PreferPostalOverCity
      • CityFallback
      • PostalFallback
      • ValidationLevel
    5. Create an instance of AddressValidationDetail by passing the job configuration, the AddressValidationEngineConfiguration instance, and the AddressValidationInputOption instance created earlier as arguments to its constructor.
      Note: The job configuration argument must be an instance of type MRJobConfig (for a MapReduce job) or SparkJobConfig (for a Spark job).
      Set these details on the AddressValidationDetail instance:
      1. Set the details of the input file using the inputPath field.
        Note:
        • For a text input file, create an instance of FilePath with the relevant details of the input file by invoking the appropriate constructor.
        • For an ORC input file, create an instance of OrcFilePath with the path of the ORC input file as the argument.
        • For a Parquet input file, create an instance of ParquetFilePath with the path of the Parquet input file as the argument.
      2. Set the details of the output file using the outputPath field.
        Note:
        • For a text output file, create an instance of FilePath with the relevant details of the output file by invoking the appropriate constructor.
        • For an ORC output file, create an instance of OrcFilePath with the path of the ORC output file as the argument.
        • For a Parquet output file, create an instance of ParquetFilePath with the path of the Parquet output file as the argument.

      3. Set the name of the job using the jobName field.
      4. Set the compressOutput flag to false to prevent compressing the output of the job.
  3. To create a MapReduce job, invoke the createJob() method on the AddressValidationFactory instance created earlier, passing the AddressValidationDetail instance from the previous step as an argument.
    The createJob() method returns a List of instances of ControlledJob.
  4. Run the created job using an instance of JobControl.
  5. To display the reporting counters after a successful MapReduce job run, invoke the getCounters() method on the AddressValidationFactory instance created earlier, passing the created job as an argument.
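
The order in which the objects of step 2 are assembled can be sketched as follows. This is only an illustration under stated assumptions, not the SDK's actual API: every class and enum in the block is a hypothetical stand-in declared locally so that the example compiles on its own, and the field names, constructor signatures, enum constants, job name, and file paths are all invented for the sketch. How MRProcessType is specified on AddressValidationDetail is not modelled, and only three of the input options are shown.

```java
import java.util.ArrayList;
import java.util.List;

// Every type below is a hypothetical stand-in for an SDK class named in the
// steps; fields, constructors, and enum constants are assumptions.
class AddressValidationJobSketch {

    enum ReferenceDataPathLocation { LOCAL, HDFS }   // assumed constants
    enum CountryCodes { US, GB }                      // assumed constants
    enum AddressValidationProcessType { VALIDATE }    // assumed constant

    static class ProductDatabaseInfo {
        ReferenceDataPathLocation referenceDataPath;
        CountryCodes countryCode;
        AddressValidationProcessType processType;
    }

    static class AddressValidationEngineConfiguration {
        List<ProductDatabaseInfo> productDatabaseInfoList;
    }

    static class AddressValidationInputOption {
        CountryCodes defaultCountry;
        int maximumResults;
        boolean returnParsedAddress;
    }

    static class MRJobConfig { }

    static class FilePath {
        final String path;
        FilePath(String path) { this.path = path; }
    }

    static class AddressValidationDetail {
        final MRJobConfig config;
        final AddressValidationEngineConfiguration engineConfiguration;
        final AddressValidationInputOption inputOption;
        FilePath inputPath;
        FilePath outputPath;
        String jobName;
        boolean compressOutput;

        AddressValidationDetail(MRJobConfig config,
                                AddressValidationEngineConfiguration engineConfiguration,
                                AddressValidationInputOption inputOption) {
            this.config = config;
            this.engineConfiguration = engineConfiguration;
            this.inputOption = inputOption;
        }
    }

    static AddressValidationDetail buildDetail(String input, String output) {
        // Step 2.1: describe the reference database.
        ProductDatabaseInfo dbInfo = new ProductDatabaseInfo();
        dbInfo.referenceDataPath = ReferenceDataPathLocation.HDFS;
        dbInfo.countryCode = CountryCodes.US;
        dbInfo.processType = AddressValidationProcessType.VALIDATE;

        // Step 2.2: collect the database descriptions in a list.
        List<ProductDatabaseInfo> dbInfoList = new ArrayList<>();
        dbInfoList.add(dbInfo);

        // Step 2.3: hand the list to the engine configuration.
        AddressValidationEngineConfiguration engineConf = new AddressValidationEngineConfiguration();
        engineConf.productDatabaseInfoList = dbInfoList;

        // Step 2.4: set the input options (only a few are shown).
        AddressValidationInputOption inputOption = new AddressValidationInputOption();
        inputOption.defaultCountry = CountryCodes.US;
        inputOption.maximumResults = 1;
        inputOption.returnParsedAddress = true;

        // Step 2.5: build the detail object, then set paths, job name, and compression.
        AddressValidationDetail detail =
                new AddressValidationDetail(new MRJobConfig(), engineConf, inputOption);
        detail.inputPath = new FilePath(input);
        detail.outputPath = new FilePath(output);
        detail.jobName = "GlobalAddressValidation";   // hypothetical job name
        detail.compressOutput = false;
        return detail;
    }

    public static void main(String[] args) {
        // Hypothetical paths, used only to exercise the sketch.
        AddressValidationDetail detail = buildDetail("/data/in/addresses.txt", "/data/out/validated");
        System.out.println(detail.jobName + ": " + detail.inputPath.path + " -> " + detail.outputPath.path);
    }
}
```

Steps 1 and 3 to 5 are left out because they depend on the SDK and on Hadoop being on the classpath. With Hadoop available, the list of ControlledJob instances returned by createJob() would typically be added to a JobControl through addJobCollection(), and the JobControl (a Runnable) started on its own thread and polled with allFinished() before calling stop().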