Using a Candidate Finder Spark Job

  1. Create an instance of AdvanceMatchFactory, using its static method getInstance().
  2. Provide the input and output details for the Candidate Finder job by creating an instance of CandidateFinderDetail, specifying the ProcessType as SparkProcessType.
    1. Set the values of hbase_zookeeper_quorum and hbase_zookeeper_property_clientPort in the SparkJobConfig instance.
    2. Generate the query for the job by creating an instance of ComplexSearchQuery. Within this instance:
      1. Set properties such as QueryName, IndexFieldName, and IndexFieldType. The search query can be of type Numeric, Range, Contains All, or Contains None.
      2. Set the search query properties and combine them using logical operators such as AND and OR.
      Note: A ComplexSearchQuery can be defined as a single instance or as a hierarchy of nested child instances joined using logical operators. See Enum JoinType and Enum Operation.
    3. Set the details of the input file using the inputPath field of the CandidateFinderDetail instance.
      • For a text input file, create an instance of FilePath with the relevant details of the input file by invoking the appropriate constructor.
      • For an ORC input file, create an instance of OrcFilePath with the path of the ORC input file as the argument.
      • For a Parquet input file, create an instance of ParquetFilePath with the path of the Parquet input file as the argument.
    4. Set the details of the output file using the outputPath field of the CandidateFinderDetail instance.
      • For a text output file, create an instance of FilePath with the relevant details of the output file by invoking the appropriate constructor.
      • For an ORC output file, create an instance of OrcFilePath with the path of the ORC output file as the argument.
      • For a Parquet output file, create an instance of ParquetFilePath with the path of the Parquet output file as the argument.
    5. Set the name of the job using the jobName field of the CandidateFinderDetail instance.
    6. Set the fetchBatchSize field of the CandidateFinderDetail instance. The default is 10000.
    7. Set the maximumResults field of the CandidateFinderDetail instance. The default is 10.
    8. Set the startingRecord field of the CandidateFinderDetail instance. The default is 1.
  3. To create and run the Spark job, invoke the runSparkJob() method on the previously created AdvanceMatchFactory instance, passing the above CandidateFinderDetail instance as an argument.
    The runSparkJob() method runs the job and returns a Map of the reporting counters of the job.
  4. Display the counters to view the reporting statistics for the job.
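The steps above can be combined into a single driver, sketched below in Java. Only the class names (AdvanceMatchFactory, CandidateFinderDetail, SparkProcessType, ComplexSearchQuery, FilePath) and the field names come from this procedure; every constructor and setter signature shown is an illustrative assumption, as are the sample paths and ZooKeeper hosts. Verify each call against the SDK's Javadoc before use.

```java
import java.util.Map;

// Hedged sketch of a Candidate Finder Spark job driver.
// Signatures marked "assumed" are guesses; check them against the SDK.
public class CandidateFinderJob {
    public static void main(String[] args) throws Exception {
        // Step 1: obtain the factory via its static getInstance() method.
        AdvanceMatchFactory factory = AdvanceMatchFactory.getInstance();

        // Step 2: create the job detail with the Spark process type
        // (constructor shape assumed).
        CandidateFinderDetail detail =
                new CandidateFinderDetail(new SparkProcessType());

        // Step 2.1: ZooKeeper settings in the SparkJobConfig
        // (accessor and setter names assumed; hosts are placeholders).
        detail.getSparkJobConfig()
              .setProperty("hbase_zookeeper_quorum", "zkhost1,zkhost2");
        detail.getSparkJobConfig()
              .setProperty("hbase_zookeeper_property_clientPort", "2181");

        // Step 2.2: define the search query (setter names assumed).
        // Operation (Numeric, Range, Contains All, Contains None) and
        // join type (AND/OR) come from Enum Operation and Enum JoinType.
        ComplexSearchQuery query = new ComplexSearchQuery();
        query.setQueryName("firstNameQuery");
        query.setIndexFieldName("FirstName");
        detail.setSearchQuery(query); // assumed setter

        // Steps 2.3-2.4: text input and output via FilePath
        // (constructor arguments assumed; use OrcFilePath or
        // ParquetFilePath for ORC or Parquet data instead).
        detail.inputPath = new FilePath("/data/candidates/input.txt");
        detail.outputPath = new FilePath("/data/candidates/output");

        // Steps 2.5-2.8: job name and tuning fields with their defaults.
        detail.jobName = "CandidateFinderSample";
        detail.fetchBatchSize = 10000; // default 10000
        detail.maximumResults = 10;    // default 10
        detail.startingRecord = 1;     // default 1

        // Steps 3-4: run the job, then display the reporting counters
        // (the Map's type parameters are assumed).
        Map<String, String> counters = factory.runSparkJob(detail);
        counters.forEach((name, value) ->
                System.out.println(name + " = " + value));
    }
}
```

Because the SDK classes are not publicly available, this sketch cannot be compiled standalone; it is meant only to show how the numbered steps map onto code.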