Using Configuration Property Files

Ensure the Big Data Quality SDK is installed on your machine.
You can run a Big Data Quality SDK job using the module-specific JAR files and the configuration files in XML format.

Sample configuration property files are shipped with the Big Data Quality SDK at the location <Big Data Quality bundle>\samples\configuration.
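The shipped samples are a convenient starting point. As a sketch (the install path below is an assumption; substitute your actual bundle location), you can copy them into a working directory before editing:

```shell
# Copy the shipped sample configuration files into a working directory
# before editing them. BUNDLE_HOME is a placeholder for the actual
# install location of the Big Data Quality bundle -- adjust it.
BUNDLE_HOME="${BUNDLE_HOME:-/opt/bdq}"
WORKDIR="$HOME/bdq-config"
mkdir -p "$WORKDIR"
if [ -d "$BUNDLE_HOME/samples/configuration" ]; then
  cp "$BUNDLE_HOME/samples/configuration/"*.xml "$WORKDIR/"
  echo "copied samples to $WORKDIR"
else
  echo "samples not found under $BUNDLE_HOME; set BUNDLE_HOME and retry"
fi
```

Editing copies keeps the shipped samples pristine for later reference.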

Note: For a list of the module-specific JAR files, see Components of the SDK Java API.
  1. For a Linux or Unix system, open a command prompt.
    For a Windows system, open an SSH client such as PuTTY and connect to the cluster.
  2. For a MapReduce job, use the command hadoop.
    Based on the job you wish to run:
    1. Pass the name of the JAR file of that module.
    2. Pass the name of the driver class, RunMRSampleJob.
    3. Pass the configuration files as a list of arguments. Each argument key accepts the path of a single configuration property file; each file can contain multiple configuration properties.
    The syntax of the command is:

    hadoop jar <Name of module JAR file> RunMRSampleJob [-config <Path to configuration file>] [-debug] [-input <Path to input configuration file>] [-conf <Path to MapReduce configuration file>] [-output <Path of output directory>]

    For example, for a MapReduce MatchKeyGenerator job:

    hadoop jar amm.core.12.0.jar RunMRSampleJob -config /home/hadoop/matchkey/mkgConfig.xml -input /home/hadoop/matchkey/inputFileConfig.xml -conf /home/hadoop/matchkey/mapReduceConfig.xml -output /home/hadoop/matchkey/outputFileConfig.xml
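    Because the argument list gets long, one way to keep it manageable (a sketch, not part of the SDK; the JAR name and paths simply mirror the example above and must be adapted to your environment) is to assemble the command from variables and review it before running:

```shell
# Assemble the hadoop command for a MapReduce MatchKeyGenerator job
# from variables so the paths live in one place. JAR and CFG_DIR
# mirror the example above; adapt them to your environment.
JAR=amm.core.12.0.jar
CFG_DIR=/home/hadoop/matchkey

CMD="hadoop jar $JAR RunMRSampleJob \
 -config $CFG_DIR/mkgConfig.xml \
 -input $CFG_DIR/inputFileConfig.xml \
 -conf $CFG_DIR/mapReduceConfig.xml \
 -output $CFG_DIR/outputFileConfig.xml"

# Print the assembled command for review; run it with: eval "$CMD"
echo "$CMD"
```

    Printing the command first lets you verify every path before the job is submitted to the cluster.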
  3. For a Spark job, use the command spark-submit.
    Based on the job you wish to run:
    1. Pass the name of the JAR file of that module.
    2. Pass the name of the driver class, RunSparkSampleJob.
    3. Pass the configuration files as a list of arguments. Each argument key accepts the path of a single configuration property file; each file can contain multiple configuration properties.
    The syntax of the command is:

    spark-submit --class RunSparkSampleJob <Name of module JAR file> [-config <Path to configuration file>] [-debug] [-input <Path to input configuration file>] [-conf <Path to Spark configuration file>] [-output <Path of output directory>]

    For example, for a Spark MatchKeyGenerator job:

    spark-submit --class RunSparkSampleJob amm.core.12.0.jar -config /home/hadoop/spark/matchkey/matchKeyGeneratorConfig.xml -input /home/hadoop/spark/matchkey/inputFileConfig.xml -output /home/hadoop/spark/matchkey/outputFileConfig.xml
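    A failed submission is often just a mistyped path, so a pre-flight check can save a round trip to the cluster. A minimal sketch (the JAR name and paths mirror the example above and are assumptions to adapt):

```shell
# Pre-flight check for a Spark MatchKeyGenerator job: confirm each
# configuration file exists locally, then print the spark-submit
# command for review. Adapt JAR and CFG_DIR to your environment.
JAR=amm.core.12.0.jar
CFG_DIR=/home/hadoop/spark/matchkey

for f in matchKeyGeneratorConfig.xml inputFileConfig.xml outputFileConfig.xml; do
  [ -f "$CFG_DIR/$f" ] || echo "missing: $CFG_DIR/$f"
done

echo spark-submit --class RunSparkSampleJob "$JAR" \
  -config "$CFG_DIR/matchKeyGeneratorConfig.xml" \
  -input "$CFG_DIR/inputFileConfig.xml" \
  -output "$CFG_DIR/outputFileConfig.xml"
```

    If any "missing" line is printed, correct the path before submitting the job.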
Note: To see the list of argument keys supported for the hadoop or spark-submit commands, run hadoop --help or spark-submit --help.