Creating a Java Application

Ensure the Spectrum™ Data & Address Quality for Big Data SDK is installed on your machine.

To use the SDK:

  1. Create a Java project that uses the SDK, using one of these methods:
    1. Create a dedicated Java project for a single Data Quality operation.
      With this method, you need a separate Java project for each Data Quality job you want to run.
    2. Create a common Java project that runs any Data Quality operation, selected through runtime arguments.
      With this method, you need only one Java project, which accepts runtime arguments that identify the desired Data Quality operation.
  2. Import the module-specific JAR file of the Spectrum™ Data & Address Quality for Big Data SDK into your project. For a list of the module-specific JAR files, see Components of the SDK Java API.
  3. Import the required Hadoop JAR files into your project.
  4. Write your application code to configure and run the desired Data Quality jobs.
  5. Build your project using a build tool such as Maven or Ant.
    The build produces a JAR file of your project.

    For example, MatchKeyGeneratorClient-with-dependencies.jar is created.

  6. Place your project's JAR file on the Hadoop platform.
  7. On the Hadoop platform, at a command prompt, change to the directory where you placed your JAR file.
  8. Run your project's JAR file using the command:
    hadoop jar <name of the JAR of your client project> <fully qualified name of the main class>
    For example:
    hadoop jar MatchKeyGeneratorClient-with-dependencies.jar com.company.bdq.amm.mr.MatchKeyGeneratorJob
The desired job is created and executed on the Hadoop platform.

Your Java application reads the input data from the specified path on the Hadoop platform, then creates and runs the job there. The job's output is written to a file at the specified output path on the Hadoop platform.
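
The second method in step 1 (one common project driven by runtime arguments) can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions, not the SDK's own API: the class name DataQualityLauncher, the Operation interface, and the operation names "matchkey" and "sample" are all hypothetical, and the operation bodies are placeholders where your SDK job configuration and submission code would go.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical entry point for a common project: the first runtime
// argument selects the operation, the rest are passed through to it.
public class DataQualityLauncher {

    // Hypothetical abstraction over one Data Quality operation.
    // Returns a process exit code (0 = success).
    interface Operation {
        int run(String[] jobArgs);
    }

    private static final Map<String, Operation> OPERATIONS = new LinkedHashMap<>();

    static {
        // Placeholder bodies: a real project would configure and submit
        // the corresponding SDK job here. The names are assumptions.
        OPERATIONS.put("matchkey", jobArgs -> {
            System.out.println("Would run match key job with " + Arrays.toString(jobArgs));
            return 0;
        });
        OPERATIONS.put("sample", jobArgs -> {
            System.out.println("Would run sample job with " + Arrays.toString(jobArgs));
            return 0;
        });
    }

    // Looks up the operation named by args[0] and runs it with the
    // remaining arguments. Returns 2 for a missing or unknown name.
    static int dispatch(String[] args) {
        if (args.length == 0) {
            System.err.println("Usage: <operation> [job arguments...]");
            System.err.println("Known operations: " + OPERATIONS.keySet());
            return 2;
        }
        Operation operation = OPERATIONS.get(args[0].toLowerCase(Locale.ROOT));
        if (operation == null) {
            System.err.println("Unknown operation: " + args[0]);
            return 2;
        }
        return operation.run(Arrays.copyOfRange(args, 1, args.length));
    }

    public static void main(String[] args) {
        System.exit(dispatch(args));
    }
}
```

With a launcher like this, the arguments that follow the main class name on the hadoop jar command line would choose the operation, for example "matchkey" followed by an input path and an output path. The first method in step 1 needs no dispatcher, because each project's main class runs exactly one job.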