Using a Groovy Script MapReduce Job

  1. Create an instance of DataIntegrationFactory by using its static method getInstance().
  2. Provide the input and output details for the Groovy Script job in an instance of CustomGroovyScriptDetail whose ProcessType is MRProcessType. Use these steps to create and configure the instance.
    1. Create an instance of CustomGroovyScriptDetail, specifying MRProcessType as the ProcessType. Set these details on the instance:
      • Input file: Use the inputPath field
        Note:
        • For a text input file, create an instance of FilePath with the relevant details of the input file by invoking the appropriate constructor.
        • For an ORC input file, create an instance of OrcFilePath with the path of the ORC input file as the argument.
        • For a Parquet input file, create an instance of ParquetFilePath with the path of the Parquet input file as the argument.
      • Output file: Use the outputPath field
        Note:
        • For a text output file, create an instance of FilePath with the relevant details of the output file by invoking the appropriate constructor.
        • For an ORC output file, create an instance of OrcFilePath with the path of the ORC output file as the argument.
        • For a Parquet output file, create an instance of ParquetFilePath with the path of the Parquet output file as the argument.

      • Name of the job: Use the jobName field
      • Date pattern: M/d/yy
      • Date-time pattern: M/d/yy h:mm a
      • Time pattern: h:mm a
    2. Create an instance of CustomGroovyScriptConfiguration and set these details on it:
      • The groovyScriptFile
      • InputFields
      • OutputFields
    3. Apply the configuration by using the getScriptTransformerConfiguration() method, which refers to the list of CustomGroovyScriptConfiguration instances created and configured above.
  3. To create a MapReduce job, invoke the createJob() method on the previously created instance of DataIntegrationFactory, passing the CustomGroovyScriptDetail instance configured above as the argument.
    The createJob() method creates the job and returns a List of ControlledJob instances.
  4. Run the created job using an instance of JobControl.
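
The date, date-time, and time patterns listed in step 2 look like standard Java pattern-letter strings. The sketch below is not SDK code. It only shows how the JDK's own DateTimeFormatter renders those three patterns, on the assumption that the job interprets them the same way. The class name DatePatternSketch, the helper format(), and the choice of Locale.US are illustrative assumptions and do not come from the SDK.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.Locale;

// Illustrative only: shows what the pattern strings from step 2 produce
// when interpreted by the JDK. This class is not part of the SDK.
public class DatePatternSketch {
    // Pattern strings copied from the job details listed in step 2.
    static final String DATE_PATTERN = "M/d/yy";
    static final String DATE_TIME_PATTERN = "M/d/yy h:mm a";
    static final String TIME_PATTERN = "h:mm a";

    // Formats a date/time value with the given pattern.
    // Locale.US is an assumption; it fixes the AM/PM marker text.
    static String format(String pattern, TemporalAccessor value) {
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(value);
    }

    public static void main(String[] args) {
        // Sample value: 7 March 2024, 14:05.
        LocalDateTime sample = LocalDateTime.of(2024, 3, 7, 14, 5);

        System.out.println(format(DATE_PATTERN, sample.toLocalDate()));  // 3/7/24
        System.out.println(format(DATE_TIME_PATTERN, sample));           // 3/7/24 2:05 PM
        System.out.println(format(TIME_PATTERN, sample.toLocalTime()));  // 2:05 PM
    }
}
```

Under this reading, M and d are the unpadded month and day, yy is the two-digit year, h is the 12-hour clock hour, and a is the AM/PM marker, so input values such as 3/7/24 2:05 PM would match the date-time pattern.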