Using an Advanced Transformer MapReduce Job
- Create an instance of DataNormalizationFactory, using its static method getInstance().
- Provide the input and output details for the Advanced Transformer job by creating an instance of AdvancedTransformerDetail specifying the ProcessType. The instance must use the type MRProcessType.
  - Configure the advanced transformer rules by creating an instance of AdvancedTransformerConfiguration. Within this instance, add an instance of type AbstractAdvancedTransformerRules. This AbstractAdvancedTransformerRules instance must be defined using one of these classes: TableDataExtraction or RegularExpressionExtraction, corresponding to the desired advanced transformer rule category.
  - Set the details of the Reference Data path and location type by creating an instance of ReferenceDataPath. See Enum ReferenceDataPathLocation.
  - Create an instance of AdvancedTransformerDetail by passing an instance of type JobConfig, together with the AdvancedTransformerConfiguration and ReferenceDataPath instances created earlier, as the arguments to its constructor. The JobConfig parameter must be an instance of type MRJobConfig.
  - Set the details of the input file using the inputPath field of the AdvancedTransformerDetail instance. For a text input file, create an instance of FilePath with the relevant details of the input file by invoking the appropriate constructor. For an ORC input file, create an instance of OrcFilePath with the path of the ORC input file as the argument.
  - Set the details of the output file using the outputPath field of the AdvancedTransformerDetail instance. For a text output file, create an instance of FilePath with the relevant details of the output file by invoking the appropriate constructor. For an ORC output file, create an instance of OrcFilePath with the path of the ORC output file as the argument.
  - Set the name of the job using the jobName field of the AdvancedTransformerDetail instance.
- To create a MapReduce job, use the previously created instance of DataNormalizationFactory to invoke its method createJob(), passing the above instance of AdvancedTransformerDetail as an argument. The createJob() method returns a List of instances of ControlledJob.
- Run the created job using an instance of JobControl.
- To display the reporting counters after a successful MapReduce job run, use the previously created instance of DataNormalizationFactory to invoke its method getCounters(), passing the created job as an argument.
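
The order of the steps above can be sketched in Java. This is a minimal, self-contained illustration and not the SDK itself: every nested type below is a hypothetical stand-in that borrows only its name from the steps, so that the sketch compiles and runs without the SDK or Hadoop on the classpath. The constructor shapes, the use of ProcessType as a generic type parameter, the ReferenceDataPathLocation constants, the single-argument FilePath constructor, and all paths and the job name are assumptions made for illustration. The final two steps (running the job with JobControl and calling getCounters()) need a cluster and are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in sketch: every nested type is a placeholder, not the SDK class. */
class AdvancedTransformerSketch {

    // Rule categories named in the steps; bodies are empty placeholders.
    static abstract class AbstractAdvancedTransformerRules {}
    static class TableDataExtraction extends AbstractAdvancedTransformerRules {}
    static class RegularExpressionExtraction extends AbstractAdvancedTransformerRules {}

    static class AdvancedTransformerConfiguration {
        final List<AbstractAdvancedTransformerRules> rules = new ArrayList<>();
    }

    // Assumed constants; consult the SDK's enum for the real values.
    enum ReferenceDataPathLocation { LOCAL, HDFS }

    // Assumed shape: a path plus its location type.
    static class ReferenceDataPath {
        final String path;
        final ReferenceDataPathLocation location;
        ReferenceDataPath(String path, ReferenceDataPathLocation location) {
            this.path = path;
            this.location = location;
        }
    }

    static class JobConfig {}
    static class MRJobConfig extends JobConfig {}

    // Assumption: the process type is supplied as a generic type argument.
    interface ProcessType {}
    static class MRProcessType implements ProcessType {}

    // Assumed single-argument constructor; the real FilePath takes more details.
    static class FilePath {
        final String path;
        FilePath(String path) { this.path = path; }
    }

    static class AdvancedTransformerDetail<T extends ProcessType> {
        final JobConfig jobConfig;
        final AdvancedTransformerConfiguration configuration;
        final ReferenceDataPath referenceDataPath;
        FilePath inputPath;
        FilePath outputPath;
        String jobName;

        AdvancedTransformerDetail(JobConfig jobConfig,
                                  AdvancedTransformerConfiguration configuration,
                                  ReferenceDataPath referenceDataPath) {
            this.jobConfig = jobConfig;
            this.configuration = configuration;
            this.referenceDataPath = referenceDataPath;
        }
    }

    static class ControlledJob {
        final String jobName;
        ControlledJob(String jobName) { this.jobName = jobName; }
    }

    static class DataNormalizationFactory {
        private static final DataNormalizationFactory INSTANCE = new DataNormalizationFactory();
        static DataNormalizationFactory getInstance() { return INSTANCE; }

        // Placeholder: returns one job named after the detail; no Hadoop involved.
        List<ControlledJob> createJob(AdvancedTransformerDetail<MRProcessType> detail) {
            List<ControlledJob> jobs = new ArrayList<>();
            jobs.add(new ControlledJob(detail.jobName));
            return jobs;
        }
    }

    /** Walks the steps in order and returns the created job list. */
    static List<ControlledJob> buildJobs() {
        DataNormalizationFactory factory = DataNormalizationFactory.getInstance();

        AdvancedTransformerConfiguration configuration = new AdvancedTransformerConfiguration();
        configuration.rules.add(new RegularExpressionExtraction());

        // Hypothetical reference data path and location type.
        ReferenceDataPath referenceDataPath =
                new ReferenceDataPath("/ref/data", ReferenceDataPathLocation.HDFS);

        AdvancedTransformerDetail<MRProcessType> detail =
                new AdvancedTransformerDetail<>(new MRJobConfig(), configuration, referenceDataPath);
        detail.inputPath = new FilePath("/input/records.txt");   // hypothetical path
        detail.outputPath = new FilePath("/output/transformed"); // hypothetical path
        detail.jobName = "AdvancedTransformerJob";               // hypothetical name

        return factory.createJob(detail);
    }

    public static void main(String[] args) {
        List<ControlledJob> jobs = buildJobs();
        System.out.println(jobs.size() + " job(s) created: " + jobs.get(0).jobName);
    }
}
```

With the actual SDK, the stand-in types would be replaced by the SDK's own classes, and the returned list of ControlledJob instances would then be handed to a JobControl instance to run, as described in the last two steps.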