Using a Joiner MapReduce Job
- Create an instance of DataIntegrationFactory by using its static method getInstance().
- Provide the input and output details for the job in the JoinDetail instance, specifying the ProcessType as MRProcessType. Use these steps to create and configure the JoinDetail instance.
  - Create an instance of JoinDetail by specifying the ProcessType as MRProcessType and using the default configurations.
  - Create separate instances of FilePath and, for each of those, configure these input file details: RecordSeparator (use Enum RecordSeparator), fieldSeperator, textQualifier, and fileHeader (specify if the first row is to be skipped).
    Note:
    - For a text input file, create an instance of FilePath with the relevant details of the input file by invoking the appropriate constructor.
    - For an ORC input file, create an instance of OrcFilePath with the path of the ORC input file as the argument.
    - For a Parquet input file, create an instance of ParquetFilePath with the path of the Parquet input file as the argument.
  - In the JoinDetail instance created in the above step, configure these details:
    - InputPaths: Pass the FilePath instances created and configured above.
    - LeftInput: Specify the left input for the join operation.
    - JobName: Name of the job.
    - JoinType: Use Enum JoinDetail.JoinType to define the join type.
    - JoinColumns: Specify the input columns to be joined. These should be comma-separated values.
    - OutputPath: Use the setOutputPath method to set the output path of the job, specifying if the file is to be overwritten and if a header is to be created.
- To create a MapReduce job, use the previously created instance of DataIntegrationFactory to invoke its method createJob(). In this, pass the JoinDetail instance as an argument. The createJob() method creates the job and returns a List of instances of ControlledJob.
- Run the created job using an instance of JobControl.
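
To make the input and join settings from the second step more concrete, the following is a minimal plain-Java sketch of what they describe: a record separator and field separator split each input into rows and fields, a header flag skips the first row, and a join column pairs rows from the left input with rows from the other input. It does not use the SDK. The names JoinSettingsSketch, parse, and innerJoin are hypothetical, text qualifiers are ignored, and only an inner join on a single column is shown, so it illustrates the concepts rather than the behavior of the Joiner job itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Plain-Java illustration only; none of these names come from the SDK.
final class JoinSettingsSketch {

    // Splits raw text into rows of fields, optionally skipping the first (header) row.
    static List<String[]> parse(String text, String recordSeparator,
                                String fieldSeparator, boolean skipHeader) {
        List<String[]> rows = new ArrayList<>();
        String[] records = text.split(Pattern.quote(recordSeparator));
        for (int i = skipHeader ? 1 : 0; i < records.length; i++) {
            if (!records[i].isEmpty()) {
                // Limit -1 keeps trailing empty fields.
                rows.add(records[i].split(Pattern.quote(fieldSeparator), -1));
            }
        }
        return rows;
    }

    // Inner join: for each left row, emit one output row per right row with an equal key.
    static List<String> innerJoin(List<String[]> left, int leftKey,
                                  List<String[]> right, int rightKey) {
        // Index the right input by its join column.
        Map<String, List<String[]>> index = new HashMap<>();
        for (String[] row : right) {
            index.computeIfAbsent(row[rightKey], k -> new ArrayList<>()).add(row);
        }
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : index.getOrDefault(l[leftKey], new ArrayList<>())) {
                out.add(String.join(",", l) + "," + String.join(",", r));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data; both inputs have a header row.
        String customers = "id,name\n1,Ada\n2,Grace\n";
        String orders = "customerId,item\n1,book\n1,pen\n3,lamp\n";
        List<String[]> left = parse(customers, "\n", ",", true);
        List<String[]> right = parse(orders, "\n", ",", true);
        for (String line : innerJoin(left, 0, right, 0)) {
            System.out.println(line);
        }
    }
}
```

With the sample data above, only customer 1 has matching orders, so the sketch prints two joined rows. Other join types would differ in how unmatched rows from either input are handled.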