Using a Joiner Spark Job
- Create an instance of DataIntegrationFactory by using its static method getInstance().
- Provide the input and output details for the job in the JoinDetail instance, specifying the ProcessType as SparkProcessType. Use these steps to create and configure the JoinDetail instance.
  - Create an instance of JoinDetail by specifying the ProcessType as SparkProcessType and using the default configurations.
  - Create separate instances of FilePath and, for each of those, configure these input file details: RecordSeparator (use Enum RecordSeparator), fieldSeperator, textQualifier, and fileHeader (specify if the first row is to be skipped).
    Note: For a text input file, create an instance of FilePath with the relevant details of the input file by invoking the appropriate constructor. For an ORC input file, create an instance of OrcFilePath with the path of the ORC input file as the argument.
  - In the JoinDetail instance created in the above step, configure these details:
    - InputPaths: pass the FilePath instances created and configured above.
    - LeftInput: specify the left input for the join operation.
    - JobName: the name of the job.
    - JoinType: use Enum JoinDetail.JoinType to define the join type.
    - JoinColumns: specify the input columns to be joined, as comma-separated values.
    - OutputPath: use the setOutputPath method to set the output path of the job, specifying whether the file is to be overwritten and whether a header is to be created.
- To create a Spark job, use the previously created instance of DataIntegrationFactory to invoke its method runSparkJob(), passing the JoinDetail instance as an argument. The runSparkJob() method creates the job and returns a map of ControlledJob instances.
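The steps above can be sketched in Java roughly as follows. Only the type and method names that appear in this section (DataIntegrationFactory, getInstance(), JoinDetail, FilePath, OrcFilePath, setOutputPath, runSparkJob(), ControlledJob) are taken from the text; the constructor arguments, setter names such as setInputPaths, setLeftInput, setJobName, setJoinType, and setJoinColumns, the enum constants, and the file paths are assumptions for illustration and may differ in your SDK version.

```java
import java.util.Map;
// SDK imports are assumed; the actual package names depend on your SDK distribution.

public class JoinerSparkJobExample {
    public static void main(String[] args) throws Exception {
        // Step 1: obtain the factory via its static getInstance() method.
        DataIntegrationFactory factory = DataIntegrationFactory.getInstance();

        // Step 2: create a JoinDetail with Spark processing and default configurations.
        JoinDetail joinDetail = new JoinDetail(ProcessType.SparkProcessType);

        // Configure the text inputs (constructor signature is assumed):
        // record separator enum, field separator, text qualifier,
        // and whether the first (header) row is to be skipped.
        FilePath leftInput = new FilePath("/data/customers.txt",
                RecordSeparator.NEWLINE, ",", "\"", true);
        FilePath rightInput = new FilePath("/data/orders.txt",
                RecordSeparator.NEWLINE, ",", "\"", true);
        // For an ORC input file, OrcFilePath would be used instead:
        // OrcFilePath orcInput = new OrcFilePath("/data/orders.orc");

        // Configure the join details.
        joinDetail.setInputPaths(leftInput, rightInput);
        joinDetail.setLeftInput(leftInput);                    // left side of the join
        joinDetail.setJobName("CustomerOrderJoin");
        joinDetail.setJoinType(JoinDetail.JoinType.INNER);     // enum constant assumed
        joinDetail.setJoinColumns("customer_id,customer_id");  // comma-separated values
        // Output path, overwrite flag, and header-creation flag.
        joinDetail.setOutputPath("/data/joined", true, true);

        // Step 3: run the Spark job; a map of ControlledJob instances is returned.
        Map<String, ControlledJob> jobs = factory.runSparkJob(joinDetail);
    }
}
```

This sketch will not compile without the SDK on the classpath; it is meant only to show the order of the configuration calls described above.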