Using a Joiner Spark Job
- Create an instance of DataIntegrationFactory by using its static method getInstance().
- Provide the input and output details for the job in the JoinDetail instance, specifying the ProcessType as SparkProcessType. Use these steps to create and configure the JoinDetail instance:
  - Create an instance of JoinDetail by specifying the ProcessType as SparkProcessType and using the default configurations.
  - Create separate instances of FilePath and, for each of those, configure these input file details: RecordSeparator (use Enum RecordSeparator), fieldSeparator, textQualifier, and fileHeader (specify if the first row is to be skipped).
    Note:
    - For a text input file, create an instance of FilePath with the relevant details of the input file by invoking the appropriate constructor.
    - For an ORC input file, create an instance of OrcFilePath with the path of the ORC input file as the argument.
    - For a Parquet input file, create an instance of ParquetFilePath with the path of the Parquet input file as the argument.
  - In the JoinDetail instance created in the above step, configure these details:
    - InputPaths: pass the FilePath instances created and configured above.
    - LeftInput: specify the left input for the join operation.
    - JobName: the name of the job.
    - JoinType: use Enum JoinDetail.JoinType to define the join type.
    - JoinColumns: specify the input columns to be joined, as comma-separated values.
    - OutputPath: use the setOutputPath method to set the output path of the job, specifying whether the output is to be overwritten and whether a header is to be created.
- To create a Spark job, use the previously created instance of DataIntegrationFactory to invoke its method runSparkJob(), passing the JoinDetail instance as an argument. The runSparkJob() method creates the job and returns a map of ControlledJob instances. A hedged end-to-end sketch of these steps follows this list.
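The Java sketch below strings the steps together. It is illustrative, not authoritative: this section does not show the SDK's actual signatures, so the package imports, the FilePath and JoinDetail constructor arguments, the setter names (setInputPaths, setLeftInput, setJobName, setJoinType, setJoinColumns, setOutputPath), the RecordSeparator constant, and the file paths are all assumptions that may differ from the real API.

```java
import java.util.Map;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
// SDK imports (DataIntegrationFactory, JoinDetail, FilePath, SparkProcessType,
// RecordSeparator) are omitted; their packages depend on the SDK distribution.

public class JoinerSparkJobExample {
    public static void main(String[] args) throws Exception {
        // Step 1: obtain the factory through its static getInstance() method.
        DataIntegrationFactory factory = DataIntegrationFactory.getInstance();

        // Step 2a: create the JoinDetail with Spark as the process type, using
        // the default configurations. (Constructor shape is an assumption.)
        JoinDetail joinDetail = new JoinDetail(new SparkProcessType());

        // Step 2b: describe each text input file: path, record separator,
        // field separator, text qualifier, and whether the first row is a header.
        // (Argument order and the enum constant are assumptions.)
        FilePath left  = new FilePath("/data/orders_left.csv",
                RecordSeparator.NEWLINE, ",", "\"", true);
        FilePath right = new FilePath("/data/orders_right.csv",
                RecordSeparator.NEWLINE, ",", "\"", true);
        // For ORC or Parquet inputs, pass only the path instead:
        // OrcFilePath orc = new OrcFilePath("/data/orders.orc");

        // Step 2c: configure the join itself. Setter names are assumptions.
        joinDetail.setInputPaths(left, right);
        joinDetail.setLeftInput(left);                     // left side of the join
        joinDetail.setJobName("OrdersJoinJob");
        joinDetail.setJoinType(JoinDetail.JoinType.INNER); // Enum JoinDetail.JoinType
        joinDetail.setJoinColumns("order_id,region");      // comma-separated columns
        // Output path; overwrite any existing output and write a header row.
        joinDetail.setOutputPath(new FilePath("/data/joined_output"), true, true);

        // Step 3: submit the Spark job; a map of ControlledJob instances is returned.
        Map<String, ControlledJob> jobs = factory.runSparkJob(joinDetail);
        jobs.forEach((name, job) ->
                System.out.println(name + " -> " + job.getJobState()));
    }
}
```

Each returned ControlledJob comes from Hadoop's job-control API, so callers can poll its getJobState() method to confirm the join has completed before reading the output files.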