Overriding File Format Using a Job Property File
You can use a property file to override the file layout (or schema) of the file specified in a dataflow's Read from File or Write to File stage. To do this, specify the following in the property file:
StageName\:schema=Protocol:SchemaFile
Where:
- StageName
- The stage label shown under the stage's icon in the dataflow in Spectrum Enterprise Designer. Use a backslash before any spaces, colons, or equal signs in the stage name. For example, if the stage is labeled "Read from File", you would specify
Read\ from\ File
for the stage name.
To specify a stage within an embedded dataflow or a subflow, preface the stage name with the name of the embedded dataflow or subflow, followed by a period then the stage name:
EmbeddedOrSubflowName.StageName
For example, to specify a stage named Write to File in a subflow named Subflow1, you would specify:
Subflow1.Write\ to\ File
To specify a stage in an embedded dataflow that is within another embedded dataflow, precede it with the names of the parent dataflows, separating each name with a period. For example, if Embedded Dataflow 2 is inside Embedded Dataflow 1 and you want to specify the Write to File stage in Embedded Dataflow 2, you would specify:
Embedded\ Dataflow\ 1.Embedded\ Dataflow\ 2.Write\ to\ File
Note: When you override the file itself in a property file, you must include :file after the stage name, for example Read\ from\ File\:file. This is different from the syntax used to override files at the command line, where :file is not specified after the stage name.
- Protocol
- A communication protocol. One of the following:
- file
- Use the file protocol if the file is on the same machine as the Spectrum Technology Platform server. For example, on Windows specify:
"file:/C:/myfile.txt"
On Linux specify:
"file:/testfiles/myfile.txt"
- esclient
- Use the esclient protocol if the file is on the computer where you are executing the job and that computer is different from the one running the Spectrum Technology Platform server. Use this format:
esclient:ComputerName/path to file
For example:
esclient:mycomputer/testfiles/myfile.txt
Note: If you are executing the job on the server itself, you can use either the file or esclient protocol, but the file protocol is likely to give better performance.
If the host name of the Spectrum Technology Platform server cannot be resolved, you may get the error "Error occurred accessing file". To resolve this issue, open SpectrumDirectory/server/conf/spectrum-container.properties on the server and set the spectrum.runtime.hostname property to the IP address of the server.
- esfile
- Use the esfile protocol if the file is on a file server. The file server must be defined in Spectrum Management Console as a resource. Use this format:
esfile://file server/path to file
For example:
esfile://myserver/testfiles/myfile.txt
Here, myserver is an FTP file server resource defined in Spectrum Management Console.
- webhdfs
- Use the webhdfs protocol if the file is on a Hadoop Distributed File System (HDFS) server. The HDFS server must be defined in Spectrum Management Console as a resource. Use this format:
webhdfs://file server/path to file
For example:
webhdfs://myserver/testfiles/myfile.txt
Here, myserver is an HDFS file server resource defined in Spectrum Management Console.
- SchemaFile
- The full path to the file that defines the layout you want to use.
Note: You must use forward slashes in file paths. Do not use backslashes.
To create a schema file, define the layout you want in Read from File or Write to File, then click the Export button to create an XML file that defines the layout.
Note: You cannot override a field's data type in a schema file when using Job Executor. The value in the <Type> element, which is a child of the <FieldSchema> element, must match the field's type specified in the dataflow's Read from File or Write to File stage.
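The rules above for escaping stage names, naming stages inside embedded dataflows or subflows, and using forward slashes in paths can be applied mechanically. The following Java sketch builds one schema-override line from a list of dataflow and stage names. It assumes the property file follows java.util.Properties escaping rules, which the backslash escapes suggest but this section does not state. The class and method names and the file outputSchema.xml are hypothetical. The sketch does not escape periods or backslashes inside names, because this section does not say how those are handled.

```java
import java.util.List;

public class SchemaOverrideLine {

    // Put a backslash before each space, colon, or equal sign in a name,
    // as this section requires for stage names.
    static String escapeSegment(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == ' ' || c == ':' || c == '=') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // path lists the enclosing embedded dataflow or subflow names,
    // outermost first, and ends with the stage name itself.
    // location is the part of the value that follows "protocol:".
    static String schemaOverride(List<String> path, String protocol, String location) {
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < path.size(); i++) {
            if (i > 0) {
                key.append('.'); // periods separate the nesting levels
            }
            key.append(escapeSegment(path.get(i)));
        }
        key.append("\\:schema");
        // Schema file paths must use forward slashes, so convert any backslashes.
        String value = protocol + ":" + location.replace('\\', '/');
        return key + "=" + value;
    }

    public static void main(String[] args) {
        System.out.println(schemaOverride(
                List.of("Subflow1", "Write to File"), "file", "/C:/MyData/outputSchema.xml"));
        System.out.println(schemaOverride(
                List.of("Embedded Dataflow 1", "Embedded Dataflow 2", "Write to File"),
                "esclient", "c:\\MyData\\outputSchema.xml"));
    }
}
```

For the Subflow1 case, the first call returns Subflow1.Write\ to\ File\:schema=file:/C:/MyData/outputSchema.xml, which follows the StageName\:schema=Protocol:SchemaFile pattern.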
Example
In the following example property file, the second-to-last line overrides the input file read by the Read from File stage, and the last line overrides that stage's file layout with the layout defined in the file inputSchema.xml. A backslash precedes each space in the stage name and the colon that follows it.
j=testJob
h=myspectrumserver.example.com
s=8080
u=david1234
p=mypassword1234
Read\ from\ File\:file=esclient:c:/MyData/testInput.txt
Read\ from\ File\:schema=esclient:c:/MyData/inputSchema.xml
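To check how the escaped keys in this example resolve, the following Java sketch loads the file with java.util.Properties and reports each :schema override as a stage name, protocol, and path. It assumes Job Executor reads property files using standard java.util.Properties rules, which this section does not confirm. ListOverrides, load, and schemaOverrides are hypothetical names, not part of Spectrum Technology Platform.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class ListOverrides {

    // The example property file above, as a Java string literal
    // (each backslash in the file is doubled here).
    static final String EXAMPLE = String.join("\n",
            "j=testJob",
            "h=myspectrumserver.example.com",
            "s=8080",
            "u=david1234",
            "p=mypassword1234",
            "Read\\ from\\ File\\:file=esclient:c:/MyData/testInput.txt",
            "Read\\ from\\ File\\:schema=esclient:c:/MyData/inputSchema.xml");

    // Parse property-file text; StringReader does not fail in practice,
    // so any IOException is rethrown unchecked.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Map each stage name to {protocol, path} for every key ending in ":schema".
    static Map<String, String[]> schemaOverrides(String text) {
        Properties props = load(text);
        Map<String, String[]> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.endsWith(":schema")) {
                continue;
            }
            String stage = key.substring(0, key.length() - ":schema".length());
            String value = props.getProperty(key);
            int colon = value.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("No protocol in value: " + value);
            }
            // The protocol ends at the first colon; the rest is the schema path.
            result.put(stage, new String[] {value.substring(0, colon), value.substring(colon + 1)});
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String[]> e : schemaOverrides(EXAMPLE).entrySet()) {
            String[] pv = e.getValue();
            System.out.println(e.getKey() + " | " + pv[0] + " | " + pv[1]);
        }
        // Without the escapes, the first unescaped space ends the key.
        Properties unescaped = load("Read from File:schema=esclient:c:/MyData/inputSchema.xml");
        System.out.println(unescaped.stringPropertyNames());
    }
}
```

Under these rules the escaped line resolves to the stage Read from File, the protocol esclient, and the path c:/MyData/inputSchema.xml. The unescaped variant is read with the key Read, which is why the backslashes are required.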