Read From File

The Read from File stage specifies an input file for a job or subflow. It is not available for services.

Note: If you want to use an XML file as input for your dataflow, use the Read from XML stage instead of Read from File. If you want to use a variable format file as input, use Read from Variable Format File.
Prerequisite: To read a file from any of the file system connection types, such as FTP, Cloud, Amazon AWS S3, and HDFS, perform these steps:
  1. Create a connection to the file server using Spectrum Management Console or Discovery. For details, see Defining Connections.
  2. Select the file using the File name field in File Properties tab (described below).

File Properties Tab

Field Name Description
Server name Indicates whether the file you select as input is located on the computer running Spectrum Enterprise Designer or on the Spectrum Technology Platform server. If you select a file on the local computer, the server name is My Computer. If you select a file on the server, the server name is Spectrum Technology Platform.
File name Specifies the path to the file. Click the ellipsis button (...) to browse to the file you want.

To read data from multiple files in the same directory, use a wildcard character in the file name. The wildcard characters * and ? are supported. For example, specify *.csv to read all files with a .csv extension in the directory. To read multiple files successfully, each file must have the same layout (the same fields in the same positions). Any record that does not match the layout specified on the Fields tab is treated as a malformed record.
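As an illustration of how * and ? patterns select files, here is a minimal Python sketch using the standard fnmatch module. It assumes the stage applies ordinary shell-style matching (* matches any run of characters, ? matches exactly one), compared case-sensitively as on a Linux server. The directory listing and the function name match_wildcard are hypothetical, and this is not how Spectrum implements the feature.

```python
from fnmatch import fnmatchcase

def match_wildcard(pattern: str, names: list[str]) -> list[str]:
    """Return the names that match a * / ? wildcard pattern, case-sensitively.

    Note: fnmatch also understands [seq] classes; only * and ? are assumed here.
    """
    return [name for name in names if fnmatchcase(name, pattern)]

# Hypothetical directory listing used only for illustration.
listing = ["jan.csv", "feb.csv", "mar.CSV", "notes.txt", "a1.csv", "a22.csv"]

print(match_wildcard("*.csv", listing))   # ['jan.csv', 'feb.csv', 'a1.csv', 'a22.csv']
print(match_wildcard("a?.csv", listing))  # ['a1.csv']
```

Note that mar.CSV is not matched by *.csv in this sketch, which mirrors the case-sensitivity caution for Linux servers.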

When reading a file from an HDFS file server, these compression formats are supported:

  1. GZIP (.gz)
  2. BZIP2 (.bz2)
Note: The extension of the file indicates the compression format to be used to decompress the file.
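The rule that the extension selects the decompressor can be sketched in Python with the standard gzip and bz2 modules. This is a minimal illustration, not the server's code: the function name decompress_by_extension is hypothetical, the extension check is assumed to be case-sensitive, and passing files with any other extension through unchanged is an assumption.

```python
import bz2
import gzip

def decompress_by_extension(file_name: str, raw: bytes) -> bytes:
    """Choose a decompressor from the file extension (.gz or .bz2).

    Assumption: a file without either extension is returned unchanged.
    The extension check is case-sensitive here; the real rule may differ.
    """
    if file_name.endswith(".gz"):
        return gzip.decompress(raw)
    if file_name.endswith(".bz2"):
        return bz2.decompress(raw)
    return raw

record = b"7200 13TH ST|MIAMI|FL|33144\n"
print(decompress_by_extension("addresses.txt.gz", gzip.compress(record)) == record)   # True
print(decompress_by_extension("addresses.txt.bz2", bz2.compress(record)) == record)  # True
```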
Attention: If the Spectrum Technology Platform server is running on Linux, remember that file names and paths on Linux are case sensitive.
Record type The format of the records in the file. Select one of these:
Line Sequential
A text file in which records are separated by an end-of-line (EOL) character such as a carriage return or line feed (CR or LF) and each field has a fixed starting and ending character position.
Fixed Width
A text file in which each record is a specific number of characters in length and each field has a fixed starting and ending character position.
Delimited
A text file in which records are separated by an end-of-line (EOL) character such as a carriage return or line feed (CR or LF) and each field is separated by a designated character such as a comma.
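To make the difference between Fixed Width and Line Sequential records concrete, here is a minimal Python sketch. The field layout (names, 1-based start positions, lengths) is hypothetical and stands in for what you would define on the Fields tab; the helper names are also hypothetical. Both record types use the same fixed field positions and differ only in how records are separated.

```python
# Hypothetical layout: (field name, 1-based start position, length in characters).
LAYOUT = [("AddressLine1", 1, 12), ("City", 13, 5), ("StateProvince", 18, 2), ("PostalCode", 20, 5)]

def slice_fields(record: str, layout=LAYOUT) -> dict[str, str]:
    """Cut one record into fields by fixed character positions, trimming trailing padding."""
    return {name: record[start - 1:start - 1 + length].rstrip() for name, start, length in layout}

def fixed_width_records(data: str, record_length: int) -> list[str]:
    """Fixed Width: no separator; every record is exactly record_length characters."""
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]

def line_sequential_records(data: str) -> list[str]:
    """Line Sequential: each record ends at an end-of-line character."""
    return data.splitlines()

fixed = "7200 13TH STMIAMIFL33144ONE GLOBAL VTROY NY12180"
lines = "7200 13TH STMIAMIFL33144\nONE GLOBAL VTROY NY12180\n"

print([slice_fields(r)["City"] for r in fixed_width_records(fixed, 24)])  # ['MIAMI', 'TROY']
print([slice_fields(r)["City"] for r in line_sequential_records(lines)])  # ['MIAMI', 'TROY']
```

Delimited records, by contrast, have no fixed positions and are split on the field separator.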
Character encoding The text file's encoding. Select one of these:
CP1252
This encoding is also known as Windows-1252 or simply the Windows character set. It is a superset of ISO-8859-1 that uses the 128-159 code range for additional characters not included in the ISO-8859-1 character set.
UTF-8
Supports all Unicode characters and is backwards-compatible with ASCII. For more information about UTF, see unicode.org/faq/utf_bom.html.
UTF-16
Supports all Unicode characters but is not backwards-compatible with ASCII. For more information about UTF, see unicode.org/faq/utf_bom.html.
US-ASCII
A 7-bit character encoding that covers the English alphabet, digits, common punctuation, and control characters.
UTF-16BE
UTF-16 encoding with big endian byte serialization (most significant byte first).
UTF-16LE
UTF-16 encoding with little endian byte serialization (least significant byte first).
ISO-8859-1
An 8-bit, ASCII-based character encoding typically used for Western European languages. Also known as Latin-1.
ISO-8859-3
An 8-bit, ASCII-based character encoding typically used for Southern European languages. Also known as Latin-3.
ISO-8859-9
An 8-bit, ASCII-based character encoding typically used for the Turkish language. Also known as Latin-5.
CP850
An ASCII-based DOS code page used to write Western European languages.
CP500
An EBCDIC code page used to write Western European languages.
Shift_JIS
A character encoding for the Japanese language.
MS932
Microsoft's extension of Shift_JIS that adds NEC special characters, NEC-selected IBM extensions, and IBM extensions.
CP1047
An EBCDIC code page with the full Latin-1 character set.
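Choosing the wrong encoding changes how the same bytes are interpreted. This minimal Python sketch shows a few of the differences described above using Python's own codec names (utf-8, utf-16-be, utf-16-le, cp1252, iso-8859-1, cp500, us-ascii), which are Python's spellings and not necessarily the labels used in the stage. The helper encode_hex is hypothetical. Python's standard library has no cp1047 codec, so CP1047 is not shown.

```python
def encode_hex(text: str, encoding: str) -> str:
    """Return the bytes of text in the given Python codec, as a hex string."""
    return text.encode(encoding).hex()

# CP1252 maps byte 0x80 to the euro sign; ISO-8859-1 maps it to a C1 control code.
print(b"\x80".decode("cp1252") == "\u20ac")     # True
print(b"\x80".decode("iso-8859-1") == "\x80")   # True

# The same character "é" is stored differently under each encoding.
print(encode_hex("é", "utf-8"))      # c3a9
print(encode_hex("é", "utf-16-be"))  # 00e9 (most significant byte first)
print(encode_hex("é", "utf-16-le"))  # e900 (least significant byte first)

# EBCDIC (CP500) uses a different byte for "A" than ASCII does.
print(encode_hex("A", "us-ascii"), encode_hex("A", "cp500"))  # 41 c1

# US-ASCII cannot represent "é" at all.
try:
    "é".encode("us-ascii")
except UnicodeEncodeError:
    print("not representable in US-ASCII")
```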
Field separator Specifies the character used to separate fields in a delimited file. For example, this record uses a pipe (|) as a field separator:
7200 13TH ST|MIAMI|FL|33144

These characters are available as field separators:

  • Space
  • Tab
  • Comma
  • Period
  • Semicolon
  • Pipe

If the file uses a different character as a field separator, click the ellipsis button to select another character as a delimiter.
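Splitting a delimited record on its field separator can be sketched with Python's standard csv module. This is an illustration only; the function name split_delimited is hypothetical, and the csv module requires the separator to be a single character.

```python
import csv
import io

def split_delimited(text: str, separator: str) -> list[list[str]]:
    """Split delimited text into records (one per line) and fields on a single-character separator."""
    return list(csv.reader(io.StringIO(text), delimiter=separator))

print(split_delimited("7200 13TH ST|MIAMI|FL|33144\n", "|"))
# [['7200 13TH ST', 'MIAMI', 'FL', '33144']]
print(split_delimited("7200 13TH ST\tMIAMI\tFL\t33144\n", "\t"))
# [['7200 13TH ST', 'MIAMI', 'FL', '33144']]
```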

Text qualifier

The character used to surround text values in a delimited file.

For example, this record uses double quotes (") as a text qualifier.

"7200 13TH ST"|"MIAMI"|"FL"|"33144"

These characters are available as text qualifiers:

  • Single quote (')
  • Double quote (")

If the file uses a different text qualifier, click the ellipsis button to select another character as a text qualifier.
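The effect of a text qualifier can be sketched with Python's csv module, where the quotechar parameter plays the role of the qualifier and is stripped from each value. In Python's csv module a qualified value may also contain the separator, as in the second example; whether the Read from File stage behaves the same way is not covered here, so treat that part as an assumption. The function name read_qualified is hypothetical.

```python
import csv
import io

def read_qualified(line: str, separator: str, qualifier: str) -> list[str]:
    """Split one delimited line, removing the text qualifier around each value."""
    return next(csv.reader(io.StringIO(line), delimiter=separator, quotechar=qualifier))

print(read_qualified('"7200 13TH ST"|"MIAMI"|"FL"|"33144"', "|", '"'))
# ['7200 13TH ST', 'MIAMI', 'FL', '33144']
print(read_qualified("'SUITE 5|REAR'|'MIAMI'", "|", "'"))
# ['SUITE 5|REAR', 'MIAMI']
```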

Record separator

Specifies the character used to separate records in a line sequential or delimited file. This field is not available if you select the Use default EOL check box.

The record separator settings available are:

Linux (U+000A)
A line feed character separates the records. This is the standard record separator for Linux systems.
Macintosh (U+000D)
A carriage return character separates the records. This is the standard record separator for classic Mac OS (version 9 and earlier); current macOS versions use a line feed.
Windows (U+000D U+000A)
A carriage return followed by a line feed separates the records. This is the standard record separator for Windows systems.

If your file uses a different record separator, click the ellipsis button to select another character as a record separator.

Use default EOL

Specifies that the file's record separator is the default end of line (EOL) character used on the operating system on which the Spectrum Technology Platform server is running.

Do not select this option if the file uses an EOL character that is different from the default EOL character used on the server's operating system. For example, if the file uses a Windows EOL but the server is running on Linux, do not select this option. Instead, select the Windows option in the Record separator field.
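To decide between a Record separator option and Use default EOL, it helps to know which separator a file actually uses. This minimal Python sketch guesses it by counting CR LF, LF, and CR sequences, then compares the result with os.linesep, which is the default EOL of whatever machine runs the script (so it reflects the server only when run on the server). The function names are hypothetical, and the counting heuristic is an assumption, not the stage's own detection logic.

```python
import os
from typing import Optional

# Record separator options, as raw strings.
SEPARATORS = {"Windows": "\r\n", "Linux": "\n", "Macintosh": "\r"}

def detect_record_separator(data: bytes) -> Optional[str]:
    """Guess the separator by counting EOL sequences.

    CR LF pairs are counted first and subtracted, so their CR and LF bytes are
    not also counted as Macintosh or Linux separators.
    """
    crlf = data.count(b"\r\n")
    counts = {
        "Windows": crlf,
        "Linux": data.count(b"\n") - crlf,
        "Macintosh": data.count(b"\r") - crlf,
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None

def default_eol_matches(data: bytes) -> bool:
    """True if the file's separator equals this machine's default EOL (os.linesep)."""
    name = detect_record_separator(data)
    return name is not None and SEPARATORS[name] == os.linesep

sample = b"7200 13TH ST|MIAMI\r\nOne Global View|Troy\r\n"
print(detect_record_separator(sample))  # Windows
```

On a Linux server, default_eol_matches(sample) returns False for this Windows-style sample, which corresponds to the case where you should select the Windows option instead of Use default EOL.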

Record length

For fixed width files, specifies the exact number of characters in each record.

For line sequential files, specifies the length, in characters, of the longest record in the file.
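The two meanings of Record length can be checked with a short Python sketch: for a line sequential file, compute the length of the longest record; for a fixed width file, confirm the data divides evenly into records of the stated length. The function names are hypothetical, lengths are counted in characters as described above, and the fixed width check assumes the data contains no trailing end-of-line characters.

```python
def line_sequential_record_length(text: str) -> int:
    """Record length for a line sequential file: the length, in characters, of its longest record."""
    return max((len(record) for record in text.splitlines()), default=0)

def fixed_width_record_count(data: str, record_length: int) -> int:
    """Number of records in fixed width data, which must be a whole multiple of record_length."""
    if record_length <= 0 or len(data) % record_length != 0:
        raise ValueError("data length is not a multiple of the record length")
    return len(data) // record_length

print(line_sequential_record_length("7200 13TH ST MIAMI\nONE GLOBAL VIEW TROY\n"))  # 20
print(fixed_width_record_count("7200 13TH STMIAMIFL33144ONE GLOBAL VTROY NY12180", 24))  # 2
```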

First row is header record

Specifies whether the first record in a delimited file contains header information and not data.

For example, this file snippet shows a header row in the first record.

"AddressLine1"|"City"|"StateProvince"|"PostalCode"
"7200 13TH ST"|"MIAMI"|"FL"|"33144"
"One Global View"|"Troy"|"NY"|12180
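Treating the first row as a header, rather than as data, can be sketched with Python's csv.DictReader, which consumes the first row as field names. The sample below is the snippet shown above; this illustrates the idea and is not how the stage reads the file.

```python
import csv
import io

sample = (
    '"AddressLine1"|"City"|"StateProvince"|"PostalCode"\n'
    '"7200 13TH ST"|"MIAMI"|"FL"|"33144"\n'
    '"One Global View"|"Troy"|"NY"|12180\n'
)

# DictReader uses the first row as field names instead of returning it as data.
rows = list(csv.DictReader(io.StringIO(sample), delimiter="|", quotechar='"'))
print(len(rows))              # 2
print(rows[1]["City"])        # Troy
print(rows[1]["PostalCode"])  # 12180
```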
Treat records with fewer fields than defined as malformed

Delimited file records containing fewer fields than are defined on the Fields tab will be treated as malformed.
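The effect of this option can be sketched in Python as a partition of parsed records by field count. The sketch models only the rule stated above (fewer fields than defined means malformed); what the stage does with short records when the option is cleared is not described here, so the sketch simply keeps them. The function name partition_short_records is hypothetical.

```python
def partition_short_records(rows: list[list[str]], field_count: int,
                            treat_short_as_malformed: bool = True):
    """Split rows into (kept, malformed) based on the number of defined fields.

    Only rows with FEWER fields than defined are flagged; other mismatches
    are not modeled here.
    """
    kept, malformed = [], []
    for row in rows:
        if treat_short_as_malformed and len(row) < field_count:
            malformed.append(row)
        else:
            kept.append(row)
    return kept, malformed

rows = [["7200 13TH ST", "MIAMI", "FL", "33144"], ["One Global View", "Troy", "NY"]]
kept, malformed = partition_short_records(rows, field_count=4)
print(len(kept), len(malformed))  # 1 1
```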

Import

Imports the file layout definition, encoding setting, and sort options from a settings file. The settings file is created by exporting settings from another Read from File or Write to File stage that used the same input file or a file that has the same layout as the file you are working with.

Export

Saves the file layout definition, encoding setting, and sort options to a settings file. You can then import these settings into other Read from File or Write to File stages that use the same input file or a file that has the same layout as the file you are working with now. You can also use the settings file with the job executor to specify file settings at runtime.

For information about the settings file, see The File Definition Settings File.

Fields Tab

The Fields tab defines the names, positions, and, for fixed width and line sequential files, lengths of fields in the file. For more information, see these topics:

Sort Fields Tab

The Sort Fields tab defines fields by which to sort the input records before they are sent into the dataflow. Sorting is optional. For more information, see Sorting Input Records.

Runtime Tab

Field Name Description
File name

Displays the file name selected on the File Properties tab.

Starting record

To skip records at the beginning of the file when reading records into the dataflow, specify the first record you want to read. For example, to skip the first 50 records in a file, specify 51. The 51st record is then the first record read into the dataflow.

All records

Select this option if you want to read all records starting from the record specified in the Starting record field to the end of the file.

Max records

Select this option to read only a certain number of records, starting from the record specified in the Starting record field. For example, to read the first 100 records, select this option and enter 100.
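How Starting record combines with All records or Max records can be sketched with itertools.islice in Python. Starting record is 1-based, so skipping the first 50 records means starting at 51; passing max_records=None stands for All records. The function name select_records and the stand-in data are hypothetical.

```python
from itertools import islice
from typing import Iterable, Optional

def select_records(records: Iterable, starting_record: int = 1,
                   max_records: Optional[int] = None) -> list:
    """Apply Starting record (1-based) and either All records (max_records=None)
    or Max records (read at most max_records records from the start point)."""
    if starting_record < 1:
        raise ValueError("starting_record is 1-based and must be at least 1")
    start = starting_record - 1
    stop = None if max_records is None else start + max_records
    return list(islice(records, start, stop))

records = range(1, 201)  # stand-in for 200 numbered records
print(select_records(records, starting_record=51)[:3])   # [51, 52, 53]
print(len(select_records(records, starting_record=51)))  # 150
print(select_records(records, max_records=100)[-1])      # 100
```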