Read from Spreadsheet

The Read from Spreadsheet stage reads data from a spreadsheet, for example, an Excel spreadsheet, as input to a dataflow. It supports these two formats:
  • .xls
  • .xlsx

Configuring the options

On the Read from Spreadsheet Options: Read from Spreadsheet page, configure these options:
Option Description
File name It specifies the path to the file. Select button to browse your file.
Note: If the Spectrum Technology Platform server is running on Linux, remember that the file names and paths on this platform are case sensitive.
Data selection

It specifies how you want to select data from the spreadsheet to read into the dataflow:

Sheet Data
Select this option to read in all the data from a sheet in the spreadsheet.
Range Data
Select this option to read in a subset of data from a sheet by specifying a range of cells to read.
Named Range
Select this option to read in a subset of data from a sheet by specifying a named range from the spreadsheet.
Sheet selection If you choose Sheet Data or Range Data in the Data selection field, use this option to select the sheet from which you want to read data into the dataflow.
Range If you choose Range Data in the Data selection field, use this option to specify the cell that starts the range and the cell that ends the range.
Range name
If you choose Named Range in the Data selection field, use this option to specify the name of the range you want to read into the dataflow.
Note: Ranges are defined in the spreadsheet. If no ranges are listed in the Range name field, it means that no ranges are defined in the spreadsheet.
Header row

Select this box to specify a row that contains column headings. Column headings become the dataflow field names, although you can change field names in File Schema.

If you do not select this checkbox, the dataflow fields are given generic default names such as Column1 and Column2.

The header row you specify is relative to the data selection. For example, if you choose Range Data in the Data selection field and the range begins on the fifth row, and you specify 1 as the header row, then the fifth row in the spreadsheet will be used as the header because the fifth row of the spreadsheet is the first row of the range.

Data offset from header If you specify a header row, this field specifies the first row that contains data relative to the header.

For example, if you specify 1, the first row below the header will be the first row of data to be read into the dataflow. If you specify 2, the second row below the header will be the first row read into the dataflow.

First data row If you do not specify a header row, this option specifies which row within the data selection contains the first row of data to read into the dataflow. The row you specify is relative to the data selection.

For example, if you choose Range Data in the Data selection field and the range begins on the fifth row, and you specify 1 as the first data row, then the first row of data to be read into the dataflow will be the fifth row.

Ignore empty rows

Select this option if you want to exclude empty rows from the dataflow.

If you do not select this option, empty rows in the spreadsheet will result in empty records in the dataflow.

Note: This option does not affect the data shown in the preview. Empty rows are always shown in the preview, even if this option is selected.
Regenerate Select this button to populate the File Schema with the fields in the input file.
Detect Type Select this button to determine the data type for all the fields automatically.
Note: You can manually change a field's data type by selecting the Type from the drop-down in File Schema.
Refresh Select this button to preview your schema. The preview you see is based on the options you configure for your spreadsheet.