Data Settings

The Data Settings page allows you to view and edit settings that characterize a dataset.

This settings page is displayed when you edit a dataset. The settings in the Data Settings section characterize the data. The Preview section displays the data correctly only when these settings match the record format of the data.

Data Settings

Character encoding
Specifies the character encoding used in the source file.
Field delimiter
Specifies the character that separates values in a record.
Text qualifier
Specifies the character that encloses values that contain the delimiter.
Line separator
Specifies whether the file uses Windows or Unix line breaks.
Note: Databricks does not support the Windows line separator. To run a job in a Databricks environment, you must use source data with Unix line breaks.
First row is a header row
Select this check box if the first row contains column labels. Clear the check box if the first row contains data.
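The settings above map closely onto the options of a typical delimited-text parser. The following is a minimal sketch of how they might be applied to a raw file. Python's standard csv module stands in for the product's own parser, and the helper names to_unix_line_endings and parse_records are hypothetical. The line-ending helper reflects the Databricks note and uses a naive byte replacement, so it also converts any CRLF that appears inside a quoted value.

```python
import csv
import io


def to_unix_line_endings(raw: bytes) -> bytes:
    """Replace Windows (CRLF) line breaks with Unix (LF) line breaks.

    Naive: a CRLF inside a quoted value is converted too.
    """
    return raw.replace(b"\r\n", b"\n")


def parse_records(raw: bytes, encoding: str = "utf-8", delimiter: str = ",",
                  quotechar: str = '"', first_row_is_header: bool = True):
    """Split raw file bytes into records using the given data settings.

    Returns (header, rows); header is None when the first row holds data.
    """
    text = raw.decode(encoding)                         # Character encoding
    reader = csv.reader(io.StringIO(text, newline=""),  # accepts LF or CRLF
                        delimiter=delimiter,            # Field delimiter
                        quotechar=quotechar)            # Text qualifier
    rows = list(reader)
    if first_row_is_header and rows:                    # First row is a header row
        return rows[0], rows[1:]
    return None, rows


raw = 'city,amount\r\n"Portland, OR",12.50\r\n'.encode("utf-8")
print(to_unix_line_endings(raw))  # b'city,amount\n"Portland, OR",12.50\n'
print(parse_records(raw))         # (['city', 'amount'], [['Portland, OR', '12.50']])
```

The text qualifier is what keeps "Portland, OR" in one field even though it contains the field delimiter.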

Field definition table

Settings in this table characterize fields in the dataset.

Field Name
This value specifies the field name. Field names must start with an alphabetic character. They may contain alphanumeric, dash (-), and underscore (_) characters. Spaces and all other special characters are not permitted.
If the First row is a header row check box is selected, this column initially shows the values from the first row of the input data. If the first row contains data, clear the First row is a header row check box and enter a name for each field in the table. When the check box is cleared, the default field names are Column_1, Column_2, and so forth. You can also edit field names when the First row is a header row check box is selected. Any changes appear both here and in the header row of the Preview table.
Data Type
Entries in this column define the primitive data type for a field. Choose the type that corresponds to the data in the field: Boolean, Float, Integer, Long, String, Date, Time, or DateTime. When you create a dataset, the data type of each field is initially set to the type whose format, as defined in the data type formats for the dataset, matches the data. You can click the Edit data type formats button to change the default data type formats for a dataset.
Semantic Type
Semantic types describe the kind of information that data represents. For example, a field with a Float data type may semantically represent a currency, and a field with a String data type may semantically represent a city. When you edit a pipeline, this setting determines which transforms appear in the Transforms Suggestions.
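The field name rule and the initial data type assignment can be illustrated with a short sketch. The regular expression follows the field name rule stated above, assuming ASCII letters. The type inference is an assumption about how format matching might work: the formats used here (true/false, optionally signed integers with 32-bit and 64-bit ranges, decimal floats, YYYY-MM-DD dates) are hypothetical stand-ins for a dataset's actual data type formats, and Time and DateTime are omitted for brevity. The function names are illustrative only.

```python
import re
from datetime import datetime

# Field name rule from this page: start with a letter, then letters,
# digits, dashes, or underscores. ASCII letters are an assumption.
FIELD_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")

# Hypothetical data type formats; the product's defaults may differ.
INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")
DATE_FORMAT = "%Y-%m-%d"
INT32 = (-2**31, 2**31 - 1)
INT64 = (-2**63, 2**63 - 1)


def is_valid_field_name(name: str) -> bool:
    return FIELD_NAME_RE.fullmatch(name) is not None


def _is_date(value: str) -> bool:
    try:
        datetime.strptime(value, DATE_FORMAT)
        return True
    except ValueError:
        return False


def infer_data_type(values) -> str:
    """Return the first type whose (assumed) format matches every non-blank value."""
    values = [v.strip() for v in values if v.strip()]
    if not values:
        return "String"
    if all(v.lower() in ("true", "false") for v in values):
        return "Boolean"
    if all(INT_RE.fullmatch(v) for v in values):
        ints = [int(v) for v in values]
        if all(INT32[0] <= n <= INT32[1] for n in ints):
            return "Integer"
        if all(INT64[0] <= n <= INT64[1] for n in ints):
            return "Long"
    if all(FLOAT_RE.fullmatch(v) for v in values):
        return "Float"
    if all(_is_date(v) for v in values):
        return "Date"
    return "String"


print(is_valid_field_name("unit-price"), is_valid_field_name("unit price"))  # True False
print(infer_data_type(["1.5", "2"]))  # Float
```

Checking the narrower types first means a column of whole numbers becomes Integer rather than Float, which mirrors choosing the most specific type that fits the data.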

Preview

The Preview table displays sample data from the data source. Field names and data display correctly in separate columns when the data settings match the data format. If the data does not display correctly, adjust the data settings to match the data. For example, if the data is uploaded as comma-separated values but Period (.) is selected as the Field delimiter, all values appear in a single column.
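The delimiter mismatch in the example above can be reproduced with a small sketch. Python's csv module again stands in for the product's parser, and preview and looks_like_wrong_delimiter are hypothetical helpers. The single-column check is only a heuristic, since a dataset that genuinely has one column would also trigger it.

```python
import csv
import io
from itertools import islice


def preview(text: str, delimiter: str, max_rows: int = 5):
    """Split the first few records of text using the chosen field delimiter."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return list(islice(reader, max_rows))


def looks_like_wrong_delimiter(rows) -> bool:
    """True when every previewed record lands in a single column."""
    return bool(rows) and all(len(row) == 1 for row in rows)


csv_text = "city,count\nPortland,12\nSalem,7\n"
wrong = preview(csv_text, delimiter=".")
right = preview(csv_text, delimiter=",")
print(wrong)  # [['city,count'], ['Portland,12'], ['Salem,7']]
print(right)  # [['city', 'count'], ['Portland', '12'], ['Salem', '7']]
print(looks_like_wrong_delimiter(wrong), looks_like_wrong_delimiter(right))  # True False
```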

The header row above the table shows column headings. If the First row is a header row check box is selected, the header row initially shows the values from the first row of the input data. If you clear the First row is a header row check box, it initially shows Column_1, Column_2, and so forth.
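The rule for the initial headings fits in a few lines. This is a minimal sketch with a hypothetical function name, assuming the number of default names equals the number of values in the first row.

```python
def preview_header(first_row, first_row_is_header: bool):
    """Return the initial headings shown above the preview table."""
    if first_row_is_header:
        return list(first_row)  # labels taken from the first row
    # First row is data: fall back to Column_1, Column_2, ...
    return [f"Column_{i}" for i in range(1, len(first_row) + 1)]


print(preview_header(["city", "count"], True))    # ['city', 'count']
print(preview_header(["Portland", "12"], False))  # ['Column_1', 'Column_2']
```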