Enterprise Data Integration

These stages are available if you have licensed the Enterprise Data Integration module. Use them to connect to data in multiple sources, either directly or through integration with your existing data access technologies.

DB Loader

The DB Loader stage loads data to and from the configured databases. It provides an interface to a high-speed data loading utility. Currently, the platform supports Oracle Loader, DB2 Loader, PostgreSQL Loader, and Teradata Loader.
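To illustrate what a high-speed loading utility of this kind consumes, here is a hypothetical Oracle SQL*Loader control file; the file, table, and column names are invented for the example:

```
LOAD DATA
INFILE 'customers.dat'
APPEND
INTO TABLE customers
FIELDS TERMINATED BY ','
(customer_id, name, city)
```

The stage generates and manages this kind of loader input for you; you configure the source and target rather than writing control files by hand.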

Field Selector

Use the Field Selector stage to choose the fields that are to be passed to the next stage in the dataflow and remove the unwanted ones. For example, if you have created a new field by combining the data from two fields, and you no longer need the two source fields, you can use the Field Selector to retain only the new field and remove the two source fields from the dataflow.
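The effect of the stage is to project each record onto the set of fields you keep. A minimal sketch of that idea in Python, using the scenario above (all field names are hypothetical):

```python
# Sketch of the Field Selector idea: keep only the fields you want and
# drop the rest from each record. Field names are hypothetical.
def select_fields(records, keep):
    """Return records containing only the fields listed in `keep`."""
    return [{k: r[k] for k in keep if k in r} for r in records]

records = [
    {"FirstName": "Ada", "LastName": "Lovelace", "FullName": "Ada Lovelace"},
    {"FirstName": "Alan", "LastName": "Turing", "FullName": "Alan Turing"},
]

# Keep the combined field and remove the two source fields.
trimmed = select_fields(records, keep=["FullName"])
```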

Generate Time Dimension

Generate Time Dimension creates date records, one for each day of the date range you specify. You can then write these records to a time dimension table in a database using the Write to DB stage. The time dimension table can then be used to perform accurate calculations over time periods.
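The kind of output the stage produces can be sketched as one record per day, each carrying attributes useful for grouping; the attribute names below are illustrative, not the stage's actual field names:

```python
from datetime import date, timedelta

# Sketch of a time-dimension generator: one record per day in the
# requested range. Attribute names are hypothetical.
def generate_time_dimension(start, end):
    """Yield one record per day from start to end, inclusive."""
    day = start
    while day <= end:
        yield {
            "date": day.isoformat(),
            "year": day.year,
            "month": day.month,
            "day_of_week": day.strftime("%A"),
        }
        day += timedelta(days=1)

records = list(generate_time_dimension(date(2024, 1, 1), date(2024, 1, 3)))
```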

Query DB

The Query DB stage allows you to use dataflow fields as parameters in a database query and return the results of the query as new fields in the dataflow.
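The pattern is a parameterized lookup per record, with the query results merged back into the record. A minimal sketch, using Python's sqlite3 as a stand-in for the configured database (table, column, and field names are hypothetical):

```python
import sqlite3

# Sketch of the Query DB pattern: a field from the incoming record is
# used as a query parameter, and the result columns become new fields.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (zip TEXT, city TEXT)")
conn.execute("INSERT INTO cities VALUES ('10001', 'New York')")

record = {"Name": "Ada", "Zip": "10001"}
row = conn.execute(
    "SELECT city FROM cities WHERE zip = ?", (record["Zip"],)
).fetchone()
if row:
    record["City"] = row[0]  # query result returned as a new field
```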

Read from DB

The Read from DB stage reads data from a database table or view as input to a dataflow. The stage is available for jobs, services, and subflows, but not for process flows.
Note: The stage supports reading data from and writing data to HDFS 3.x and Hive 2.1.1. The support includes:
  • Connectivity to Hive from Spectrum on Windows
  • Support and connectivity to Hive version 2.1.1 from Spectrum with high availability
  • Support for reading from and writing to a Hive database (JDBC) via a Model Store connection

Also see Best Practices for connecting to HDFS 3.x and Hive 2.1.1.

Read from Spreadsheet

The Read from Spreadsheet stage reads data from a spreadsheet, for example, an Excel spreadsheet, as input to a dataflow. It supports these two formats:
  • .xls
  • .xlsx

SQL Command

SQL Command runs one or more SQL commands for each record in the dataflow. You can use SQL Command to:
  • Run complex INSERT/UPDATE statements, such as statements that have subqueries/joins with other tables.
  • Update tables after inserting/updating data to maintain referential integrity.
  • Update or delete a record in a database before a replacement record is loaded.
  • Update multiple tables in a single transaction.
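The per-record, single-transaction behavior can be sketched as follows, again using sqlite3 as a stand-in for the target database (table and field names are hypothetical):

```python
import sqlite3

# Sketch of the SQL Command pattern: run an UPDATE for each record in
# the dataflow, committing all statements in a single transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 200)])

records = [{"id": 1, "credit": 50}, {"id": 2, "credit": 25}]
with conn:  # one transaction covers every statement
    for r in records:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (r["credit"], r["id"]),
        )

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```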

Write to Spreadsheet

The Write to Spreadsheet stage writes data to a spreadsheet, such as an Excel spreadsheet, as output from a dataflow.

Write to DB

The Write to DB stage writes the output of a dataflow to a database. The stage writes all values of the date datatype as String values. This is the behavior of the jTDS driver, which is the default driver used by Spectrum. To preserve date datatype values as dates, use Microsoft's JDBC driver.
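The date-to-string behavior described above can be sketched as follows, with sqlite3 standing in for the target database (table and field names are hypothetical):

```python
import sqlite3
from datetime import date

# Sketch of the write step. As with the jTDS behavior described above,
# the date field is serialized to a string before it is loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, order_date TEXT)")

records = [{"id": 1, "order_date": date(2024, 3, 15)}]
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(r["id"], r["order_date"].isoformat()) for r in records],
)

stored = conn.execute("SELECT order_date FROM orders").fetchone()[0]
```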
Note: The stage supports reading data from and writing data to HDFS 3.x and Hive 2.1.1. The support includes:
  • Connectivity to Hive from Spectrum on Windows
  • Support and connectivity to Hive version 2.1.1 from Spectrum with high availability
  • Support for reading from and writing to a Hive database (JDBC) via a Model Store connection

Also see Best Practices for connecting to HDFS 3.x and Hive 2.1.1.