Write to DB

The Write to DB stage writes the output of a dataflow to a database. The stage writes all values of the date datatype as String values; this is the behavior of the jTDS driver, which is the default driver used by Spectrum. To write date datatype values with their native data type, use Microsoft's JDBC driver instead.
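For SQL Server connections, the two drivers use different JDBC URL formats. A hedged sketch (host, port, and database names are placeholders; the exact options depend on your environment):

```
# jTDS (Spectrum default) — date values arrive as strings
jdbc:jtds:sqlserver://<host>:<port>/<database>

# Microsoft JDBC driver — date values keep their native type
jdbc:sqlserver://<host>:<port>;databaseName=<database>
```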
Note: The stage supports reading data from and writing data to HDFS 3.x and Hive 2.1.1. The support includes:
  • Connectivity to Hive from Spectrum on Windows
  • Support and connectivity to Hive version 2.1.1 from Spectrum with high availability
  • Support for reading from and writing to Hive DB (JDBC) through a Model Store connection

Also see Best Practices for connecting to HDFS 3.x and Hive 2.1.1.

Using the Write to DB Stage

To use this stage:
  1. On the Spectrum Flow Designer Home page, click New.
  2. On the New Flow page, click Job, Service, or Subflow, as required, and then click the corresponding blank canvas.
  3. Click OK.
  4. In the dialog box that appears, give a name to the Flow, Job, Service, or Subflow you are creating.
  5. Click OK.
  6. From the Palette Panel, drag the Write to DB stage to the canvas.
  7. Connect the required Sources and Sinks to the stage and configure them.
    Note: For details on how to configure the Sources and the Sinks stages, see the respective sections.
  8. Click Write to DB and configure the stage as described in the following sections.

Configuring the General tab

Configure the General tab of the stage, as described in the table below:
Field Description

Connection

From the drop-down list, select the connection for the database you want to use. The drop-down displays all the connections you have configured. To configure a new connection or edit an existing one, click the Manage Connections link; it takes you to the Connections page in the Spectrum Management Console. For details on configuring new connections, see Connections.
Note: Click the Refresh icon to pick up any modification, deletion, or addition to your list of connections.

Table/View

Specify the table to which you want to write the data.
  1. Click the Browse button ([...]). The Select Table pop-up window is displayed, which shows a list of the tables and views.
  2. Choose the required table.
    Note: You can also use the search text box on the top of the pop-up window to narrow down your search.
  3. Click Select.
The fields of the selected table are displayed in the Fields section on the right side of the page. These are the details displayed:
  • Database Fields: This column lists the field names in the database. You cannot modify these field names.
  • Stage Fields: This column lists the field names used in the dataflow. You cannot modify these field names.
  • Types: This column lists the data type of each field.
Create Table: The Create Table button supports table creation in these databases:
  • Axion
  • DB2
  • Derby or Cloudscape
  • Firebird
  • HSQLDB
  • Interbase
  • MaxDB or SapDB
  • McKoi
  • MySQL
  • Oracle
  • PostgreSQL
  • SQL Server
  • Sybase
Note: For DB2 databases, if you try to create a table and the page size is smaller than the total length of all string columns, you will get an error that says "Failed to build body from content. Serializable class not available to broker."
To create a new table in the database:
  1. Click Create Table and in the pop-up that appears, select the Table Owner and specify the Table Name.
    Note: Table names are case-sensitive.
    Note: If you do not have an input stage (such as Read from File or Read from DB) linked to the Write to DB stage, you will get this error message: Cannot create table without Table schema defined. Please make sure you have upstream fields defined for this stage.
  2. In the Table Schema, specify these details:
    1. Indicate the primary key by selecting the corresponding Primary key check box.
    2. Select the Include check box to specify the fields you want to write to the new table.
    3. For string data types, specify the field's length in the Width column.
      Note: The default is 512.
    4. If Allow Null is checked and the input field contains a null value, then the dataflow writes the null value to the database.
    5. You can edit the column name by changing the value in the corresponding Output Fields.
  3. Click the OK button to close the Create Table pop-up and return to Write to DB Options.
  4. In the Stage Fields column of the Table Schema, you can specify the field name you want to write to the database corresponding to the Database Field column.
  5. Mark the Include check box to select the fields you want to write.
    Note: To avoid poor performance, ensure the database table has a sorted index or key.
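The choices in the Table Schema map onto an ordinary CREATE TABLE statement. A minimal sketch using Python's sqlite3 (the Customers table and its fields are hypothetical; the stage generates DDL appropriate to your target database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Primary key check box -> PRIMARY KEY, Width -> VARCHAR length,
# unchecked Allow Null -> NOT NULL
conn.execute(
    """
    CREATE TABLE Customers (
        CustomerId   INTEGER PRIMARY KEY,   -- Primary key checked
        Name         VARCHAR(512) NOT NULL, -- default Width is 512
        Email        VARCHAR(512)           -- Allow Null checked
    )
    """
)

# A row with a null Email is accepted; a null Name would be rejected
conn.execute("INSERT INTO Customers VALUES (1, 'Ada', NULL)")
print(conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0])  # 1
```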

Configuring the Runtime tab

Option Name

Description

Write Mode

Specify the action to take when writing to the database:

Insert
Inserts new records into the database but does not update existing records. This is the default setting.
Update
Updates existing records in the database but does not insert new records.
Note: If you select Update, the primary key column name used in the input table must match the primary key column name in the output table. If you try to update a table where the primary key column name does not match the input, or where the primary key column is not defined, the update will not work.
Insert if not able to update
Inserts a new record into the database if the record does not exist; otherwise, updates the existing record.
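The write modes correspond to familiar SQL patterns. A sketch of Insert if not able to update using Python's sqlite3 (try the UPDATE first, and INSERT only when no row matched; the Customers table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, Name TEXT)")

def insert_if_not_able_to_update(conn, customer_id, name):
    # Update mode: matches on the primary key column
    cur = conn.execute(
        "UPDATE Customers SET Name = ? WHERE CustomerId = ?", (name, customer_id)
    )
    if cur.rowcount == 0:
        # Insert mode: the record does not exist yet
        conn.execute("INSERT INTO Customers VALUES (?, ?)", (customer_id, name))

insert_if_not_able_to_update(conn, 1, "Ada")     # inserts
insert_if_not_able_to_update(conn, 1, "Ada L.")  # updates
print(conn.execute("SELECT Name FROM Customers").fetchall())  # [('Ada L.',)]
```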

Batch commit

Select this option to commit changes to the database after a specified number of records are processed. By default this option is not selected, which means that changes are committed after each record is processed. Selecting this option can significantly improve the performance of the Write to DB stage.

Batch size

If you enable the Batch commit option, this specifies the number of records to commit to the database in each batch. The default is 1,000. For dataflows created in Spectrum Technology Platform 7.0 and earlier, the default is 100.

A larger batch size does not always offer better load performance. Consider these factors when choosing a batch size:

  • Data arrival rate to the Write to DB stage: If data arrives more slowly than the database can process it, changing the batch size will not improve overall dataflow performance. For example, dataflows with address validation or geocoding may not benefit from an increased batch size.
  • Network traffic: For slow networks, a medium batch size (1,000 to 10,000) results in better performance.
  • Database load and/or processing speed: For databases with high processing power, increasing the batch size improves performance.
  • Multiple runtime instances: If you use multiple runtime instances of the Write to DB stage, a large batch size consumes a lot of memory, so use a small or medium batch size (100 to 10,000).
  • Database rollbacks: Whenever a statement fails, the complete batch is rolled back. The larger the batch size, the longer the rollback takes.
Commit at the end

Select this option to ensure that the commit to the database occurs only after all the records are transferred to the database.

Batch count to commit

Specify the number of batches after which records are committed. Records are committed to the database after every (Batch count to commit × Batch size) records are transferred. For example, if Batch size is set to 1,000 and Batch count to commit is set to 3, the commit occurs after every 3,000 records are transferred to the database.
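Batch commit semantics can be sketched in a few lines. A hedged illustration using Python's sqlite3 (the Records table, batch size, and batch count are example values, not stage defaults you must use):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Records (Id INTEGER PRIMARY KEY, Value TEXT)")

BATCH_SIZE = 1000
BATCH_COUNT_TO_COMMIT = 3  # commit every 3 * 1000 = 3000 records

rows = [(i, f"value-{i}") for i in range(10_000)]
batches_since_commit = 0
for start in range(0, len(rows), BATCH_SIZE):
    conn.executemany("INSERT INTO Records VALUES (?, ?)",
                     rows[start:start + BATCH_SIZE])
    batches_since_commit += 1
    if batches_since_commit == BATCH_COUNT_TO_COMMIT:
        conn.commit()
        batches_since_commit = 0
conn.commit()  # commit any remaining partial group of batches
print(conn.execute("SELECT COUNT(*) FROM Records").fetchone()[0])  # 10000
```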

Truncate table before inserting data

Select this option if you want to clear all data from the table before writing to the database.

Drop and recreate the table if it already exists

Select this option to delete and recreate the table before writing the dataflow's output to the table. This option is useful if you want the table's schema to match the fields from the dataflow and not contain any extraneous schema information.

The table that will be deleted and recreated is the one specified in the Table/View field on the General tab. For example, if you specify the Customers table in the Table/View field, and you select Drop and recreate the table if it already exists, then the Customers table will be deleted from the database, and a new table named Customers will be created with a schema that matches the actual fields written to the table.
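The difference between the two options can be sketched in SQL terms: truncating removes the rows but keeps the existing schema, while drop-and-recreate replaces the schema with one matching the dataflow's fields. A hedged illustration using Python's sqlite3 (the Customers table and its columns are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, Legacy TEXT)")
conn.execute("INSERT INTO Customers VALUES (1, 'old')")

# Truncate table before inserting data: rows go, schema (including Legacy) stays
conn.execute("DELETE FROM Customers")  # sqlite3 has no TRUNCATE statement

# Drop and recreate: the old schema goes too, replaced by the dataflow's fields
conn.execute("DROP TABLE IF EXISTS Customers")
conn.execute("CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, Name TEXT)")

cols = [row[1] for row in conn.execute("PRAGMA table_info(Customers)")]
print(cols)  # ['CustomerId', 'Name']
```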