Replicate Data using the Replication Wizard
- Create a pipeline that defines how copy and/or replication will be done later.
- Add, monitor, and maintain a centralized repository of connections, pipelines, projects, metabases, and runtime engines.
- View summary information about objects.
- Monitor, track, and troubleshoot replication projects.
General tab
Specifies general information, such as the replication pipeline type and options.
Connections
Data Integration gives you the tools to configure and manage connections.
- Add at least one data connection for each type of data source and target database, file system, or Kafka streaming service you plan to connect to. When you create a data connection, you select the runtime server from a list.
- Add pipelines and associate each pipeline in a project to a source data connection and a target data connection.
- Run or start your Data Integration project, which contains a collection of pipelines that are processed against tables in your data set.
Use data connections to access both data source and target systems. Connections configured to access Apache Kafka are only valid for these targets and cannot be used to connect to data sources.
Add Source Connection
Before pipelines can run as part of a project, you must add a data connection for each data source and target database. You also select the associated access method (for example, ODBC or JDBC). When you add a data connection, it is saved to the repository and is available to use when you create a pipeline.
Source Connection Guidelines
- When adding data connections for relational databases, ensure that the relational source files are properly configured before you run a project.
- When you configure a connection to a database management system (DBMS), you are prompted to supply a user name and password to authenticate your connection. Therefore, only you will have access to the DBMS data connections you add.
- You can test the connection as you add it. This allows you to confirm that the values you enter are valid and you have the proper credentials to access databases, directories, and file systems.
- Agents are change capture processes that reside on the z/OS server. Before adding a Db2 for z/OS JDBC connection, ensure that you have configured and started one or more capture agents on the z/OS server because you will be prompted to select or specify an agent during the procedure.
- Click Add Source. The New Data Connection dialog opens.
- For Data connection, enter a unique name for the data connection.
Note the following guidelines for connection names:
- If you enter a name that already exists in the repository, you cannot save the connection.
- The name cannot exceed 200 characters.
- For replication data connections, spaces in the name are not supported and hyphens ( - ) are the only special characters supported.
- The name cannot be changed after the connection is saved.
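As a quick illustration, the naming guidelines above can be checked with a small validation routine. This is an illustrative sketch, not product code; the function name and the exact character class are assumptions based only on the rules listed (no spaces, hyphens as the only special characters, at most 200 characters).

```python
import re

def is_valid_replication_connection_name(name: str) -> bool:
    """Check a proposed replication data connection name against the
    documented guidelines: at most 200 characters, no spaces, and
    hyphens as the only supported special characters."""
    if not name or len(name) > 200:
        return False
    # Letters, digits, and hyphens only; spaces and other special
    # characters are rejected.
    return re.fullmatch(r"[A-Za-z0-9-]+", name) is not None
```

Uniqueness against the repository still has to be checked by the application when you save the connection.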
- (Optional) For Description, enter a brief description of the data connection.
- For Connection type, click the drop-down list and select the type of connection you are adding. If you select a DBMS, specify the associated data Access method. See available access methods for each connection type:

  Connection Type   Access Method   Supported Project Type
  Db2 for IBM i     JDBC            Replicate
  Db2 for IBM Z     JDBC            Replicate
  Oracle            JDBC            Replicate

- Specify properties, depending on your connection type:

Db2 for IBM i
Db2 for IBM i properties:

Runtime Engines: Do one of the following:
- Click the Server drop-down list and select the name of the runtime server that will access the database. Only servers configured for replication projects are available.
- Click Add Engine to add a new runtime engine. The name of the engine you add will then display in the Server drop-down list.
Database: Specify the following database properties:
- Database location. To specify the server where the database is located, click the drop-down list and select one of the following:
  - Remote server (the default)
  - Runtime server
- Installation library. The name of the installation library. This library is the location where the log reader is installed.
- Enable IASP. To enable Independent Auxiliary Storage Pool (IASP), click the drop-down list and select Yes. By default, No is selected. An IASP is a collection of disk units that contains a set of user objects and the necessary system data (for example, storage management directories, object ownership, and authorization information) so that the IASP can be taken offline or brought online independently of system activity on other ASPs. When you select Yes, enter values for the following:
  - IASP name. The name of the IASP. Do not enter *SYSBAS for the IASP name.
  - RDB name. The name of the relational database. This field is automatically populated with the name you entered for the IASP. Accept this name or enter a different name. Do not enter *SYSBAS for the RDB name.
- DBMS version. The version of the DBMS. This read-only value is populated each time a connection to the database is made. For example, when you click Test to test the connection. Otherwise, a dash (-) displays.
- JDBC driver version. The version of the JDBC driver on the database server. This read-only value is populated each time a connection to the database is made. Otherwise, a dash (-) displays. (When the database server name and the default runtime server name are the same, the connection is on-box; otherwise, it is off-box.)
- GMT offset. The number of minutes that the local time on the database server differs from Greenwich Mean Time (GMT). This read-only value is populated each time a connection to the database is made. Otherwise, a dash (-) displays.
- Test user ID and Password. Enter your database user ID and password credentials to allow access to test the connection to the database server and database. These values are not saved.
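The GMT offset reported for a connection is simply the signed difference, in minutes, between the database server's local time and GMT. As an illustrative sketch (not product code; the function name is an assumption), the same kind of value can be computed like this:

```python
from datetime import datetime

def gmt_offset_minutes() -> int:
    """Return the number of minutes that local time differs from GMT,
    mirroring how the read-only GMT offset field reports the value."""
    local = datetime.now().astimezone()  # timezone-aware local time
    return int(local.utcoffset().total_seconds() // 60)
```

A server in a zone two hours ahead of GMT would report 120, for example; zones behind GMT yield negative values.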
Db2 for z/OS

Db2 for z/OS properties, by access method:

Default Runtime Server (Native and ODBC): Do one of the following:
- Click the Server drop-down list and select a server name. When configuring a copy project pipeline that uses the connection, the runtime server will access the database to retrieve the schema(s) and table names.
- Click Add a Server to add a new server. The Add Server window opens. The name of the new server you add will then display in the Server drop-down list.

Runtime Server (JDBC): Do one of the following:
- Click the Server drop-down list and select the name of the runtime server that will access the database. Only servers configured for replication projects are available.
- Click Add a Server to add a new replication server. The Add Server window opens. The name of the new server you add will then display in the Server drop-down list.
Database (Native and ODBC): Specify the following database properties:
- Database. Specify the database name or the ODBC data source name.
- User ID and Password. Enter your user name and password authentication credentials for the database to which you want to connect. Passwords can be specified as variables.
- Repository password (optional). Select this option if the password (or secret key) you entered is stored in a secure repository. Note: Configure secure passwords or secret keys in the Connect application. See the topic "Secure Repository passwords" in the Connect help.
Database (JDBC): Specify the following database properties:
- Database server hostname or IP address. Enter the DNS/hostname or IP address of the Db2 for z/OS database server. If you enter an IP address, ensure the value conforms to Internet Protocol version 4 (IPv4) or version 6 (IPv6).
- Database. Enter the name of the database on the database server.
- Port number. Enter the port number for the database on the database server. Enter a value between 1 and 65535. The default is 448.
- DBMS version. The version of the DBMS. This read-only value is populated each time a connection to the database is made, for example, when you click Test to test the connection. Otherwise, a dash (-) displays.
- JDBC driver version. The version of the JDBC driver on the database server. This read-only value is populated each time a connection to the database is made. Otherwise, a dash (-) displays.
- GMT offset. The number of minutes that the local time on the database server differs from Greenwich Mean Time (GMT). This read-only value is populated each time a connection to the database is made. Otherwise, a dash (-) displays.
- User ID and Password. Enter your database user ID and password credentials to allow access to test the connection to the database server and database.
Capture (JDBC): Specify the following capture properties:
- Server. Enter the name of the z/OS server where the capture processes reside.
- Port number. Enter the port number for the capture server. Enter a value between 1 and 65535. The default is 2626.
- Private key path. Enter the location and name of your private key.
- Agent. Agents are change capture processes that reside on the z/OS server. Click the drop-down list to select an agent. Ensure you have first selected a server and port number. If the list of agents cannot be retrieved, specify the name of an existing agent in the text field to the right of the drop-down list. Note: Agents cannot be shared by multiple connections, so ensure the agent you select or specify is not already in use. When you change your server, port, or private key name and location, you are prompted again to select an agent.
JCL Configuration (JDBC): Specify the following JCL properties:
- Job card. Enter the parameter that represents the job or unit of work that the operating system will perform. For example, //DMXCDC JOB ,MSG,REGION=64M. Note: The parameter statement directs the system to allocate the correct scheduler and the required CPU time, and to issue notifications.
- Job library. Enter the JCL data definition (DD) statement identifying the LOADLIB path where DMXCDC is installed on the z/OS system, which is required to run the JCL job. For example, //JOBLIB DD DISP=SHR,DSN=DMXCDC.LOADLIB. Note: If you do not specify a job library, z/OS searches public libraries only.
- ISPF library. Enter the path where the DMXCDC Interactive System Productivity Facility (ISPF) libraries are installed. For example, DMXCDC.ISPFLIB.
Oracle

Oracle properties, by access method:

Default Runtime Server (Native): Do one of the following:
- Click the Server drop-down list and select a server name. When configuring a copy project pipeline that uses the connection, the runtime server will access the database to retrieve the schema(s) and table names.
- Click Add a Server to add a new server. The Add Server window opens. The name of the new server you add will then display in the Server drop-down list.

Runtime Server (JDBC): Do one of the following:
- Click the Server drop-down list and select the name of the runtime server that will access the database. Only servers configured for replication projects are available.
- Click Add a Server to add a new replication server. The Add Server window opens. The name of the new server you add will then display in the Server drop-down list.
Database (Native): Specify the following database properties:
- Database. Specify the database name.
- User ID and Password. Enter your user name and password authentication credentials for the database to which you want to connect. Passwords can be specified as variables.
- Repository password (optional). Select this option if the password (or secret key) you entered is stored in a secure repository. Note: Configure secure passwords or secret keys in the Connect application.
Database (JDBC): Specify the following database properties:
- Database server hostname or IP address. Enter the DNS/hostname or IP address of the Oracle database server. If you enter an IP address, ensure the value conforms to Internet Protocol version 4 (IPv4) or version 6 (IPv6).
- Instance name. Enter the name of the data source instance on the database server.
- Port number. Enter the port number for the data source instance on the database server. Enter a value between 1024 and 65535. The default is 1521.
- (Optional) Enable prepared statements for performance. When enabled, prepared statements help repeated database statements run more efficiently. Selected by default.
- JDBC driver version. Click the drop-down list and select the version of the JDBC driver used on the Oracle database server system, either 11g or 12c. Do the following as needed:
  - Enable JDBC encryption. Select this option to enable secure encryption for the Oracle JDBC connection. This ensures that communication with the Oracle database is encrypted using, in priority order, the AES-256, AES-192, or AES-128 Advanced Encryption Standard, with AES-256 being the most secure.
  - Pluggable database. When you select 12c, the Pluggable database option displays. Select this option if you are working with a pluggable Oracle database. When selected, you must enter the Root container user ID and Password of the database instance.
- (Optional) Test user ID and Password. Enter your database user ID and password credentials to allow you access to test the connection to the database server and data source instance.
- Click Test to verify that your credentials are valid for the properties you entered. (If the Test button is unavailable, ensure you have entered values in all required fields.) When the test finishes, one of the following displays above the Test button:
  - Test was successful. All properties are valid.
  - If one or more properties are invalid, a message displays for each error describing why the test failed. Scroll to see the full message text. Your user ID or password may have been entered incorrectly, you may not have access to the specified database, or a directory you entered may be invalid. Note: You can still save the connection when the test fails.
- Click Save to add the new connection. (Clicking Cancel closes the window without saving your work.)
Add Target Connection
- Pipelines for replication projects are configured to copy and/or replicate data changes to a single Kafka or Snowflake target.
- You define Apache Kafka properties, such as row properties and message format type (CSV or JSON), specify large object (LOB) types, and choose whether to write to a new topic you create on the Kafka server or use an existing topic.
- A Kafka target is a producer, publishing messages to a Kafka topic. Topics are categories where the Kafka cluster stores streams of records.
- For Snowflake targets, you select a warehouse, database, and schema with the option to create a target table if it does not exist. You also specify the batch criteria for applying records to the target.
- Click Add Target. The New Data Connection dialog opens.
- For Data connection, enter a unique name for the data
connection. Note the following guidelines for connection
names:
- If you enter a name that already exists in the repository, you cannot save the connection.
- The name cannot exceed 200 characters.
- For replication data connections, spaces in the name are not supported and hyphens ( - ) are the only special characters supported.
- The name cannot be changed after the connection is saved.
- (Optional) For Description, enter a brief description of the data connection.
- For Connection type, click the drop-down list and select the type of connection
you are adding. If you select a DBMS, specify the associated data Access
method. See available access methods for each connection type:
  Connection Type   Access Method   Supported Project Type
  Kafka             N/A             Replicate

- Specify properties, depending on your connection type:

Kafka
Kafka properties:

Runtime Engines: Do one of the following:
- Click the Server drop-down list and select the name of the runtime server that will access the database or brokers. Only servers configured for replication projects are available.
- Click Add Engine to add a new engine. The name of the new engine you add will then display in the Server drop-down list.

Then specify the following:
- Kafka producer configuration. The directory path to the Kafka configuration file on the runtime server. If no file is specified, the configuration is set to the Kafka defaults.
- (Optional) Select Use Kerberos to use the Kerberos authentication protocol for security when connecting to the server(s). When selected, specify:
  - Principal. Enter the unique identity (name) of the Kerberos service user.
  - Keytab location. Enter the file location of the Kerberos key table credential certificate on the runtime server.
Brokers: Specify the following properties. You must configure at least one broker; add and configure additional brokers as needed.
- Broker hostname or IP address. Enter the DNS/hostname or IP address of the server where Kafka is running. If you enter an IP address, ensure the value conforms to Internet Protocol version 4 (IPv4) or version 6 (IPv6).
- Port number. Enter the port number for the Kafka broker. Enter a value between 1024 and 65535.
- Click Add Broker for each additional broker you want to configure, then enter the broker hostname or IP address and port.
- To remove a broker from the section, click Remove to the right of the broker.
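The broker host/port pairs configured here correspond to what Kafka clients call a bootstrap server list. The following sketch (illustrative only, not product code; the helper name and the bracketing rule for IPv6 literals are assumptions) shows how such pairs are typically joined:

```python
def bootstrap_servers(brokers):
    """Join (host, port) broker pairs into a Kafka-style
    bootstrap server string, e.g. "host1:9092,host2:9092".
    Bare IPv6 literals are wrapped in brackets so the port
    separator stays unambiguous."""
    parts = []
    for host, port in brokers:
        if not 1024 <= port <= 65535:
            raise ValueError(f"port {port} outside allowed range 1024-65535")
        if ":" in host:  # bare IPv6 literal
            host = f"[{host}]"
        parts.append(f"{host}:{port}")
    return ",".join(parts)
```

For example, `bootstrap_servers([("kafka1.example.com", 9092), ("2001:db8::1", 9092)])` yields `"kafka1.example.com:9092,[2001:db8::1]:9092"`.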
Options: (Optional) To help improve performance, select Enable asynchronous apply (selected by default). When asynchronous apply is enabled, batches of one or more messages are submitted to Kafka without waiting for the transaction execution results for each changed row to be returned; results are returned after a batch is sent. When asynchronous apply is not enabled, messages are sent one at a time and processing does not continue until acknowledgment is received that the message was applied, which often slows performance.

Note: If you plan to use transaction bundling (managed in the Kafka pipeline) when transferring data to the target, you must enable asynchronous apply to benefit from the performance improvement. With transaction bundling, many small transactions are combined into a bundle and sent asynchronously to the Kafka target as a single, larger transaction.

When enabling asynchronous apply, in the Cache size limit field, enter a cache size value between 100 and 10000. The default is 1000. Batch size is limited by the cache size. If the cache size limit is greater than the Transaction bundling limit in the associated pipeline target, the cache size limit will override the transaction bundling limit.

- Click Test to verify that your credentials are valid for the properties you entered. (If the Test button is unavailable, ensure you have entered values in all required fields.)
- When the test finishes, one of the following displays above the Test button:
  - Test was successful. All properties are valid.
  - If one or more properties are invalid, a message displays for each error describing why the test failed. Scroll to see the full message text. Your user ID or password may have been entered incorrectly, you may not have access to the specified database, or a directory you entered may be invalid. Note: You can still save the connection when the test fails.
- Click Save to add the new connection. (Clicking Cancel closes the window without saving your work.)
- Click Next.
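The interaction between the Kafka target's cache size limit and the pipeline's transaction bundling limit, described in the Options row above, can be summarized in a small helper. This is an illustrative sketch of the documented rule only, not product code; the function and parameter names are assumptions:

```python
def effective_batch_limit(cache_size_limit=1000, bundling_limit=None):
    """Validate the cache size limit (100-10000, default 1000) and
    return the limit that governs a batch. Per the documented rule,
    a cache size limit greater than the transaction bundling limit
    overrides the bundling limit."""
    if not 100 <= cache_size_limit <= 10000:
        raise ValueError("cache size limit must be between 100 and 10000")
    if bundling_limit is None:  # transaction bundling not in use
        return cache_size_limit
    return max(cache_size_limit, bundling_limit)
```

So with a cache size limit of 2000 and a bundling limit of 500, batches are governed by the cache size limit of 2000.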
View Source Data
- Log reader
For IBM i data connections, the Log reader for journal drop-down list displays. Log readers may already exist. If not, you can autogenerate one. Click Configuration to open the Log Reader Configuration window, and make edits as needed.
- Schemas (Libraries)
Displays all schemas available for the selected data connection. Connect to the source datastore and extract the tables for replication.
- Tables
The Journal column shows the name of the journal associated with the tables in the schema.
If you are autogenerating a new log reader, the Journal column is filtered to show All Journals. You can filter the list by entering some or all of the name in the search field and clicking the Filter button. Ensure you select tables that use the same journal.
If you selected an existing log reader, the Tables and Journal columns are filtered to show only those items associated with the log reader.
- Columns
The Column and Data Type columns show the name of each column in the table and each column's data type (for example, DECIMAL or VARCHAR), along with the content of the table with the schema applied.
Mapping
Every table is mapped, and each dataset has default values. You can add or update default mapping values and reorder fields in the dataset associated with the selected schema.
- Select tables in the schema.
- Click Change Mapping to open the Change Dataset Mapping dialog. The number counter displays the number of tables you selected.
- Select Use a pattern, or select an existing topic from the Topic drop-down list. Note: If Use a pattern is selected, enter a topic name.
- From the Case drop-down list, select the letter case to apply to the topic name.
- Click Apply.
- Select the number of rows you want.
- Click Change Field Mapping to open the Change Field Mapping dialog. The number counter displays the number of tables you selected.
- In the Changes tab, select the following options:
Map fields by:
- Do not change: do not change the mapping that is already set.
- Column name: set the column mapping by name.
- Column order: set the column mapping by order.

Include row metadata fields:
- Do not change: do not change what is already set.
- No row metadata: for the selected tables, do not include row metadata in the column mapping.
- Select row metadata: include the associated value in the column mapping of the selected tables.
- (Optional) Select the Defaults tab to set the field mapping default values. Note: The default values apply only to targets that do not exist and will be created.
- Click Save to save all of the values for the selected tables.
- (Optional). Click Update Defaults to override the values you selected from the Defaults tab for all of the tables.

- Click Map Columns. The Column Mapping screen displays.
- Select the columns you want to map.
- For Action, select one of the following:
  - Copy: include the selected column in the column mapping.
  - Ignore: do not include the selected column in the column mapping.
- Choose distribution key(s) by selecting the checkbox for the appropriate columns.
- Toggle on Reorder target columns to reorder columns by dragging and dropping them. Note: Reorder target columns displays only if the target does not exist, or if the target exists but does not have a schema.
- Select the number of rows from the table you want.
- Select the option Create topic if it does not exist.
- Click Change Mapping to open Change Table Mapping dialog.
- Select either Use a pattern or specify an existing topic. Note: If Use a pattern is selected, enter the combination of source columns, schema, and tokens.
- (Optional) Select a letter case from the Case list. Note: Because Kafka topic names do not support spaces, special characters such as underscores and hyphens are used instead.
- Click Apply.
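Because Kafka topic names cannot contain spaces, applying a letter case and substituting a safe character is the essence of the Case option above. The sketch below is illustrative only, not product code; the function name and the underscore substitution are assumptions:

```python
def format_topic_name(name, case=None):
    """Apply a letter case to a candidate Kafka topic name and
    replace spaces, which Kafka topic names do not allow, with
    underscores."""
    topic = name.strip().replace(" ", "_")
    if case == "upper":
        topic = topic.upper()
    elif case == "lower":
        topic = topic.lower()
    return topic
```

For example, `format_topic_name("Customer Orders", "lower")` produces `customer_orders`.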
Add Filter
You can optionally add one or more filters (conditions and clauses) that refine the data with which you want to work. Filters are useful to copy or replicate a subset of data from your database sources to your specified target database, data warehouse, file systems, or Kafka stream.
- Click the Schema drop-down list and select the name of the schema that contains the table you want to filter.
- Click the Table drop-down list and select the name of a table in the selected schema you want to filter.
- In the Filter expression text field, copy and paste or type the Boolean expression whose results, if true, include the row based on a valid expression for the column, and if false, ignore the row. For example, type Country == 'CAN' to include data rows in which the value in the Country column is Canada.
- Click Validate Expression to validate that the expression you entered is valid for the runtime server. The results display in the Validation results field. If validation failed, the results include information about problems with the expression or other reasons for the failed validation. Scroll to see any details too long to fit the window, or drag the corner of the window to resize it. Note: You can still add the filter if the validation failed.
- To save the filter, click Add. The view shows the summary of each filter you added.
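The steps above treat a filter expression as a row-level predicate: rows for which the expression is true are included, and rows for which it is false are ignored. The sketch below (illustrative only; the product validates and evaluates expressions on the runtime server, not in Python) mirrors that behavior for a simple equality filter such as Country == 'CAN':

```python
def apply_filter(rows, column, value):
    """Keep only the rows whose value in `column` equals `value`,
    mirroring the include-if-true, ignore-if-false behavior of a
    filter expression like Country == 'CAN'."""
    return [row for row in rows if row.get(column) == value]
```

For example, filtering `[{"Country": "CAN"}, {"Country": "USA"}]` on `Country == 'CAN'` keeps only the first row.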
Finalize - Replication Pipeline Summary
Stage Configuration Changes: Commits the configuration changes for all replication pipelines to the runtime environments. Note: Click Validate Configuration to validate the selected configuration before the replication pipeline is staged or committed.

Make Configuration Changes Active: Deploys the configuration changes for all replication pipelines in the runtime environments. Replication will continue for any existing replication pipelines while the configuration changes are made active. Download JCL File is displayed only for projects that have one or more pipelines using a Db2 for z/OS type data connection when the Share model is out of sync with the mainframe model.

Start Replication Pipeline: If the pipeline has not been run previously, starts the replication pipeline. Note: Displayed only when the user has both Designer and Operator roles. Only operators can start replication.
Download JCL Files
When your replication project includes one or more pipelines that use a Db2 for z/OS JDBC data connection, each time you deploy the project you must first download the latest Job Control Language (JCL) files which are used to run and manage processes on z/OS systems. After you download the files, each file must be copied to and run on the z/OS server associated with the agent referenced in the file name. This ensures the latest capture configuration is active on those servers.
- One JCL file is created for each agent associated with your project. By default, the syntax for a JCL file name is ProjectName_CaptureServerName_CapturePort_AgentName_CommittedVersion.jcl. For example, MyProject_z/os.mycompany.com_2638_MyAgent_5.jcl.
- The JCL files are added to a single zip file. By default, the syntax for the zip file name is ProjectName-(CommittedVersion)-jcls.zip. For example, MyProject-10-jcls.zip. Note: The file name for a project that has not yet been committed will always be appended with a version number of 1.
- The zip file is downloaded to a location you specify on your local system.
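Given the naming syntax above, the components of a generated JCL file name can be recovered by splitting from the right, since the capture server name may itself contain dots, slashes, or other characters. This is an illustrative sketch, not product code; it assumes the project name contains no underscores:

```python
def parse_jcl_file_name(file_name):
    """Split a generated JCL file name of the form
    ProjectName_CaptureServerName_CapturePort_AgentName_CommittedVersion.jcl
    into its components. Splits from the right so that a server name
    containing unusual characters does not confuse the parse."""
    stem = file_name[:-len(".jcl")]
    project, rest = stem.split("_", 1)          # project name first
    server, port, agent, version = rest.rsplit("_", 3)
    return {
        "project": project,
        "capture_server": server,
        "capture_port": int(port),
        "agent": agent,
        "committed_version": int(version),
    }
```

Parsing the example name `MyProject_z/os.mycompany.com_2638_MyAgent_5.jcl` recovers capture port 2638, agent MyAgent, and committed version 5.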
- After you download the zip file, copy the JCL files to the z/OS servers.
- Run the JCL files.
- Deploy the project.