Query Cache

Query Cache looks up data in a cache based on values in one or more dataflow fields and returns data from matching records in the cache, adding the cache record's data to the record in the dataflow. Looking up data in a cache can improve performance compared to looking up data in a database.

There are two kinds of caches: global caches and local caches.

Global Cache Options

A global cache is system-wide, shared cache that will reside in memory. Choose a global cache if you want the cache to be available to multiple dataflows or when data does not change often or remains relatively static and when storage is not limited. A global cache is static as you can write to it only once. The cache can not be updated once it has been created.

A global cache is created by the Write to Cache stage. Before you can use a global cache you must populate the cache with the data you want to look up. To do this, create a dataflow containing the Write to Cache stage.

Option Name

Description

Cache type

Select the Global cache option.

Cache name

Specifies the cache you want to query.

To create a cache, use the Write to Cache stage.

Cache Fields

This column lists the fields in the cache. You cannot modify these field names.

Stage Fields

This column lists the field names used in the dataflow. If you wish to change a field name, click the field name and enter a new name.

Type

This column lists the data type of each dataflow field.

Include

Check the box in this column to have the query return the value of the cache field. Clear the box if you do not want the query to return the cache field.

Default Error Value

Specifies the value to be displayed in the dataflow field if the query fails. The drop-down list displays valid values corresponding to data type of the queried field. For example, in case of an integer the option displayed is -1.

You can also enter a value to this field. See the table below for a list of valid default error values for various data types.
Data type Valid Default Error Value along with data type (in bracket)
Null

-1

(Integer)

1899-

12-30

12:00:00

(Date Time)

1899-

12-30

(Date)

12:00:00

(Time)

False Empty
Date
Integer
Long
Float

Big

Decimal

Double
String
Time
Date Time
Boolean

Key Field

Specifies the field in the cache that will be used as a lookup key. If the value in the field in the Input Field column matches the value in the key field in the cache, then the query returns data from that record in the cache.

Input Field

Specifies the dataflow field, the value of which will be used as a key. If the value in this field matches the value in the key field in the cache, then the query returns data from that record in the cache.

Local Cache Options

A local cache is a temporary cache which is only used during the execution of Query Cache stage. The Query Cache builds the cache from the database table you choose. It then looks up data in the cache based on key fields and lookup conditions and returns data from matching records in the cache, adding the cache record's data to the record in the dataflow.
A local cache is dynamic as it is created during a job execution of the Query Cache. Once Query Cache completes reading the data, the cache is automatically deleted from the memory. A local cache is recreated every time the Query Cache stage is executed. Choose a local cache if it is only going to be used in one dataflow or if the lookup table changes frequently.
Option name Description
Cache type Specifies the Local cache option.

Connection

Select the database connection you want to use. Your choices vary depending on what connections are defined in the Connection Manager of Spectrum Management Console. If you need to make a new database connection, or modify or delete an existing database connection, click Manage Connections.

If you are adding or modifying a database connection, complete these fields:
Connection name
Enter a name for the connection. The name can be anything you choose.
Database driver
Select the appropriate database type.
Connection options
Specify the host, port, instance, user name, and password to use to connect to the database.
Table/view Specifies the table or view in the database that you want to query.
Database Fields This column lists the fields in the database. You cannot modify these field names.
Stage Fields This column lists the field names used in the dataflow. If you wish to change a field name, click the field name and enter the new name.

Type

This column lists the data type of each dataflow field.

Include

Check the box in this column to have the query return the value of the cache field. Clear the box if you do not want the query to return the cache field.

Default Error Value

Specifies the value to be displayed in the dataflow field if the query fails. The drop-down list displays valid values corresponding to data type of the queried field. For example, in case of an integer the option displayed is -1.

You can also enter a value to this field. See the table below for a list of valid default error values for various data types.
Data type Valid Default Error Value along with data type (in bracket)
Null

-1

(Integer)

1899-

12-30

12:00:00

(Date Time)

1899-

12-30

(Date)

12:00:00

(Time)

False Empty
Date
Integer
Long
Float

Big

Decimal

Double
String
Time
Date Time
Boolean
Key Field Specifies the field in the database that will be used as a look up key. If the value in the field in Input field column matches the value in the Key field in the database, then the query returns the data from that record in the database.
Type Data type of the Key Field value
Operator Select the required operator. The supported operators are:
  • =
  • !=
  • >
  • >=
  • <
  • <=
Is Constant Select this check box if you want the query to return value based on a constant you enter, instead of the Input Field.
Input Field Specifies the dataflow field whose value will be used as a key. If the value in this field matches the value in the Key field in the database, then the query returns data from that record in the database..

Advanced Cache Options

An advanced cache is a temporary cache similar to local cache. It is used during the execution of Query Cache stage. It builds the cache based on the SQL Query which reads the data from the tables mentioned in the query. It then looks up data in the cache based on the lookup keys mentioned in the where clause and returns data from matching records in the cache, adding the cache record’s data to the record in the dataflow..
An advanced cache is dynamic as it is created during a job execution of the Query Cache. Once Query Cache completes reading the data, the cache is automatically deleted from the memory. An advanced cache is recreated every time the Query Cache is executed. Choose an advanced cache option if it is going to read the data from multiple tables and there is some complex query needs to be executed for cache creation.
Option name Description
Cache type Specifies the Advanced cache option.
Connection

Select the database connection you want to use. Your choices vary depending on what connections are defined in the Connection Manager of Management Console. If you need to make a new database connection, or modify or delete an existing database connection, click Manage.

If you are adding or modifying a database connection, complete these fields:

Connection Name
Enter a name for the connection. The name can be anything you choose.
Database Driver
Select the appropriate database type.
Connection Options
Specify the host, port, instance, user name, and password to use to connect to the database.
Query Provides SQL query to read data from the database. The query can read data from multiple tables.
Note: Providing alias is mandatory in the query.
Where This text is used as the where clause to lookup the cache created based on Query. User can specify input field in the Query using $ operator as prefix. For example, _id = ${_inputId}, Where _inputId is the input field and _id is the lookup column in the cache.
Get Fields This populates the grid with the fields which are selected to be cached using SQL query.
Database Fields This column lists the fields fetched in the database. You cannot modify these field names.
Stage Fields This column lists the field names used in the dataflow. If you wish to change a field name, click the field name and enter the new name.
Type This column lists the data type of each dataflow field.

Default Error Value

Specifies the value to be displayed in the dataflow field if the query fails. The drop-down list displays valid values corresponding to data type of the queried field. For example, in case of an integer the option displayed is -1.

You can also enter a value to this field. See the table below for a list of valid default error values for various data types.
Data type Valid Default Error Value along with data type (in bracket)
Null

-1

(Integer)

1899-

12-30

12:00:00

(Date Time)

1899-

12-30

(Date)

12:00:00

(Time)

False Empty
Date
Integer
Long
Float

Big

Decimal

Double
String
Time
Date Time
Boolean

Runtime Tab

The options available in Runtime tab are common for global, local, and advanced caches.

Option Name

Description

Match options

Specifies what to do if there is more than one record in the cache that matches the query.

Return all matches
Return data from all records in the cache that have a matching value in the key field or fields.
Return the first matching record
Return data from only the first record in the cache that has a matching value in the key field or fields.
Return the last matching record
Return data from only the last record in the cache that has a matching value in the key field or fields.
Stage Options
This section lists the dataflow options used in the SQL query of this stage and allows you to provide a default value for all these options. The Name column lists the options while you can enter the default values in the corresponding Value column.
Note: The default value provided here is also displayed in the Map dataflow options to stages section of the Dataflow Options dialog box. The dialogue box also allows you to change the default value. In case of a clash of default values provided for an option through Stage Options, Dataflow Options, and Job Executor the order of precedence is: Value provided through Job Executor > Value defined through the Dataflow Options dialogue box > Value entered through the Stage Options.