Read From DB
- Connectivity to Hive from Spectrum on Windows
- Support and connectivity to Hive version 2.1.1 from Spectrum with high availability
- Support for reading from and writing to Hive DB (JDBC) via a Model Store connection
Also see Best Practices for connecting to HDFS 3.x and Hive 2.1.1.
To use the Read from DB stage, you need to create a connection to the respective database using the Connection Manager of Spectrum Management Console. For details, see Database Connection Manager in Write to DB.
General Tab
Field Name | Description
---|---
Connection | Select the database connection you want to use. Your choices vary depending on what connections are defined in the Connection Manager of Spectrum Management Console. If you need to make a new database connection, or modify or delete an existing database connection, click Manage Connections. If you are adding or modifying a database connection, complete the fields in the Database Connection Manager.
SQL | Enter the SQL query that specifies the records to read from the data source when the dataflow runs. You can type the SQL query manually in this field, or click Build SQL to construct the query with the Visual Query Builder. The SQL query can include variables instead of actual column names, which allows you to customize the query at runtime. For more information, see Query Variables. A sample query for reading BranchID and BranchType from the public.Branch table is `SELECT BranchID, BranchType FROM public.Branch` (see the JDBC sketch after this table). Note: Values for Integer type fields can be entered without quotes, but String type values must be enclosed in single quotes.
Build SQL | Create a complex query by selecting multiple columns and creating joins and nested queries; click Build SQL to open the Visual Query Builder. For more information, see Visual Query Builder. Note: A query created using the Visual Query Builder is displayed with fully qualified names of columns and tables in the SQL field.
Regenerate Fields | To see the schema of the data to be fetched by the query, click Regenerate Fields. If you edit an existing query, click Regenerate Fields to fetch the modified schema. Note: On clicking Regenerate Fields, the entity names in the SQL query are retained and not replaced with their fully qualified names.
Preview | To see a sample of the records fetched by the SQL query, click Preview.
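For context, the sample query above is ordinary SQL, and the sketch below shows roughly the equivalent JDBC read that happens when the dataflow runs. This is an illustration only, not Spectrum's internal implementation; the connection URL, credentials, and database are hypothetical placeholders, since in Spectrum these come from the connection defined in Connection Manager rather than from code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ReadBranchExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details for illustration only.
        String url = "jdbc:postgresql://localhost:5432/sampledb";

        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             // The same sample query shown in the SQL field above.
             ResultSet rs = stmt.executeQuery("SELECT BranchID, BranchType FROM public.Branch")) {

            while (rs.next()) {
                // Each row becomes a record with the fields BranchID and BranchType.
                System.out.println(rs.getInt("BranchID") + " | " + rs.getString("BranchType"));
            }
        }
    }
}
```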
Note: Values of the date datatype are handled as String values. This is the behavior of the jTDS driver, which is the default driver used by Spectrum. To handle all date datatype values as is, use Microsoft's JDBC driver.
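As a brief, hypothetical illustration of what "handled as String" can mean downstream: when a date column arrives as a String field, any date logic first requires parsing it back into a date. The field value and format below are assumptions for illustration only; the actual format depends on the source database.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateStringExample {
    public static void main(String[] args) {
        // With the default jTDS driver, a date column may arrive as a plain
        // String field. "yyyy-MM-dd" is assumed here purely for illustration.
        String rawValue = "2021-03-15";

        // Parse it back into a real date if downstream logic needs one.
        LocalDate asDate = LocalDate.parse(rawValue, DateTimeFormatter.ISO_LOCAL_DATE);
        System.out.println(asDate.plusDays(30)); // date arithmetic now works
    }
}
```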
Runtime Tab
Field Name | Description
---|---
Fetch size | Select this option to specify the number of records to read from the database table at a time. For example, if the Fetch size value is 100 and the total number of records to be read is 1000, it takes 10 trips to the database to read all the records. Setting an optimum fetch size can improve performance significantly; see the JDBC-level sketch after this table. Note: You can calculate an optimum fetch size for your environment by testing the execution times between a Read from DB stage and a Write to Null stage. For more information, see Determining an Optimum Fetch Size.
Stage Options | This section lists the dataflow options used in the SQL query of this stage and allows you to provide a default value for each of them. The Name column lists the options, and you enter the default values in the corresponding Value column. Note: The default value provided here is also displayed in the Map dataflow options to stages section of the Dataflow Options dialog box, which also allows you to change the default value. If different default values are provided for an option through Stage Options, Dataflow Options, and Job Executor, the order of precedence is: value provided through Job Executor > value defined through the Dataflow Options dialog box > value entered through Stage Options.
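The Fetch size setting corresponds to the standard JDBC fetch-size hint. The sketch below is a generic illustration of that mechanism, not Spectrum's internal code; the connection URL, credentials, and table are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchSizeExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details for illustration only.
        String url = "jdbc:postgresql://localhost:5432/sampledb";

        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {

            // Ask the driver to pull 100 rows per round trip. For 1000 matching
            // rows this means roughly 10 trips to the database, as described above.
            stmt.setFetchSize(100);

            try (ResultSet rs = stmt.executeQuery("SELECT BranchID, BranchType FROM public.Branch")) {
                int count = 0;
                while (rs.next()) {
                    count++;
                }
                System.out.println("Rows read: " + count);
            }
        }
    }
}
```

Larger fetch sizes reduce the number of round trips but increase the memory used per batch, which is why the optimum value depends on your environment and is best found by testing, as the note above describes.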