An Apache Knox Gateway allows you to access a Hadoop service through the Knox
security layer.
With this connection, you can create flows in the Enterprise Designer using stages in
the Enterprise Big Data module to read data from and write data to Hadoop via
Knox.
-
Access the Data Sources page using one of these
modules:
- Management Console:
- Access Management Console using the URL:
http://server:port/managementconsole,
where server is the server name or IP address of
your Spectrum™ Technology Platform server and
port is the HTTP port used by Spectrum™ Technology Platform.
Note: By default, the HTTP port is
8080.
- Go to .
- Metadata Insights:
- Access Metadata Insights using the URL:
http://server:port/metadata-insights,
where server is the server name or IP address of
your Spectrum™ Technology Platform server and
port is the HTTP port used by Spectrum™ Technology Platform.
Note: By default, the HTTP port is
8080.
- Go to Data Sources.
-
Click the Add button.
-
In the Name field, enter a name for the connection. The
name can be anything you choose.
Note: Once you save a connection you cannot change the name.
-
In the Type field, choose
Gateway.
-
In the Gateway Type field, choose
Knox.
-
In the Host field, enter the hostname or IP address of
the node in the HDFS cluster running the gateway.
-
In the Port field, enter the port number for the Knox
gateway.
-
In the User Name field, enter the user name for the Knox
gateway.
-
In the Password field, enter the password to authorize
your access to the Knox gateway.
-
In the Gateway Name field, enter the name of the Knox
gateway you wish to access.
-
In the Cluster Name field, enter the name of the Hadoop
cluster to be accessed.
-
In the Protocol field, choose
webhdfs.
-
In the Service Name field, enter the name of the Hadoop
service to be accessed.
-
To test the connection, click Test.
-
Click Save.
After you have defined a Knox connection to an HDFS cluster, you can use the
connection in Enterprise Designer in the Read from File and
Write to File stages. To select the HDFS cluster, click
Remote Machine when defining a file in a source or sink
stage.
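As a sanity check on the field values, the sketch below shows how the Host, Port, Gateway Name, Cluster Name, and Protocol values entered above combine into the URL that a Knox gateway conventionally exposes for WebHDFS. The host, gateway, and cluster names here are placeholder assumptions, not values from your environment; Spectrum™ Technology Platform builds the request for you, so this is only an illustration of the URL layout.

```python
def knox_url(host, port, gateway_name, cluster_name,
             protocol="webhdfs", path="/", op="LISTSTATUS"):
    """Build the conventional Knox proxy URL for a WebHDFS request:
    https://<host>:<port>/<gateway_name>/<cluster_name>/webhdfs/v1<path>?op=<op>

    The User Name and Password fields correspond to the HTTP basic-auth
    credentials Knox expects on this request (not shown here).
    """
    return (f"https://{host}:{port}/{gateway_name}/{cluster_name}"
            f"/{protocol}/v1{path}?op={op}")

# Placeholder values standing in for the connection fields:
url = knox_url("knox.example.com", 8443, "gateway", "default", path="/tmp")
print(url)
# → https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS
```

If the Test button reports a failure, checking that a URL of this shape is reachable from the Spectrum™ Technology Platform server (for example with a browser or curl, supplying the same user name and password) can help isolate whether the problem is in the connection fields or in the Knox topology itself.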