Connecting to Knox

An Apache Knox Gateway allows you to access a Hadoop service through the Knox security layer. In order for Spectrum™ Technology Platform to access data in Hadoop via Knox, you must define a connection to Knox using Management Console. Once you do this, you can create flows in Enterprise Designer that can read data from, and write data to, Hadoop via Knox.

  1. Open Management Console.
  2. Go to Resources > Data Sources.
  3. Click the Add button.
  4. In the Name field, enter a name for the connection. The name can be anything you choose.
    Note: Once you save a connection, you cannot change the name.
  5. In the Type field, choose Gateway.
  6. In the Gateway Type field, choose Knox.
  7. In the Host field, enter the hostname or IP address of the node in the HDFS cluster running the gateway.
  8. In the Port field, enter the port number for the Knox gateway.
  9. In the User Name field, enter the user name for the Knox gateway.
  10. In the Password field, enter the password that authorizes your access to the Knox gateway.
  11. In the Gateway Name field, enter the name of the Knox gateway you wish to access.
  12. In the Cluster Name field, enter the name of the Hadoop cluster to be accessed.
  13. In the Protocol field, choose webhdfs; the sketch after these steps shows how the connection fields map onto a webhdfs URL.
  14. In the Service Name field, enter the name of the Hadoop service to be accessed.
  15. To test the connection, click Test.
  16. Click Save.
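
The field values above combine into a Knox WebHDFS URL. The following Python sketch shows roughly what the Test button verifies: it lists the HDFS root directory through the gateway. The host, port, gateway name, cluster name, credentials, and URL layout are placeholder assumptions; the exact layout depends on how your Knox topology is configured.

    # Illustrative connectivity check against a Knox-proxied WebHDFS endpoint.
    # All values are placeholders; substitute the values entered in Management Console.
    import requests

    host = "knox.example.com"        # Host field
    port = 8443                      # Port field
    gateway_name = "gateway"         # Gateway Name field (Knox context path)
    cluster_name = "default"         # Cluster Name field (Knox topology)
    user, password = "hdfs-user", "secret"  # User Name and Password fields

    # Common Knox WebHDFS layout (an assumption, not a Spectrum-specific rule):
    #   https://<host>:<port>/<gateway name>/<cluster name>/webhdfs/v1/<path>
    base_url = f"https://{host}:{port}/{gateway_name}/{cluster_name}/webhdfs/v1"

    # List the HDFS root directory, roughly what a connection test exercises.
    response = requests.get(
        f"{base_url}/",
        params={"op": "LISTSTATUS"},
        auth=(user, password),
        verify=True,                 # point to the gateway's CA bundle if it uses a self-signed certificate
        timeout=30,
    )
    response.raise_for_status()
    for entry in response.json()["FileStatuses"]["FileStatus"]:
        print(entry["pathSuffix"], entry["type"])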

After you have defined a Knox connection to an HDFS cluster, you can use the connection in Enterprise Designer in the Read from File and Write to File stages. To select the HDFS cluster, click Remote Machine when defining a file in a source or sink stage.
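
Underneath, a Read from File stage pointed at this connection retrieves file content over the same webhdfs protocol. The sketch below reads a hypothetical file path through the Knox gateway; the path, host, and credentials are illustrative only and do not come from a real installation.

    # Illustrative WebHDFS read through Knox, mirroring what a Read from File
    # stage does over the connection. Path and credentials are hypothetical.
    import requests

    base_url = "https://knox.example.com:8443/gateway/default/webhdfs/v1"

    response = requests.get(
        f"{base_url}/data/input/customers.csv",   # hypothetical HDFS path
        params={"op": "OPEN"},
        auth=("hdfs-user", "secret"),
        timeout=60,
    )
    # Knox rewrites the datanode redirect to go back through the gateway,
    # and requests follows it automatically, so the response holds the file content.
    response.raise_for_status()
    print(response.text[:200])  # first 200 characters of the file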