Read From Hive File

The Read from Hive File stage reads data from the selected file, which can be in any of the following formats:
  • ORC
  • RC
  • Parquet
  • Avro

File Properties tab

The File Properties tab contains the following fields.

Server
Indicates that the file selected in the File name field is located on a Hadoop system. Before you can use a Hadoop file server in this stage, you must create a connection to it in the Management Console. The server name is the name you gave the file server when you created it in the Management Console.

File name
Specifies the path to the file. Click the ellipsis button (...) to browse to the file you want.
Note: The schema of the input file is imported as soon as you browse to the file and select it. The imported schema cannot be edited.

You can, however, rename the columns of the schema as required.

When you select the file, its first 50 records are displayed in the Preview grid.

File type
Select the type of the file being read:
  • ORC
  • RC
  • Parquet
  • Avro
Note: For RC files, to generate the preview, first define the schema in the Fields tab and then click Preview in the File Properties tab.
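If you are unsure which File type to select, the leading bytes of the file usually identify its format. The Python sketch below checks a header for the standard magic numbers: Parquet files begin with `PAR1`, ORC files with `ORC`, Avro object container files with `Obj` followed by the byte 1, and RCFile (version 1) files with `RCF`. The function names are hypothetical and are not part of the product, and the stage does not necessarily perform this check itself. The sketch assumes you have a local copy of the file; older RC files written with the SequenceFile header (`SEQ`) are not recognized.

```python
from typing import Optional

# Magic bytes found at the start of each supported file format.
_MAGIC = [
    (b"PAR1", "Parquet"),
    (b"ORC", "ORC"),
    (b"Obj\x01", "Avro"),
    (b"RCF", "RC"),
]


def detect_hive_file_type(header: bytes) -> Optional[str]:
    """Return the format whose magic bytes start `header`, or None if none match."""
    for magic, name in _MAGIC:
        if header.startswith(magic):
            return name
    return None


def detect_file_type_from_path(path: str) -> Optional[str]:
    # The longest magic value is four bytes, so only those are read.
    with open(path, "rb") as f:
        return detect_hive_file_type(f.read(4))
```

For example, `detect_hive_file_type(b"PAR1...")` returns `"Parquet"`, which corresponds to the Parquet option in the File type list.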

Fields tab

The Fields tab defines the names, data types, and positions of the fields as they appear in the input file, along with the user-given names for those fields. For more information, see Defining Fields for Reading from Hive File.
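To illustrate how each imported column relates to the name you give it, here is a minimal Python sketch. It is a hypothetical model, not the product's implementation: the class and function names are invented for illustration, and positions are assumed to be zero-based. The imported column (name, data type, position) is frozen, mirroring the rule that the imported schema cannot be edited, while the user-given name can be changed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SourceField:
    """A column as imported from the file's schema; frozen, so it cannot be edited."""
    name: str       # column name stored in the file
    data_type: str  # data type stored in the file
    position: int   # position in the file (zero-based in this sketch)


@dataclass
class FieldDefinition:
    """Pairs the read-only source column with an editable user-given name."""
    source: SourceField
    output_name: str


def import_fields(schema: List[Tuple[str, str]]) -> List[FieldDefinition]:
    # Each user-given name starts out equal to the column name in the file.
    return [
        FieldDefinition(SourceField(name, dtype, pos), output_name=name)
        for pos, (name, dtype) in enumerate(schema)
    ]


def rename_field(fields: List[FieldDefinition], source_name: str, new_name: str) -> None:
    # Only the user-given name changes; the imported column is left untouched.
    for field in fields:
        if field.source.name == source_name:
            field.output_name = new_name
            return
    raise KeyError(source_name)
```

For example, after `fields = import_fields([("cust_id", "bigint"), ("nm", "string")])`, calling `rename_field(fields, "nm", "CustomerName")` changes only the user-given name, while assigning to `fields[1].source.data_type` raises `dataclasses.FrozenInstanceError`.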