Hadoop Pig Operations

The various Pig operations are as follows:

  1. Sort: Sorts the data in alphabetical order. The sort operation is described in detail in Sorting Input Records.
  2. Filter: Keeps only the records that match the conditions you specify. The filter operation is described in detail in Filtering Input Records.
  3. Aggregate: Performs statistical operations, such as Sum and Count, on the data.

    Select the aggregate operations for each field as desired.

    • Sum: Calculates the sum of the values in the field.
    • Average: Calculates the average of all the values in the field.
    • Max: Finds the largest value in the field.
    • Min: Finds the smallest value in the field.
    • Count: Calculates the total number of values in the field.
      Note: If you also select the Distinct operation, only unique values are counted.
  4. Distinct: When selected, the Aggregate Count operation counts only the unique values in the field.
  5. Limit: Enter a value greater than zero to limit the number of records processed to that value.
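
The Aggregate, Distinct and Limit operations described above can be illustrated with a short sketch. This is plain Python, not Pig Latin, and it is not the script that the product generates. The function names aggregate and limit, the sample values, and the choice to keep the first records when limiting are assumptions made for illustration only.

```python
def aggregate(values, distinct=False):
    """Compute the five aggregate operations for one field.

    Hypothetical helper: assumes `values` holds at least one number.
    Per the description above, `distinct` changes only the Count result.
    """
    values = list(values)
    # With Distinct selected, Count considers each unique value once.
    counted = set(values) if distinct else values
    return {
        "sum": sum(values),
        "average": sum(values) / len(values),
        "max": max(values),
        "min": min(values),
        "count": len(counted),
    }


def limit(records, n):
    """Keep at most n records.

    Assumption: the first n records are kept; the source does not say
    which records survive when a limit is applied.
    """
    if n <= 0:
        raise ValueError("Limit must be greater than zero")
    return list(records)[:n]


amounts = [10, 20, 20, 50]           # sample field values (made up)
print(aggregate(amounts))
print(aggregate(amounts, distinct=True))
print(limit(amounts, 2))
```

With the sample values, Count returns 4 without Distinct and 3 with it, because 20 appears twice. The other aggregates are the same in both cases.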