Using a Custom Groovy Script Hive UDF

You can run each Hive UDF job either by executing these steps individually in your Hive client within a single session, or by compiling all the required steps into an HQL file and running it in one go.

  1. In your Hive client, log in to the required Hive database.
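    For example, to switch to your database (the name customer_db below is a placeholder; replace it with your own database name):
    USE customer_db;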
  2. Register the JAR file of Spectrum™ Data & Address Quality for Big Data SDK DIM Module.
    ADD JAR /home/hduser/script/dim.hive.${project.version}.jar;
  3. Create an alias for the Hive UDF of the CustomGroovyScript job.
    Note: The string in quotes is the fully qualified name of the class required to run this job.
    For example,
    CREATE TEMPORARY FUNCTION customscript as
    'com.pb.bdq.dim.process.hive.script.groovy.CustomGroovyScriptExecutionUDF';
  4. Enable or disable the Hive fetch task conversion.
    For example,
    set hive.fetch.task.conversion=none;
  5. Use hivevar:defaultConfiguration to specify the date, date-time, and time patterns, and assign this configuration to the respective variable.
    set hivevar:defaultConfiguration='{"datePattern":"M/d/yy",
    "dateTimePattern":"M/d/yy h:mm a","timePattern":"h:mm a"}';
    Note: This is an optional configuration.
  6. Specify the header fields of the input table in comma-separated format, and assign them to a variable.
    For example,
    set hivevar:header='busniessname,recordid';
  7. Use hivevar:scriptConfigurations to set the Groovy script configurations. These include details such as groovyScriptFile, inputFields, and outputFields.
    For example,
    set hivevar:scriptConfigurations=
    '[{"groovyScriptFile":"/home/hduser/script/groovy_hive.txt",
    "inputFields":[{"name":"busniessname","type":"string"},
    {"name":"recordid","type":"integer"}],
    "outputFields":[{"name":"outtan","type":"double"}]},
    {"groovyScriptFile":"/home/hduser/script/groovy2.txt",
    "inputFields":[],
    "outputFields":[{"name":"outtan2","type":"double"}]}]';
  8. To run the job and display the job output on the console, write the query as indicated in this example:
    Note: This query returns the output fields as transformed by the Groovy script.
    SELECT customscript(${hivevar:scriptConfigurations},"",
    ${hivevar:header}, InputKeyValue, AddressLine1) FROM groovy_tc1;
    To run the job and write the output to a designated file, write the query as indicated in this example:
    INSERT OVERWRITE LOCAL DIRECTORY '/home/hduser/script/output' 
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' 
    STORED AS TEXTFILE SELECT * FROM (SELECT customscript
    (${hivevar:scriptConfigurations},
    ${hivevar:defaultConfiguration},${hivevar:header},
    InputKeyValue, AddressLine1) 
    as mygp FROM groovy_tc1 ) record;
    !q;
    Note: Use the alias defined earlier for the UDF.
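    As mentioned at the start of this topic, the steps above can also be compiled into a single HQL file and run in one go. The sketch below simply gathers the commands from the preceding steps (paths, field names, and the table name groovy_tc1 are taken from the examples above; a single script configuration is shown for brevity):
    -- customscript.hql: run with, for example, hive -f customscript.hql
    ADD JAR /home/hduser/script/dim.hive.${project.version}.jar;
    CREATE TEMPORARY FUNCTION customscript as
    'com.pb.bdq.dim.process.hive.script.groovy.CustomGroovyScriptExecutionUDF';
    set hive.fetch.task.conversion=none;
    set hivevar:defaultConfiguration='{"datePattern":"M/d/yy",
    "dateTimePattern":"M/d/yy h:mm a","timePattern":"h:mm a"}';
    set hivevar:header='busniessname,recordid';
    set hivevar:scriptConfigurations=
    '[{"groovyScriptFile":"/home/hduser/script/groovy_hive.txt",
    "inputFields":[{"name":"busniessname","type":"string"},
    {"name":"recordid","type":"integer"}],
    "outputFields":[{"name":"outtan","type":"double"}]}]';
    SELECT customscript(${hivevar:scriptConfigurations},
    ${hivevar:defaultConfiguration},${hivevar:header},
    InputKeyValue, AddressLine1) FROM groovy_tc1;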