Implementing Pre- or Post-Processing of Scheduled Imports and Exports

Scheduled Imports provide the option to pre-process files before they are imported, and Scheduled Exports provide the option to post-process files after they have been exported. In both cases, the actual processing is handled by a Java class that extends the BaseCustomProcessFile class found in the Services.jar file (or in an application-specific JAR file).

When a processing block is configured within a Scheduled Import or Scheduled Export, details on the block's function and configurable parameters are shown. The blocks are organized under Exports (com.enterworks.services.exports.<class>) or Imports (com.enterworks.services.imports.<class>) based on how they are predominantly used, but some modules can be used for either pre-processing or post-processing.

The available predefined pre-processing and post-processing blocks are listed below by classpath, each followed by a description.

com.enterworks.services.exports.CreateUpdateFile

Generates an update file, setting the desired set of columns to specific values for each primary key in the source file. The resulting file can be submitted to an import. This provides a way to update all records that were included in an export (e.g., to indicate the records have been syndicated).

com.enterworks.services.exports.GenerateFixedPositionFile

Creates a fixed position file using one or more export files as a source and one or more mapping files to define the format of the output file. One format file is defined for each format record appearing in the file. If multiple files are defined, records in each file must be related by a common key and sorted on that same key. This allows the file processing to complete the file merge in a single pass. The order of the records is determined by the order the file mappings are defined. If there is a one-to-one mapping of the different records, then the same file can be used as the source for each format. The mapping files must be comma-delimited files with the following columns:

  • Description: A user-description for the field (not used in processing).
  • Type: Datatype for the field:
    • N: Numeric with leading zeros for padding.
    • A: Alphanumeric with trailing spaces for padding.
  • Length: Number of character columns for the field.
  • Start: Starting column position. The first column is 1.
  • End: Ending column position. The first column is 1.
  • Value: Value for the field or export file column reference (denoted by double-pipe characters). A single space can be denoted with: [SPACE].

Each mapping file is validated, ensuring that the Start and End positions match the accumulated length of each field.
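
For example, a mapping file describing a simple three-field record format might contain the following rows, in the column order Description, Type, Length, Start, End, Value (an illustrative sketch only; the ITEM_ID export column reference and the literal record-type value are assumptions):

  Record type,A,2,1,2,IT
  Item number,N,10,3,12,||ITEM_ID||
  Filler,A,1,13,13,[SPACE]

The Start and End positions of each field continue from the accumulated length of the preceding fields, which is what the validation described above checks.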

com.enterworks.services.exports.ProcessTaxonomyTemplateExport

Generates a Taxonomy Template in XLSX format using the exportTemplate for global attributes. Category-specific attributes are added for the designated Taxonomy Node and are shaded if they are mapped to that node in the designated Publication Template.

com.enterworks.services.exports.ProcessTaxonomyTemplateNodeList

Reads the taxonomy template entries in the file and launches a Taxonomy Template export for each one, setting:

  • Parameter1 to the Publication Name
  • Parameter2 to the Taxonomy Node
  • Parameter3 to the name of the Saved Set for each job in the form: 'TaxonomyTemplate_<taxonomyNode>_<datetime>'
  • Parameter4 to the batch number.

Each launched job should use the ProcessTaxonomyTemplateExport post-processing block to generate the corresponding template. The collection of template jobs can be consolidated into a single file using the TaxonomyTemplateExportZip post-processing block.

com.enterworks.services.exports.RemoveHeaderRow

Removes the first line of the CSV file.

com.enterworks.services.exports.SplitCsvFile

Splits a CSV file into multiple parts, each no larger than the specified maximum number of records. Each part will be named <baseFileName>_<partNumber>.csv. The collection of files will be placed in a ZIP file, which is returned.

com.enterworks.services.exports.SplitDeltaExportIntoMultipleParts

Splits a delta export into multiple export jobs, each containing up to the maximum records per job. This can be used in situations where the target system cannot process large files. The delta date and time are specified, along with any optionally included additional conditions. This information is used to generate separate Saved Sets for each batch of the specified size. It then launches Scheduled Export jobs, using the designated Scheduled Export as a template for each job, updating only:

  • Parameter1 to be the name of the Saved Set for the part.
  • Parameter2 to be the part number.
  • Optionally: Parameter3 - Parameter5 to be any additional data.

This allows the target Scheduled Export to have full control over the file naming convention and how the Saved Set is used (such as in 'Saved Set' or 'Root Repository Saved Sets' attributes). This processing can be used in conjunction with any export type and format that can operate on a Saved Set.

com.enterworks.services.exports.TaxonomyTemplateExportZip

Packages the TaxonomyTemplateExport files for the same batch into a single .zip file. The ProcessTaxonomyTemplateNodeList processing block launches separate TaxonomyTemplateExport jobs for each taxonomy node listed in the seed file. Each is identified as being part of the same batch in Parameter4. This post-processing block collects the files from each job for the same batch and packages them in a single .zip file.

com.enterworks.services.imports.ConcatenateCSVFiles

Concatenates a set of files in the designated source directory that match the designated file name pattern, using the header from the designated import template for all files. If the source files are not identical in structure and the import template contains a superset of attributes, some columns may be padded in each appended file. To prevent the attributes from being cleared, the keepRepoValues import option should be set to true.

com.enterworks.services.imports.CopyImportFile

Copies the import file using the designated file name and then processes the original file, so that the file can also be processed by a second import job (for instance, for another repository or with different pre-processing).

com.enterworks.services.imports.EncodeFile

Converts the import file from one encoding to another.

com.enterworks.services.imports.EnterworksFileDiff

Generates a delta file using the current file and the previous one that was processed. The current and previous files must be in CSV format. Requires that the EnterworksDiff utility be installed and configured on the Enable server. The generated delta file will include the column il_modification_status, which indicates whether the record is new or has been updated or removed. If there is no previous file, the current file will be processed in full and the il_modification_status column will not be added. If new records need a specific status, the corresponding status attribute should have that value as its default.

com.enterworks.services.imports.HorizontalToVerticalAttValUomFileFormat

Converts a CSV file that contains multiple attributes as key-value-UOM triplets into one or more vertical files containing a separate line for each triplet. No more than 500,000 lines will be saved in each target file. The naming convention used is: 'vertical_<fileNumber>_<sourceFilename>'.

For example, consider an input file containing the following columns:

ITEM_ID, MFR_PART, STATUS, GROUP_1, ATTRIBUTE_NAME_1, VALUE_1, UOM_1, DIFFERENTIATOR_1, GROUP_2, ATTRIBUTE_NAME_2, VALUE_2, UOM_2, DIFFERENTIATOR_2, ...

This will be converted into multiple rows, one row per attribute, with the following headers: ITEM_ID, ATTRIBUTE_NAME, VALUE, UOM. Any global attributes (MFR_PART, STATUS) and extra columns (GROUP_*, DIFFERENTIATOR_*) are ignored.
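
To illustrate, a single input row such as the following (the values are invented for this example):

  ITEM_ID,MFR_PART,STATUS,GROUP_1,ATTRIBUTE_NAME_1,VALUE_1,UOM_1,DIFFERENTIATOR_1,GROUP_2,ATTRIBUTE_NAME_2,VALUE_2,UOM_2,DIFFERENTIATOR_2
  100234,MP-9,Active,Electrical,Voltage,120,V,,Physical,Weight,2.5,lb,

would produce two rows in the vertical file:

  ITEM_ID,ATTRIBUTE_NAME,VALUE,UOM
  100234,Voltage,120,V
  100234,Weight,2.5,lb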

NOTE: This class returns the source filename; it does not return the vertical files. Separate jobs must be run to process the generated files.

com.enterworks.services.imports.ImportCustomCodeSets

Imports updates to one or more existing Code Sets from a file. If a single Code Set is imported, the expected columns are the same as those required when importing a Code Set through the UI. If multiple Code Sets are imported, the first column must be the Code Set name and all codes for that Code Set must be consecutive. For multiple Code Sets, all options apply to each Code Set and the file type must be CSV. Each Code Set must already be defined in EnterWorks; the import will fail if a Code Set does not exist.

com.enterworks.services.imports.InitiateSaveAndSendForSavedSet

Initiates a 'Save and Send' work item on the designated Workflow, at the designated starting point, for the designated Saved Set and the specified properties. Several reserved words can be specified for the property values:

  • %savedSetId% - use the ID for the Saved Set identified by the savedSetName property.
  • %userId% - use the ID for the user identified by the userName property.
  • %repositoryId% - use the ID for the repository identified by the repositoryName property.

com.enterworks.services.imports.PreProcessAddFields

Adds columns and values to the import file before loading.

com.enterworks.services.imports.PreProcessAddHeader

Adds a header line to the CSV import file before loading.

com.enterworks.services.imports.PreProcessConcatenateColumns

Performs concatenations of data to specific columns within an Import Template. A formula expression referencing other columns within the template can be used. It assumes that all columns already exist; it does not create or remove any columns.

com.enterworks.services.imports.PreProcessXLSXAddFields

Pre-processes an .xlsx file. If the file has two header rows, the first containing the column names and the second possibly containing descriptive information, this pre-processing block can be configured so that the new columns have the designated column names in the first row, nothing in the second row, and the designated data values in subsequent rows.

It adds the designated columns and their values to the file to facilitate batch processing, and generates a new .xlsx file for import into an EnterWorks repository.

com.enterworks.services.imports.ProcessImagePackage

Processes a single file or a zip file containing one or more image files. If the submitted file has the csv extension, it is passed on for import processing by ePIM. If the submitted file has the zip extension, the contents of the zip file are processed. Any valid image files are copied to the designated image directory. If the submitted file is a valid image file, it is copied to the designated image directory.

com.enterworks.services.imports.ProcessMultiRepositoryFile

Splits a multi-repository comma-delimited CSV export file into separate import files based on the Import Template definitions referenced by the designated Scheduled Imports. Duplicate consecutive rows and rows containing no values are removed. Jobs for each separate Scheduled Import are launched by this module. The main file contains only those columns in the Import Template assigned to the Scheduled Import launching this pre-processing.

com.enterworks.services.imports.SplitImportFile

Splits the import file into two parts. The first part is processed and the second part is staged in the designated target directory.

com.enterworks.services.imports.SplitKeyValueUomTriplexFile

Pre-processes an import file containing Dynamic Attributes in key/value pairs or, optionally, key/value/UOM triplets. Files may contain explicit attribute names or pairs/triplets of columns that are numbered consecutively for each pair/triplet.

When a file is processed, the contents of each record are split into pre-defined parts as defined in the designated Import Templates and each file is loaded separately. The first part is loaded by this import and subsequent parts are loaded by dependent imports that do not require pre-processing.

If consecutive lines contain the same primary key, the values from those lines are combined into a single update (split amongst the defined parts). This allows for vertical files where each row contains the primary key and a single key/value pair or key/value/UOM triplet, and multiple rows are for the same repository record.

Except for the last part, any empty rows for a part are filtered out since they won't make any changes to the target record, which reduces overall import processing time. All records are included in the last part because it should be the only one that is validated, but this requires the parts to be daisy-chained together to ensure it is truly the final part that is loaded.

Each part import template can have up to 1022 attributes, including the primary key.

com.enterworks.services.imports.TransferFiles

Moves all files that match the specified file patterns from the source directory to the target directory passed to the pre-processing module. Up to 20 patterns can be specified as separate arguments for the module. Use the asterisk (*) as the wildcard indicator.

com.enterworks.services.imports.TransformFile

Transforms a .csv or .xlsx file into an .xlsx or .csv file containing either:

  • The columns that match the designated import template.
  • Only the valid and transformed columns from the import file.

Optionally, it validates designated columns for required or specific values, and rejects a row if the values are empty or do not match.

com.enterworks.services.imports.UncompressZipFile

Decompresses a zip file before processing.

To configure a Scheduled Import or Export with a pre/post-processing block:

  1. Open the Scheduled Import or Scheduled Export repository.

  2. Open the record for the Import or Export.

  3. Open the Import Details or Export Details tab, and open the Import Preprocess Options or Export PostProcess Options sub-tab.

  4. Set Preprocess File or Postprocess File to Yes.

  5. Enter the full class path for the processing block class and click the calculate button on the Pre-process Class or Post-process Class field.

  6. In the define arguments window that opens, review the description of what the block does, the arguments that can be set, and their current values.

  7. Change the argument values as needed and click Update Attributes to save them.

Pre/Post-Processing Block

Each processing block class must implement the processFile method. This method is called when there is an import or export file to be processed:

String processFile(String directoryName, String fileName, HashMap args, HashMap inactiveRecords, TreeMap primaryKey, StringBuffer msgs)

  • directoryName (String): Fully-qualified path to the directory containing the file to be processed. The file to be returned must also be placed in this same directory.

  • fileName (String): Name of the file to be processed.

  • args (HashMap): Map of any pre/post-processing arguments defined in the Scheduled Import/Export.

  • inactiveRecords (HashMap): Map containing the primary keys of any records in the repository having a Status of Inactive. This is only set for imports, and only if the Inactive Records flag is set to Reactivate.

  • primaryKey (TreeMap): The primary key for the repository.

  • msgs (StringBuffer): Medium for returning error messages to be displayed with the job.

The method must return either the name of the processed file or null if the processing block failed.
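
As a minimal sketch, a processing block class might look like the following (the class name here is invented, the import for BaseCustomProcessFile is omitted because its package depends on the JAR in which it is deployed, and only helper methods documented later in this section are used):

  import java.util.HashMap;
  import java.util.TreeMap;

  public class ExampleProcessFile extends BaseCustomProcessFile {

      // Called once for each import or export file to be processed.
      public String processFile(String directoryName, String fileName, HashMap args,
                                HashMap inactiveRecords, TreeMap primaryKey, StringBuffer msgs) {
          if (!doesFileExist(directoryName, fileName)) {
              // Report the problem so it is displayed with the job, then signal failure.
              logError(msgs, "File not found: " + directoryName + "/" + fileName);
              return null;
          }
          // ... transform the file here, writing the result to the same directory ...
          return fileName;   // name of the processed file, placed in directoryName
      }
  }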

If the processing block class has configurable arguments, the following methods must also be implemented:

  • String getDescription() – returns a detailed description of what the processing block does.

  • void defineArguments() – builds the list of arguments that can be configured. Each argument is defined by calling the method:

    • void addArgument(String arg, String description) – adds an argument to the list of arguments/properties that can be set for the class in the Scheduled Import or Scheduled Export record:

  • arg (String): Name of the argument. This name is used to retrieve the actual value for the argument. Each defined argument must be uniquely named.

  • description (String): Detailed description of the argument. It should include the possible values (if a finite set) or range of values, the default if left blank, etc.
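
For instance, a block with two configurable arguments might implement these methods along the following lines (a hedged sketch; the argument names maxRecords and targetDirectory are illustrative, not actual product arguments):

  public String getDescription() {
      return "Splits the import file into parts of at most maxRecords rows, "
           + "staging the remainder in targetDirectory.";
  }

  public void defineArguments() {
      // Each call adds a property that can be configured on the Scheduled Import/Export record.
      addArgument("maxRecords", "Maximum number of records per part. Default if left blank: 10000.");
      addArgument("targetDirectory", "Directory in which the remaining part is staged.");
  }

Within processFile, the configured values are then available from the args map, for example (assuming the values are stored as Strings): String maxRecords = (String) args.get("maxRecords");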

The BaseCustomProcessFile class has a set of methods that help minimize the amount of coding required in a processing block class:

  • void clearBadDate(HashMap parsedLine, String column)

    This clears the date value if it is not 10 characters (in mm/dd/yyyy format) or is an invalid date (e.g., 00/00/0000).

  • void closeInput(BufferedReader br)

    This closes the opened CSV or TXT file.

  • void closeOutput(PrintWriter output)

    This closes the opened CSV or TXT file.

  • void convertToBoolean(HashMap parsedLine, String column)

    This converts the values “Y” or “Yes” to 1 and everything else to 0 for the designated column.

  • boolean doesFileExist(String directoryName, String fileName)

    This returns true if the specified file in the specified directory exists.

  • void dropLeadingZeros(HashMap parsedLine, String column)

    This removes leading zeros from each value containing them.

  • ArrayList extractFiles(String directoryName, String fileName, String fileEncoding, StringBuffer msgs)

    This extracts the contents of a zip file and returns a list of unzipped files.

  • void freeQuery(DBQuery dbQuery)

    This frees the query connection that was previously obtained with getQuery().

  • String[] getHeaderForImportTemplate(String importTemplateName)

    This returns a list of columns based on the mappings in the designated import template.

  • String getHeaderForImportTemplateAsCsvString(String importTemplateName, String delimiter)

    This returns a delimited list of columns based on the mappings in the designated import template.

  • String getJobNumber()

    This retrieves the identification number of the job being processed.

  • HashMap<String, String> getMapForHeader(String[] header)

    This returns a map of columns based on the list of columns for the header.

  • DBQuery getQuery()

    This retrieves a query connection that can subsequently be used to query the EPIM database.

  • PrintWriter getReport()

    This retrieves the PrintWriter object that is configured to generate the report for the job. Any calls on this object will update that report.

  • void insertDecimal(HashMap parsedLine, String column, int decimalPosition)

    This inserts a decimal point character in a value at the designated number of digits from the right.

  • void logDebug(String message)

    This generates a message in the log file if debug logging is enabled (debugEnabled=true) in the Enterworks.properties file.

  • void logReport(String message)

    This adds a line to the import or export report file.

  • void logError(String message)

    This adds a line to the EPX BIC log file.

  • void logError(StringBuffer msgs, String message)

    This adds a line to the EPX BIC log file and to the Errors attribute for the Scheduled Import Job or Scheduled Export Job record.

  • BufferedReader newInput(String directoryName, String fileName, String charSet)

    This opens a CSV or TXT file for reading.

  • PrintWriter newOutput(String directoryName, String fileName, String encoding)

    Opens a CSV or TXT file for writing.

  • void outputHeaderLine(PrintWriter output, String[] columns, String delimiter)

    This outputs the header line with each column separated using the specified delimiter.

  • void outputHeaderLine(PrintWriter output, String[] columns, char delimiter, char textQualifier)

    This outputs the header line with each column separated using the specified delimiter and text qualifier (for when column names include the delimiter or text qualifier character).

  • void outputParsedLine(PrintWriter output, HashMap parsedLine, String[] columns, char delimiter, char textQualifier)

    This outputs a line using the parsed values and the designated delimiter and text qualifier.

  • void outputParsedLine(PrintWriter output, HashMap parsedLine, String[] columns, String delimiter)

    This outputs a line using the parsed values and the designated delimiter.

  • String[] parseHeader(String headerLine, String delimiter)

    This parses the header line using the designated delimiter. If the delimiter is a comma, then special processing is done for commas and quotes embedded in the header names.

  • String[] parseHeader(String headerLine, char delimiter, char textQualifier)

    This parses the header line using the designated delimiter. Uses the designated textQualifier to handle values that contain the delimiter or the text qualifier. Assumes the embedded text qualifier is escaped with the same character. For example, if the delimiter is a comma and the text qualifier is a double quote, then the value: ,”3”” x 4””, Rough Cut” would be stored as: 3” x 4”, Rough Cut.

  • HashMap parseLine(String line, String[] header, String delimiter)

    This parses a line from the file using the defined header and delimiter. Returns a HashMap where each key matches a column name and its value is the corresponding value from the file.

  • HashMap parseLine(String line, String[] header, String delimiter, boolean trimWhiteSpace)

    This parses a line from the file using the defined header and delimiter. Returns a HashMap where each key matches a column name and its value is the corresponding value from the file. Trims white space from values if trimWhiteSpace is true.

  • HashMap parseMultiLine(BufferedReader br, String[] header, String delimiter)

    This parses a multi-line record (where one or more values contain linefeed/carriage return characters and are properly quoted), using the header for the returned map. Returns null if end of file or an empty line is encountered.

  • HashMap parseMultiLine(BufferedReader br, String[] header, char delimiter, char textQualifier)

    This parses a multi-line record (where one or more values contain linefeed/carriage return characters and are properly escaped with the designated textQualifier), using the header for the returned map. Returns null if end of file or an empty line is encountered.

  • void reactivateRecord(HashMap parsedLine, HashMap inactiveRecords, TreeMap primaryKey, String reactivateColumnName)

    This reactivates a record that was previously inactivated but is now in the import file.

  • void removeCharacter(HashMap parsedLine, String character)

    This removes the designated character from each parsed value.

  • void removeSpaces(HashMap parsedLine)

    Removes leading and trailing white space from each parsed value.

  • String[] simpleParseHeader(String headerLine, String delimiter)

    This parses the header line using the designated delimiter. Delimiter is passed to the String.split() method.

  • void updateExport(HashMap update)

    This updates the specified attributes in the export job with the specified values.

  • void updateExportStatus(String recordsProcessed, String recordsWithErrors, String status, String exportErrors)

    This updates the Scheduled Export Job record with the specified details. This call should be made if the post-processing is going to take a considerable amount of time to complete. The call should be made no more than once every several minutes.

  • void updateExportStatus(String recordsProcessed, String recordsWithErrors, String status, String downloadLink, String exportErrors)

    This updates the Scheduled Export Job record with the specified details, including a URL for downloading the processed file. This call should be made after processing of the file has completed.

  • void updateImportJob(HashMap update)

    This updates the specified attributes in the import job record with the specified values.

  • void updateImportStatus(String recordsProcessed, String recordsUpdated, String recordsCreated, String recordsDeleted, String recordsWithErrors, String status, String importErrors)

    This updates the Scheduled Import Job record with the specified details. This call should be made if the pre-processing is going to take a considerable amount of time to complete. The call should be made no more than once every several minutes.
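
Putting several of these helpers together, a pre-processing block that rewrites a comma-delimited file might follow this rough pattern (a sketch only; the output file name prefix, the UTF-8 encoding, and the ITEM_ID column are illustrative assumptions):

  public String processFile(String directoryName, String fileName, HashMap args,
                            HashMap inactiveRecords, TreeMap primaryKey, StringBuffer msgs) {
      String outFileName = "processed_" + fileName;   // assumed naming convention
      try {
          BufferedReader br = newInput(directoryName, fileName, "UTF-8");
          PrintWriter out = newOutput(directoryName, outFileName, "UTF-8");

          String[] header = parseHeader(br.readLine(), ",");
          outputHeaderLine(out, header, ",");

          String line;
          while ((line = br.readLine()) != null) {
              HashMap parsedLine = parseLine(line, header, ",", true);
              removeSpaces(parsedLine);                 // trim leading/trailing white space
              dropLeadingZeros(parsedLine, "ITEM_ID");  // illustrative column name
              outputParsedLine(out, parsedLine, header, ",");
          }
          closeInput(br);
          closeOutput(out);
      } catch (Exception e) {
          logError(msgs, "Pre-processing failed: " + e.getMessage());
          return null;
      }
      return outFileName;   // the processed file, placed in the same directory
  }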