Uploading a File from Source

To generate match criteria, you must upload sample data. The sample data must be representative of all your records and include numerous variations, such as matches, non-matches, duplicates, unique records, and values that are visually similar or dissimilar across different fields.

This procedure describes how to upload a file:

  1. On the Select Source page, click the icon and browse to the location of your data file.
  2. Click the OK button.
    The preview of your data file is displayed in the Data Preview section.
  3. The Character encoding, Field delimiter, Text qualifier, and Line separator fields are pre-populated according to the uploaded data. If required, you can override these values as described in this table:
    Field Name Description

    Character encoding

    The text file's encoding. Select one of these:

    CP1252
    This encoding is also known as Windows-1252 or simply the Windows character set. It is a superset of ISO-8859-1 and uses the 128-159 code range to display additional characters not included in the ISO-8859-1 character set.
    UTF-8
    Supports all Unicode characters and is backwards-compatible with ASCII. For more information about UTF, see unicode.org/faq/utf_bom.html.
    UTF-16
    Supports all Unicode characters but is not backwards-compatible with ASCII. For more information about UTF, see unicode.org/faq/utf_bom.html.
    US-ASCII
    A character encoding based on the order of the English alphabet.
    UTF-16BE
    UTF-16 encoding with big-endian byte serialization (most significant byte first).
    UTF-16LE
    UTF-16 encoding with little-endian byte serialization (least significant byte first).
    ISO-8859-1
    An ASCII-based character encoding typically used for Western European languages. Also known as Latin-1.
    ISO-8859-3
    An ASCII-based character encoding typically used for Southern European languages. Also known as Latin-3.
    ISO-8859-9
    An ASCII-based character encoding typically used for the Turkish language. Also known as Latin-5.
    CP850
    An ASCII code page used to write Western European languages.
    CP500
    An EBCDIC code page used to write Western European languages.
    Shift_JIS
    A character encoding for the Japanese language.
    MS932
    Microsoft's extension of Shift_JIS that includes NEC special characters, the NEC selection of IBM extensions, and IBM extensions.
    CP1047
    An EBCDIC code page with the full Latin-1 character set.
    Field delimiter

    Specifies the character used to separate fields in a delimited file.

    For example, this record uses a pipe (|) as a field delimiter:

    7200 13TH ST|MIAMI|FL|33144

    The characters available as field delimiter are:

    • Comma
    • Semicolon
    • Pipe
    • Tab
    • Space
    • Period

    Text qualifier

    The character used to surround text values in a delimited file.

    For example, this record uses double quotes (") as a text qualifier:

    "7200 13TH ST"|"MIAMI"|"FL"|"33144"

    The characters available to define as text qualifiers are:

    • Single quote (')
    • Double quote (")

    Line separator

    Specifies the character or character sequence used to separate lines in a sequential or delimited file.

    The line separator settings available are:

    Unix
    A line feed character separates the lines. This is the standard line separator for Unix systems.
    Macintosh
    A carriage return character separates the lines. This is the standard line separator for Macintosh systems.
    Windows
    A carriage return followed by a line feed separates the lines. This is the standard line separator for Windows systems.
  4. Use the Yes/No sliding button to specify whether the first row should be treated as a header. The Data Preview changes accordingly.
  5. Click the icon to save your changes and move to the next stage.
  6. Click the icon to cancel your current task.
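
To see how the settings in step 3 and step 4 interact, it can help to parse a small sample outside the product. The following Python sketch is only an illustration and is not how the application itself reads files. The function name parse_sample, its parameter names, and the header names in the sample are hypothetical. The data row is the example record from the table above, and the sketch assumes that quoted values never contain the line separator.

```python
import csv


def parse_sample(raw, encoding="utf-8", delimiter=",", qualifier='"',
                 line_separator="\n", has_header=True):
    """Parse delimited bytes using settings like those on the Select Source page.

    Hypothetical helper for illustration only; all names are assumptions.
    """
    # Character encoding: turn the raw bytes into text.
    text = raw.decode(encoding)

    # Line separator: split on the chosen sequence and drop empty lines,
    # such as the one left after a trailing separator.
    lines = [line for line in text.split(line_separator) if line]

    # Field delimiter and text qualifier: csv.reader accepts any iterable
    # of strings, so the already-split lines can be passed in directly.
    rows = list(csv.reader(lines, delimiter=delimiter, quotechar=qualifier))

    # Header setting: treat the first row as field names when requested.
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows


# Sample bytes: CP1252 encoding, pipe delimiter, double-quote qualifier,
# Windows line separator. The header names are made up for this sketch.
sample = (
    '"Address"|"City"|"State"|"ZIP"\r\n'
    '"7200 13TH ST"|"MIAMI"|"FL"|"33144"\r\n'
).encode("cp1252")

header, rows = parse_sample(
    sample,
    encoding="cp1252",
    delimiter="|",
    qualifier='"',
    line_separator="\r\n",
    has_header=True,
)
print(header)
print(rows)
```

Calling the same function with has_header=False returns the first line as an ordinary data row, which mirrors how the Data Preview changes when the header setting is switched to No.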