Spectrum Dataflow Designer's Guide

Content

  • Welcome
  • Getting Started
    • Installing the Client Tools
    • Starting Spectrum Enterprise Designer
    • A First Look at Spectrum Enterprise Designer
    • My First Dataflow (Job)
    • My First Dataflow (Service)
    • Dataflow Templates
      • Creating a Dataflow Using a Template
    • Importing and Exporting Dataflows
  • Designing a Flow
    • Types of Flows
    • Flow Input
      • Defining Job Input
        • Managing malformed input records
      • Defining Service Input
        • Defining Input Fields for a Service or Subflow
        • Defining A Web Service Data Type
    • Fields
      • Flat and Hierarchical Data
        • Converting flat data to a list
      • Data Types
        • Automatic Data Type Conversion
          • Setting Data Type Conversion Options for a Flow
          • Date and time patterns
          • Number Patterns
        • Changing a field's data type
      • Changing a field name
      • Reserved Field Names
    • Control Stages
      • Aggregator
      • Broadcaster
      • Conditional Router
        • Configuring a Conditional Router
          • Using the Expression Builder
          • Writing a Custom Expression
      • Group Statistics
        • Operations
        • Output Columns
        • Pivot Tables
          • Creating a Pivot Table
      • Math
        • Using the Calculator
        • Using Functions and Constants
        • Using Conditional Statements
        • Using the Expressions Console
        • Using the Fields Control
        • Using the Preview Control
      • Record Combiner
      • Record Joiner
      • Sorter
        • Sorting Records with Sorter
      • Splitter
      • Stream Combiner
      • Transformer stage transform types
        • Changing the Order of Transforms
        • Creating a Custom Transform
        • Using a Mask Transform
      • Unique ID Generator
        • Defining a Unique ID
          • Unique ID Definition Methods
        • Using Algorithms to Augment a Unique ID
        • Defining a Non-Unique ID
    • Module Stages
      • Advanced Matching stages
        • Best of Breed
          • Options
            • Defining Template Record Rules
            • Defining Best of Breed Rules and Actions
          • Output
        • Candidate Finder
          • Database Options
            • Defining the SQL Query
            • Mapping Database Columns to Stage Fields
            • Configuring the Connection Name at Runtime
          • Search Index Options
            • Simple Search Index Options
            • Advanced Search Index Options
            • Configuring Options at Runtime
          • Output
        • Duplicate Synchronization
          • Options
        • Filter
          • Options
        • Interflow Match
          • Options
          • Output
        • Intraflow Match
          • Options
          • Default Matching Method
          • Sliding Window Matching Method
          • Output
        • Match Key Generator
          • Input
          • Options
          • Output
        • Private Match
          • Input
          • Options
          • Output
        • Transactional Match
          • Options
          • Output
        • Write to Search Index
          • Options
          • Output
          • Search Index Management
          • Standard and Keyword Analyzer
      • Analytics Scoring stages
        • Binning Lookup
          • Introduction to Binning Lookup
          • Defining Binning Properties
          • Binning Output
        • Java Model Scoring
          • Introduction to Model Scoring
          • Defining Model Properties
          • Configuring Model Output
        • PMML Model Scoring
          • Introduction to PMML Model Scoring
          • Deploying a Model
          • Reconfigure PMML Model Scoring Settings
          • Output
          • Supported Model Formats
            • QMML
              • Miner Model
            • PMML
              • Association Rule
              • Clustering
              • Classification Tree
              • Regression Tree
              • Naive Bayes
              • Regression
              • Regression Classifier
              • Scorecard
        • Read from Miner Dataset
          • Introduction to the Read from Miner Dataset
          • Reading from a Miner Dataset
          • Fields Tab
          • Output
        • Write to Miner Dataset
          • Introduction to the Write to Miner Dataset
          • Writing to a Miner Dataset
          • Applying Metadata
          • Fields Tab
          • Output
      • Context Graph stages
        • Delete from Model
          • Input
          • Options
            • The Options Tab
            • The Runtime Tab
          • Output
        • Import to Model
          • Input
          • Options
            • The Entities Tab
            • The Relationships Tab
            • The Options Tab
          • Output
        • Merge Entities
          • Input
          • Options
            • The Options Tab
            • The Runtime Tab
          • Output
        • Read from Model
          • The Query Tab
          • The Fields Tab
          • Output
        • Query Model
          • The Query Tab
          • The Fields Tab
          • Input/Output Requirements
        • Split Entity
          • Input
          • Options
            • The Options Tab
            • The Runtime Tab
          • Output
        • Write to Model
          • Input
          • The Entities Tab
          • The Relationships Tab
          • The Options Tab
            • Setting Exclusive Lock Timeout Duration
          • Output
          • Sample Model to Context Graph Dataflow
            • Flat Sample
            • XML Sample
      • Data Normalization stages
        • Advanced Transformer
          • Input
          • Options
            • Configuring Options
            • Configuring Options at Runtime
          • Output
        • Open Parser
          • Input
          • Options
          • Output
        • Table Lookup
          • Input
          • Options
            • Configuring Options
            • Configuring Options at Runtime
          • Output
        • Transliterator
          • Transliteration Concepts
          • Input
          • Options
          • Output
      • Data Stewardship Stages
        • Introduction
        • Exception Monitor
          • Input
          • Output
          • Reference
            • Conditions tab
            • Configuration tab
            • Add/Modify Condition dialog box
            • Add/Modify Expression dialog box
          • How to
            • Add the Exception Monitor stage to a workflow
            • Using Custom Expressions in Exception Monitor
        • Read Exceptions
          • Input
          • Output
          • Options
        • Write Exceptions
          • Input
          • Output
          • Options
      • Enterprise Data Integration Stages
        • Call Stored Procedure
        • DB Change Data Reader
          • Adding a CDC Resource
          • Editing a CDC Resource
          • Deleting a CDC Resource
          • Selecting Change Data Reader Options
        • DB Loader
          • Oracle Loader
          • DB2 Loader
          • PostgreSQL Loader
          • Teradata Loader
        • Field Parser
        • Field Combiner
        • Field Selector
        • Generate Time Dimension
          • Options
            • Creating a Calendar
          • Output
        • Query Cache
        • Query DB
          • Parameterizing Query DB at Runtime
        • Query NoSQL DB
          • Defining Fields - Query NoSQL DB
          • Configuring Dataflow Options - Query NoSQL DB
        • Read From DB
          • Visual Query Builder
            • Adding Objects to a Query
            • Setting Object Aliases
            • Joining Tables
            • Selecting Output Fields
            • Sorting a Dataset
            • Defining Criteria
            • Grouping Output Fields
            • Defining SQL Query Properties
          • Query Variables
            • Inserting a Query Variable
            • Configuring a Query Variable as a Dataflow Option
            • Configuring a Query Variable for Job Executor
        • Read From File
          • Defining Fields In a Delimited Input File
          • Defining Fields In a Line Sequential or Fixed Width File
          • Sorting Input Records
          • The File Definition Settings File
          • Configuring Dataflow Options
        • Read from Hadoop Sequence File
          • Defining Fields In an Input Sequence File
          • Sorting Input Records
          • Filtering Input Records
        • Read From Hive File
          • Defining Fields for Reading from Hive File
        • Read from HL7 File
          • Flattening HL7 Data
          • Adding a Custom HL7 Message
        • Read from NoSQL DB
          • Defining Fields in a NoSQL Database
          • NoSQL DB Dataflow Options
        • Read from SAP
          • Connecting to SAP
          • Reading Data from a Single SAP Table
          • Reading Data from Multiple SAP Tables
          • Filtering Records in Read from SAP
        • Read from Spreadsheet
        • Read from Variable Format File
          • Defining Fields in Delimited Variable Format Files
          • Defining Fields in a Line Sequential or Fixed Width Variable Format File
          • Flattening Variable Format Data
        • Read From XML
          • Flattening Complex XML Elements
        • SQL Command
          • Specifying SQL Command at Runtime
          • Running A Job from the Command Line
          • Executing SQL Commands Before or After a Dataflow
        • Transposer
        • Unique ID Generator
          • Defining a Unique ID
            • Unique ID Definition Methods
          • Using Algorithms to Augment a Unique ID
          • Defining a Non-Unique ID
        • Write to Cache
          • Clearing a Global Cache
        • Write to DB
          • Database Connection Manager
          • Configuring Error Handling in Write to DB
        • Write to File
          • Defining Fields In a Delimited Output File
          • Defining Fields In a Line Sequential or Fixed Width File
          • Sorting Output Records
          • The File Definition Settings File
          • Configuring Dataflow Options
        • Write to Hadoop Sequence File
          • Defining Fields In an Output Sequence File
        • Write to Hive File
          • Defining Fields for Writing to Hive File
        • Write to NoSQL DB
          • Defining Fields in a NoSQL Database
          • NoSQL DB Dataflow Options
        • Write to Spreadsheet
          • Defining fields in an Output file
        • Write to Variable Format File
          • Writing Flat Data to a Variable Format File
          • Tag Names in Variable Format Files
        • Write to XML
          • Using Namespaces in an XML Output File
          • Creating Complex XML from Flat Data
        • Date and Number Patterns
          • Date and time patterns
          • Number Patterns
      • Global Addressing Management Stages
        • Spectrum Global Address Validation
          • Supported Countries
          • Using Spectrum Global Address Validation
          • Using Spectrum Global Address Validation As a Service
          • Using Spectrum Global Address Validation As a Stage
          • Options
            • Global Addressing Options
              • Matching Options
              • Custom Match Options
            • US Addressing Options
              • Additional Processing
              • CASS Mailer Information
              • Multiple Address Line Options
              • Log Level Options
            • Output Options
          • Input
          • Output
          • Reports
            • Reports
            • Match Analysis by Country
            • Address Matching Summary Report
            • USPS Form 3553 (CASS Summary Report)
        • Spectrum Global Type Ahead
          • Global Type Ahead Features
          • Supported Countries
          • Using Global Type Ahead
          • Using Global Type Ahead As a Service
          • Using Global Type Ahead As a Stage
          • Options
          • Input
          • Output
          • Spectrum Global Type Ahead Sample Web Application
          • Global Type Ahead Java Script Component
            • Requirements
            • Integrating Global Type Ahead Into Your Web Application
            • Installing the Global Type Ahead Java Script Component
            • Configuring Spectrum Technology Platform to Use the Global Type Ahead Java Script Component
              • Enabling CORS
              • Authentication
            • Configuring the Global Type Ahead Java Script Component
              • Customizing the Global Type Ahead Java Script Component
              • Configuring Global Type Ahead Java Script Component Processing
            • Alternative Global Type Ahead Java Script Component Processing
            • Using the Global Type Ahead Java Script Component
            • Technical Notes
        • Spectrum Global Address Parser
          • Features of Global Address Parser
          • Standard Fields
          • Guidelines to Improve Prediction Accuracy
          • Accessing Global Address Parser
          • Using Global Address Parser As a Stage
          • Using Global Address Parser As a Service
          • Parsed Address Output
        • US Database Lookup
          • Supported Countries
          • Using US Database Lookup
            • Using Last Line Lookup for City, State, and ZIP Code
              • Using City and State for Last Line Lookup
              • Using ZIP Code for Last Line Lookup
              • Using City/State and ZIP Code for Last Line Lookup
            • Using Last Line Lookup for Street Name
              • Using City and State for Street Name Lookup
              • Using ZIP Code for Street Name Lookup
            • Using Last Line Lookup for House Number
              • Using City and State for House Number Lookup
              • Using ZIP Code for House Number Lookup
      • Information Extraction stages
        • Read from Documents
          • Input
          • Options
          • Output
        • Entity Extractor
          • Input
          • Options
          • Output
        • Relationship Extractor
          • Input
          • Options
          • Output
        • Text Categorizer
          • Input
          • Options
          • Output
      • Machine Learning Stages
        • Binning
          • Introduction
          • Defining Binning Properties
          • Configuring Basic Options
          • Binning Output
        • K-Means Clustering
          • Introduction
          • Defining Model Properties
          • Configuring Basic Options
          • Configuring Advanced Options
          • Model Output
          • Output Port
            • Model Metrics Port
        • Linear Regression
          • Introduction
          • Defining Model Properties
          • Configuring Basic Options
          • Configuring Advanced Options
          • Model Output
          • Output Ports
            • Model Score Port
            • Model Metrics Port
        • Logistic Regression
          • Introduction
          • Defining Model Properties
          • Configuring Basic Options
          • Configuring Advanced Options
          • Model Output
          • Output Ports
            • Model Score Port
            • Model Metrics Port
        • Principal Component Analysis
          • Introduction
          • Defining Model Properties
          • Configuring Basic Options
          • Configuring Advanced Options
          • Model Output
          • Output Port
            • Model Metrics Port
        • Random Forest Classification
          • Introduction
          • Defining Model Properties
          • Configuring Basic Options
          • Configuring Advanced Options
          • Model Output
          • Output Ports
            • Model Score Port
            • Model Metrics Port
        • Random Forest Regression
          • Introduction
          • Defining Model Properties
          • Configuring Basic Options
          • Configuring Advanced Options
          • Model Output
          • Output Ports
            • Model Score Port
            • Model Metrics Port
      • Universal Addressing Stages
        • Auto Complete Loqate
          • Input
          • Options
          • Output
        • Get Candidate Addresses
          • Input
          • Options
          • Output
        • Get Candidate Addresses Loqate
          • Input
          • Options
          • Output
        • Get City State Province
          • Input
          • Options
          • Output
        • Get City State Province Loqate
          • Input
          • Options
          • Output
        • Get Postal Codes
          • Input
          • Options
          • Output
        • Get Postal Codes Loqate
          • Input
          • Options
          • Output
        • Validate Address
          • Input
            • Address Line Processing for U.S. Addresses
          • Options
            • Output Data Options
              • Obtaining Congressional Districts
              • Obtaining County Names
              • Obtaining FIPS County Numbers
              • Obtaining Carrier Route Codes
              • Creating Delivery Point Barcodes
            • Default Options
              • About Dual Address Logic
              • Returning Multiple Matches
            • U.S. Address Options
              • CASS Certified Processing
            • Canadian Address Options
              • SERP Processing
              • Obtaining SERP Return Codes
            • International Address Options
          • Output
            • Standard Address Output
            • Parsed Address Elements Output
            • Parsed Input
            • Postal Data Output
            • Result Indicators
              • Record-Level Result Indicators
              • Field-Level Result Indicators
            • Output from Options
            • Additional Input Data
              • Care of Data
              • Extraneous Data on Its Own Address Line
              • Extraneous Data Within an Address Line
              • Dual Addresses
          • Reports
            • USPS CASS 3553 Report
            • USPS CASS Detail Report
            • Validate Address Summary Report
        • Validate Address Global
          • Input
            • Address Guidelines for Japan
          • Options
            • Input Options
            • Output Options
            • Process Options
          • Output
            • Address Data
            • Original Input Data
            • Result Codes
          • Reports
            • Validate Address Global Summary Report
            • Validate Address Global Detail Report
        • Validate Address Loqate
          • Input
          • Options
            • Returning Multiple Matches
            • Match Score Threshold Options
          • Output
            • Standard Address Output
            • Parsed Address Elements Output
            • Parsed Input
            • Geocode Output
            • Result Indicators
              • Record-Level Result Indicators
              • Field-Level Result Indicators
            • The AVC Code
            • AMAS Output
      • Universal Name Stages
        • Name Parser (DEPRECATED)
          • Input
          • Options
            • Modifying Name Parser User-Defined Tables
          • Output
        • Name Variant Finder
          • Input
          • Options
          • Output
        • Open Name Parser
          • Input
          • Options
            • Parsing Options
            • Cultures Options
            • Advanced Options
            • Configuring Options at Runtime
          • Output
          • Open Name Parser Summary Report
    • Flow Output
      • Defining Service Output
        • Defining A Web Service Data Type
      • Running an External Program
      • Terminating a Job Based on a Condition
      • Discarding records - Write to Null
    • Embedded flows
      • Grouping stages into an embedded flow
      • Editing an embedded flow
      • Using iteration with an embedded flow
      • Ungrouping an embedded flow
      • Converting an embedded flow to a subflow
    • Reports
      • Adding a standard report to a job
      • Setting report options for a job
      • Viewing reports
      • Using custom reports
    • Performance Considerations
      • Design guidelines for optimal performance
      • Stage Runtime Performance Options
        • Database Pool Size and Runtime Instances
        • Distributed Processing
          • Designing a flow for distributed processing
      • Optimizing Stages
        • Optimizing Matching
        • Optimizing Candidate Finder
        • Optimizing Transforms
        • Optimizing Write to DB
        • Optimizing Address Validation
        • Optimizing Geocoding
    • Flow Versions
      • Saving a Flow Version
      • Viewing a Flow Version
      • Editing a Flow Version
      • Editing Version Properties
      • Exposing a Version
  • Inspecting and Testing
    • Checking a Flow for Errors
    • Inspecting a flow
    • Testing a service with Spectrum Management Console
  • Running a Flow
    • Running a Job or Process Flow
      • Running a Flow in Spectrum Enterprise Designer
      • Running A Job from the Command Line
        • Overriding Job File Locations
        • Overriding the File Format at the Command Line
        • Using a Job Property File
      • Running a Process Flow from the Command Line
        • Using a Process Flow Property File
      • Scheduling a Flow
      • Triggering a Flow with a Control File
      • Viewing Flow Status and History
        • Downloading Flow History
      • Setting the Malformed Records Default
      • Setting Report Defaults
    • Exposing a Service
      • Exposing a Service as a Web Service
      • Exposing a Service for API Access
    • Runtime Options
      • Adding Flow Runtime Options
      • Specifying Default Service Options
      • Deleting flow Runtime Options
    • Configuring Email Notification for a Flow
  • Combining Flows into a Process Flow
    • Introduction to Process Flows
    • Designing Process Flows
      • Creating a Process Flow
      • Using a Variable to Reference a File
      • Adding Conditional Logic to a Process Flow
      • Deleting a Process Flow
      • Activities
        • Job
          • Overriding Input and Output Files
        • Clear Cache
        • Execute SQL
        • Load to Hive
          • Creating a Hive Connection
        • Run Program
          • Specifying Input and Output Files
          • Using a Control File with an External Program
        • Success
  • Creating Reusable Flow Components
    • Introduction to Subflows
    • Using a Subflow as a Source
    • Using a Subflow in the Middle of a Flow
    • Using a Subflow as a Sink
    • Modifying a Subflow
    • Deleting a Subflow
    • Exposing and Unexposing a Subflow
    • Converting a Stage to a Subflow
  • Sample Flows
    • Introduction
    • Integration between SugarCRM OnPremises and Microsoft Dynamics 365 Online
    • Integration between Salesforce and Oracle Eloqua
  • About Spectrum Technology Platform
    • What Is Spectrum Technology Platform?
    • Enterprise Data Management Architecture
    • Spectrum Technology Platform Architecture
    • Modules and Components