Extracting Preexisting Entities

  1. Create a dataflow that includes a Read from Documents source stage, an Entity Extractor stage, and a sink stage such as Write to File or Write to XML.
  2. In the source stage, specify your input file.
  3. In the Entity Extractor stage, select the entities that correspond to the data you want to extract from the input file. For example, to extract the names of all persons and all addresses in the file, select the Address and Person entities.
    Note: Address and Person are the default entities. To extract data based on any other entity, select the Override system default options with the following values check box, and click Quick Add. The list of entities is displayed in the Select entities section.
  4. To output how often data related to the selected entities occurs in the input file, select the Output entity count check box.
  5. Click OK.
  6. Run the job.