Entity Extraction

Entity Extraction provides advanced text processing and information extraction for natural language input text.

It includes pre-trained models that extract entities from input text, determine the relationships between those entities, and assign the text to a category.

Features Provided

Entity Extraction

This module extracts entities from unstructured data and classifies them into types such as Location, Date, Organization, ProperNouns, Address, and Person.

The module ships with a set of predefined entities. You can also train models to suit your own requirements. For details on training a model and defining custom entities, see Custom Entities in the User Guide.
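To make the idea of typed entities concrete, the following is a minimal Python sketch of how extracted entities might be represented. It is not the module's API: the names EntityType, Entity, and annotate are hypothetical, and the sample sentence and its labels are supplied by hand rather than produced by a model. Only the type names come from this section.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    # Types named in this section; the real module may expose others.
    LOCATION = "Location"
    DATE = "Date"
    ORGANIZATION = "Organization"
    PROPER_NOUNS = "ProperNouns"
    ADDRESS = "Address"
    PERSON = "Person"


@dataclass(frozen=True)
class Entity:
    text: str         # surface text of the entity
    type: EntityType  # one of the types above
    start: int        # character offset where the entity begins
    end: int          # character offset just past the entity


def annotate(text, found):
    """Attach character offsets to (surface text, type) pairs.

    Hypothetical helper: it only locates the first occurrence of each
    string and does not perform any extraction itself.
    """
    entities = []
    for surface, etype in found:
        start = text.find(surface)
        if start == -1:
            continue  # skip anything that does not occur in the text
        entities.append(Entity(surface, etype, start, start + len(surface)))
    return entities


sample = "Jane Doe joined Acme Corp in Boston on 4 May 2020."
entities = annotate(sample, [
    ("Jane Doe", EntityType.PERSON),
    ("Acme Corp", EntityType.ORGANIZATION),
    ("Boston", EntityType.LOCATION),
    ("4 May 2020", EntityType.DATE),
])

for entity in entities:
    print(entity.type.value, repr(sample[entity.start:entity.end]))
```

Keeping offsets alongside the type lets downstream steps map each entity back to its position in the original text.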

Relationship Extraction

Relationship Extraction is the process of analyzing unstructured text to identify the relationships between the extracted entities.

The entity types supported for relationship extraction are:
  • Person
  • Organization
  • Location
The supported relationship types are:
  • AffiliatedWith
  • LivesIn
  • OrgBasedIn
  • LocatedIn
  • Negative
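The relationship types listed above can be modeled as a small Python sketch. This is an illustration only, not the module's output format: the Relation class and the positive_relations helper are hypothetical, the sample pairs are written by hand, and the reading of Negative as "no relationship between the pair" is an assumption that this section does not confirm.

```python
from dataclasses import dataclass
from enum import Enum


class RelationType(Enum):
    # Relationship types named in this section.
    AFFILIATED_WITH = "AffiliatedWith"
    LIVES_IN = "LivesIn"
    ORG_BASED_IN = "OrgBasedIn"
    LOCATED_IN = "LocatedIn"
    NEGATIVE = "Negative"


@dataclass(frozen=True)
class Relation:
    source: str         # text of the first entity in the pair
    target: str         # text of the second entity in the pair
    type: RelationType  # relationship assigned to the pair


def positive_relations(relations):
    # Assumption: Negative marks a pair with no relationship,
    # so it is filtered out before further use.
    return [r for r in relations if r.type is not RelationType.NEGATIVE]


# Hand-written sample results, not produced by a model.
candidates = [
    Relation("Jane Doe", "Acme Corp", RelationType.AFFILIATED_WITH),
    Relation("Acme Corp", "Boston", RelationType.ORG_BASED_IN),
    Relation("Jane Doe", "Boston", RelationType.NEGATIVE),
]
kept = positive_relations(candidates)

for relation in kept:
    print(relation.source, relation.type.value, relation.target)
```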

Text Categorization

Text categorization, also known as text classification, is the process of assigning custom categories to unstructured content or plain text, such as email, news articles, and comments, based on how closely the content matches each category. Categorization can be based on subject, author, date, or virtually any other classification system you define.

You can create your own categorizer by training a categorizer model with your data and categories. The trainer analyzes the training data and stores what it learns in the model. The model is then used to analyze new content and determine the category to which that content belongs.

The text categorization feature uses a statistical text categorization process. It applies machine learning methods to learn automatic classification rules from human-labeled training documents.

Because you define the categories yourself, you must first train a model to learn them. You can then use that model in the Text Categorizer stage to categorize your unstructured data.
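The train-then-categorize workflow can be illustrated with a small statistical classifier. The sketch below uses multinomial naive Bayes with add-one smoothing, chosen only because it is a simple, well-known technique; this section does not say which algorithm the product uses. The function names and training data are hypothetical and unrelated to the Text Categorizer stage or the administration utility commands.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately simple: lowercase and split on whitespace.
    return text.lower().split()


def train(labeled_docs):
    """Learn per-category word counts from (text, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> word frequencies
    doc_counts = Counter()              # category -> number of documents
    vocabulary = set()
    for text, category in labeled_docs:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocabulary.update(tokens)
    return {
        "word_counts": word_counts,
        "doc_counts": doc_counts,
        "vocabulary": vocabulary,
    }


def categorize(model, text):
    """Return the category with the highest log-probability for the text."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocabulary"])
    tokens = tokenize(text)
    best_category, best_score = None, -math.inf
    for category, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][category]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # category prior
        for token in tokens:
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Human-labeled training documents (made up for illustration).
training_data = [
    ("invoice payment overdue account", "billing"),
    ("refund charged twice on my card", "billing"),
    ("cannot log in password reset", "support"),
    ("app crashes when I open settings", "support"),
]

model = train(training_data)
print(categorize(model, "password reset link does not work"))
```

With this training data, the final call assigns the text to the support category, because "password" and "reset" appear only in the support examples.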

Spectrum Technology Platform uses administration utility commands to manage text categorization models. For a description of these commands, see the Administration Utility section of the Administration Guide.