Information Extraction Module

The Information Extraction Module provides advanced text processing and extracts information from any natural language input text.

It includes pre-trained models that extract entities from input text, determine the relationships between those entities, and assign the category to which the text belongs.

Features Provided

Entity Extraction
Extracts entities from unstructured data and classifies them into types such as Location, Date, Organization, ProperNouns, Address, and Person. The module ships with some predefined entities, and it can also train models to suit your requirements. For details on training a model and defining custom entities, see Custom Entities.
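To make the output of entity extraction concrete, the sketch below tags typed spans (text, type, and character offsets) in a sentence. It is a toy stand-in, not the module's API or its pre-trained models: the hand-written lookup table `GAZETTEER`, the `DATE_PATTERN` regex, and the `extract_entities` function are all hypothetical, chosen only to show the shape of the result a trained model would produce.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    type: str
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity


# Hypothetical lookup table standing in for a trained entity model.
GAZETTEER = {
    "Person": ["Ada Lovelace", "Alan Turing"],
    "Organization": ["Acme Corp"],
    "Location": ["London", "Paris"],
}

# Hypothetical rule: ISO-style dates such as 2024-03-15.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_entities(text: str) -> list[Entity]:
    """Return typed entity spans found in text, ordered by position."""
    found = []
    for entity_type, names in GAZETTEER.items():
        for name in names:
            for match in re.finditer(rf"\b{re.escape(name)}\b", text):
                found.append(Entity(match.group(), entity_type, match.start(), match.end()))
    for match in DATE_PATTERN.finditer(text):
        found.append(Entity(match.group(), "Date", match.start(), match.end()))
    return sorted(found, key=lambda e: e.start)


if __name__ == "__main__":
    sample = "Ada Lovelace joined Acme Corp in London on 2024-03-15."
    for entity in extract_entities(sample):
        print(entity.type, entity.text, entity.start, entity.end)
```

A real model generalizes to names it has never seen; the lookup table here only recognizes the strings it lists, which is why custom entities require training.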
Relationship Extraction
Identifies the type of relationship that binds the entities in any natural language input text.
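Relationship extraction typically produces (subject, relation, object) triples between entities that have already been identified. The following minimal sketch assumes the entities are passed in as a plain list in the order they appear, and labels each adjacent pair by looking for a trigger phrase in the text between them. The `TRIGGERS` table, the relation labels, and `extract_relations` are hypothetical illustrations, not the module's relation types or API; a trained model would learn these cues rather than read them from a table.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    subject: str
    relation: str
    obj: str


# Hypothetical trigger phrases mapped to relation labels.
TRIGGERS = {
    "works for": "EMPLOYED_BY",
    "was born in": "BORN_IN",
    "is headquartered in": "LOCATED_IN",
}


def extract_relations(text: str, entities: list[str]) -> list[Relation]:
    """Label each adjacent entity pair whose in-between text contains a trigger.

    Only adjacent pairs are compared, a simplification that stops a trigger
    near the first entity from being attached to every later entity.
    """
    relations = []
    for subj, obj in zip(entities, entities[1:]):
        start = text.find(subj)
        end = text.find(obj, start + len(subj)) if start != -1 else -1
        if start == -1 or end == -1:
            continue
        between = text[start + len(subj):end].lower()
        for phrase, label in TRIGGERS.items():
            if phrase in between:
                relations.append(Relation(subj, label, obj))
                break
    return relations


if __name__ == "__main__":
    sentence = "Alan Turing works for Acme Corp, which is headquartered in London."
    for rel in extract_relations(sentence, ["Alan Turing", "Acme Corp", "London"]):
        print(rel)
```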
Text Categorization
Assigns categories, such as email, medical reports, and sports, to unstructured text based on its content. Before categorizing text, you must train a text categorization model using the Administration Utility. You can use this feature to index patient health-care reports, classify documents by domain and subdomain, and sort email into SPAM and non-SPAM, among other applications. It also ranks the identified categories by how closely your text matches each one.
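The idea of training on labelled samples and then ranking categories by how well a text matches them can be sketched with term-frequency profiles and cosine similarity. This is a generic technique swapped in for illustration only; it is not the model the Administration Utility trains, and the names `train`, `rank_categories`, and the sample categories are hypothetical. Each category gets a word-count profile built from its example texts, and an input text is scored against every profile, giving a value between 0 and 1 that is then sorted from best to worst match.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples: dict[str, list[str]]) -> dict[str, Counter]:
    """Build one term-frequency profile per category from labelled sample texts."""
    return {
        label: Counter(tok for doc in docs for tok in tokenize(doc))
        for label, docs in examples.items()
    }


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two count vectors (0.0 when either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_categories(model: dict[str, Counter], text: str) -> list[tuple[str, float]]:
    """Score text against every category profile, best match first."""
    query = Counter(tokenize(text))
    scores = [(label, cosine(query, profile)) for label, profile in model.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    model = train({
        "medical": ["patient diagnosed with fever", "blood pressure of the patient"],
        "sports": ["the team won the match", "player scored a goal in the match"],
        "spam": ["win a free prize now", "claim your free gift now"],
    })
    print(rank_categories(model, "Patient fever and blood pressure readings"))
```

Returning the full sorted list, rather than only the top label, mirrors the ranking behaviour described above: callers can apply their own threshold or keep several candidate categories.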