Text Categorizer

Text categorization, also known as text classification, is the process of assigning custom categories to unstructured content or plain text, such as emails, news articles, and comments, based on how closely the content matches each category. Categorization can be based on subject, author, date, or virtually any classification system you define.
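To make "how closely the content matches each category" concrete, the following toy sketch scores a text against hypothetical keyword lists and picks the best-scoring category. The category names, keyword sets, and functions are assumptions made for illustration only. This is a simple rule-based matcher, not the statistical method the Text Categorizer stage uses.

```python
import re

# Hypothetical category definitions (illustration only): each category is a keyword set.
CATEGORY_KEYWORDS = {
    "billing": {"invoice", "payment", "refund", "charge"},
    "shipping": {"delivery", "package", "tracking", "shipped"},
}


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep only runs of letters as tokens.
    return re.findall(r"[a-z]+", text.lower())


def match_scores(text: str) -> dict[str, float]:
    """Return, per category, the fraction of tokens that are that category's keywords."""
    tokens = tokenize(text)
    if not tokens:
        return {category: 0.0 for category in CATEGORY_KEYWORDS}
    return {
        category: sum(1 for t in tokens if t in keywords) / len(tokens)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }


def categorize(text: str) -> str:
    # Assign the category whose keywords make up the largest share of the text.
    scores = match_scores(text)
    return max(scores, key=scores.get)
```

For example, `categorize("My payment was charged twice, please refund the charge")` returns `"billing"`, because three of the nine tokens are billing keywords and none are shipping keywords.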

You can create your own categorizer by training a categorizer model with your data and categories. During training, the trainer analyzes the data and stores what it learns in the model. The trained model then analyzes new content and determines the category to which that content belongs.

The text categorization feature uses a statistical text categorization process. It applies machine learning methods to learn classification rules automatically from training documents that people have labeled with categories.
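This section does not say which statistical algorithm the feature uses. As an illustration of learning classification rules from human-labeled documents, the sketch below uses multinomial Naive Bayes, a common textbook choice, swapped in here purely as an example. The class and method names (`NaiveBayesCategorizer`, `train`, `categorize`) are hypothetical and are not part of any Spectrum™ Technology Platform API. Training only counts words per category; categorizing picks the category with the highest smoothed log-probability.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep only runs of letters as tokens.
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCategorizer:
    """Hypothetical multinomial Naive Bayes categorizer (illustration only)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                        # add-alpha (Laplace) smoothing constant
        self.doc_counts = Counter()               # category -> number of training documents
        self.word_counts = defaultdict(Counter)   # category -> word -> occurrence count
        self.vocab = set()                        # every word seen during training

    def train(self, labeled_docs):
        """Learn from (text, category) pairs that a person has labeled."""
        for text, category in labeled_docs:
            self.doc_counts[category] += 1
            for word in tokenize(text):
                self.word_counts[category][word] += 1
                self.vocab.add(word)

    def categorize(self, text: str):
        """Return the category with the highest posterior log-probability."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # log prior P(category)
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                # Smoothed log likelihood log P(word | category).
                score += math.log(
                    (counts[word] + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            if score > best_score:
                best, best_score = category, score
        return best
```

A short usage example with a handful of labeled documents:

```python
model = NaiveBayesCategorizer()
model.train([
    ("invoice payment overdue", "billing"),
    ("refund my payment", "billing"),
    ("package delivery delayed", "shipping"),
    ("tracking number for my package", "shipping"),
])
model.categorize("when will my package arrive")  # "shipping"
model.categorize("please refund the invoice")    # "billing"
```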

Because you define the categories yourself, you must first train your model so that it learns those categories. You can then use the trained model in the Text Categorizer stage to categorize your unstructured data.

Spectrum™ Technology Platform uses Administration Utility commands to manage text categorization models. For a description of these commands, see the Administration Utility section of the Administration Guide.