Read from Documents
Read from Documents is a source stage that reads unstructured input data from various file
formats and extracts the contents. Possible sources include legal documents, customer
feedback, product reviews, news articles, blogs, and social networks. Read from Documents
also extracts metadata fields such as author and creation date. Once the data has been
extracted, it can be used for various types of processing, such as entity extraction and
string manipulation. The data can also be used to build search indexes for unstructured
text searches.
Note: Each document is considered one record for this stage.
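
The one-record-per-document model can be illustrated with a minimal sketch. This is not the stage's implementation or API: the names `DocumentRecord` and `read_documents` are hypothetical, the sketch handles plain-text files only, and it substitutes file-system attributes (size, modification time) for the document-level metadata, such as author and creation date, that the stage extracts.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class DocumentRecord:
    """Hypothetical record shape: one instance per input document."""
    file_name: str
    content: str
    metadata: dict = field(default_factory=dict)


def read_documents(folder):
    """Return one DocumentRecord per regular file in `folder`.

    Assumption: files are plain text. Real document formats would need
    a format-specific parser to extract content and metadata.
    """
    records = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        stat = path.stat()
        records.append(
            DocumentRecord(
                file_name=path.name,
                # Undecodable bytes are replaced rather than raising an error.
                content=path.read_text(encoding="utf-8", errors="replace"),
                metadata={
                    "size_bytes": stat.st_size,
                    "modified": datetime.fromtimestamp(
                        stat.st_mtime, tz=timezone.utc
                    ).isoformat(),
                },
            )
        )
    return records
```

Calling `read_documents` on a folder that contains three files returns a list of three records, regardless of how long each file is. Downstream steps such as entity extraction or indexing would then operate on each record's `content` field.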