Read from Documents

Read from Documents is a source stage that reads unstructured input data from various file formats and extracts the contents. Possible sources include legal documents, customer feedback, product reviews, news articles, blogs, and social networks. Read from Documents also extracts metadata fields such as author and creation date. Once extracted, the data can be used for further processing, such as entity extraction and string manipulation. The data can also be used to build search indexes for unstructured text searches.

Note: Each document is considered one record for this stage.
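
The one-record-per-document behavior can be pictured with a minimal Python sketch. This is an illustration only, not the stage's implementation or API: the `DocumentRecord` class, its field names, and the `read_documents` function are hypothetical, and the sketch handles only plain-text files, using the file's modification time as a stand-in for metadata such as creation date.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class DocumentRecord:
    # Hypothetical field names; not the stage's actual output schema.
    file_name: str
    modified: datetime
    text: str


def read_documents(folder: Path) -> list[DocumentRecord]:
    """Return one record per plain-text file found directly in `folder`."""
    records = []
    for path in sorted(folder.glob("*.txt")):
        stat = path.stat()
        records.append(
            DocumentRecord(
                file_name=path.name,
                # File-system timestamp used as a simple metadata example.
                modified=datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
                # The whole file becomes the content of a single record.
                text=path.read_text(encoding="utf-8", errors="replace"),
            )
        )
    return records
```

Under these assumptions, a folder containing three text files yields three records, regardless of how long each file is.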