Read From XML

The Read from XML stage reads an XML file into a job or subflow. It defines the file's path and data format, including XML schema and data element details.

Simple XML elements are converted to flat fields and passed on to the next stage. Simple XML data consists of records made up of XML elements that contain only data and no child elements. For example, this is a simple XML data file:

<customers>
    <customer>
        <name>Sam</name>
        <gender>M</gender>
        <age>43</age>
        <country>United States</country>
    </customer>
    <customer>
        <name>Jeff</name>
        <gender>M</gender>
        <age>32</age>
        <country>Canada</country>
    </customer>
    <customer>
        <name>Mary</name>
        <gender>F</gender>
        <age>61</age>
        <country>Australia</country>
    </customer>
</customers>

Notice that in this example each record contains simple XML elements such as <name>, <gender>, <age>, and <country>. None of the elements contain child elements.

The Read from XML stage automatically flattens simple data like this because most stages require data to be in a flat format. If you want to preserve the hierarchical structure, use an Aggregator stage after Read from XML to convert the data to hierarchical data.
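
As a rough illustration of that flattening, the following Python sketch turns the customers example into flat records: each <customer> element becomes one record, and each simple child element becomes a field. It uses only the standard library xml.etree.ElementTree module. The function name flatten_simple is hypothetical, and this is not how the stage is implemented internally. All values stay strings here, because typing is configured separately on the Fields tab.

```python
import xml.etree.ElementTree as ET

# The simple customers file from the example above, inlined so the sketch is self-contained.
CUSTOMERS_XML = """
<customers>
    <customer>
        <name>Sam</name>
        <gender>M</gender>
        <age>43</age>
        <country>United States</country>
    </customer>
    <customer>
        <name>Jeff</name>
        <gender>M</gender>
        <age>32</age>
        <country>Canada</country>
    </customer>
    <customer>
        <name>Mary</name>
        <gender>F</gender>
        <age>61</age>
        <country>Australia</country>
    </customer>
</customers>
"""

def flatten_simple(xml_text, record_tag):
    """Return one flat dict per record element; each simple child becomes a field."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.iter(record_tag):
        fields = {}
        for child in record:
            if len(child) > 0:
                # A child with its own children is complex, not simple.
                raise ValueError(f"<{child.tag}> has child elements; it is not simple XML")
            fields[child.tag] = (child.text or "").strip()
        records.append(fields)
    return records

for row in flatten_simple(CUSTOMERS_XML, "customer"):
    print(row)
```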

Complex XML elements remain in hierarchical format and are passed on as a list field. Because many stages require data to be in a flat format, you may have to flatten complex XML to make the data usable by downstream stages. See Flattening Complex XML Elements for more information.
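
To make the difference concrete, here is a Python sketch that uses a hypothetical customer file in which <phones> is a complex element. read_records keeps simple children as flat fields and turns the complex child into a list, and flatten_list_field then produces one flat row per list item. Both function names are illustrative, not Spectrum APIs. A real Spectrum list field holds nested records rather than plain strings, so treat this as a simplified model.

```python
import xml.etree.ElementTree as ET

# Hypothetical complex data: <phones> contains repeating child elements.
COMPLEX_XML = """
<customers>
    <customer>
        <name>Sam</name>
        <phones>
            <phone>555-0100</phone>
            <phone>555-0101</phone>
        </phones>
    </customer>
    <customer>
        <name>Mary</name>
        <phones>
            <phone>555-0199</phone>
        </phones>
    </customer>
</customers>
"""

def read_records(xml_text):
    """Simple children become flat fields; complex children become a list field."""
    records = []
    for customer in ET.fromstring(xml_text.strip()).iter("customer"):
        record = {}
        for child in customer:
            if len(child) == 0:
                record[child.tag] = (child.text or "").strip()
            else:
                # Complex element: keep it hierarchical as a list of child values.
                record[child.tag] = [(item.text or "").strip() for item in child]
        records.append(record)
    return records

def flatten_list_field(records, list_field, item_name):
    """Emit one flat row per list item, repeating the parent's simple fields."""
    rows = []
    for record in records:
        parent = {key: value for key, value in record.items() if key != list_field}
        for item in record.get(list_field, []):
            rows.append({**parent, item_name: item})
    return rows

records = read_records(COMPLEX_XML)
print(records[0])  # {'name': 'Sam', 'phones': ['555-0100', '555-0101']}
for row in flatten_list_field(records, "phones", "phone"):
    print(row)
```

This particular flattening drops a customer whose list is empty, much like an inner join; keeping such records would require emitting a row with an empty value instead.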

Note: Read From XML does not support the XML types xs:anyType and xs:anySimpleType.
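
Because these two types are unsupported, it can help to scan a schema for them before configuring the stage. The Python sketch below resolves the namespace prefix of each type attribute and reports declarations whose type is xs:anyType or xs:anySimpleType. find_unsupported_types is a hypothetical helper. It only checks explicit type attributes, so an element declared with no type at all (which XML Schema generally treats as xs:anyType) is not reported, and it treats prefix declarations as document-wide for simplicity.

```python
import io
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
UNSUPPORTED = {"anyType", "anySimpleType"}

def find_unsupported_types(xsd_text):
    """Return (declaration name, type) pairs whose type is xs:anyType or xs:anySimpleType."""
    prefixes = {}  # simplification: every prefix mapping is treated as global
    root = None
    source = io.BytesIO(xsd_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            prefixes[prefix] = uri
        elif root is None:
            root = item  # the first start event is the <xs:schema> element
    hits = []
    for el in root.iter():
        type_ref = el.get("type")
        if type_ref is None:
            continue
        prefix, _, local = type_ref.rpartition(":")  # "xs:anyType" -> ("xs", ":", "anyType")
        if prefixes.get(prefix) == XSD_NS and local in UNSUPPORTED:
            hits.append((el.get("name"), type_ref))
    return hits

SAMPLE_XSD = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="notes" type="xs:anyType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

print(find_unsupported_types(SAMPLE_XSD))  # [('notes', 'xs:anyType')]
```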

File Properties Tab

Table 1. File Properties Tab

Option Name

Description

Schema file

Specifies the path to an XSD schema file. Click the ellipsis button (...) to locate the file you want. Note that the schema file must be on the server in order for the data file to be validated against the schema. If the schema file is not on the server, validation is disabled.

Alternatively, you can specify an XML file instead of an XSD file. If you specify an XML file, the schema is inferred from the structure of the XML file. Using an XML file instead of an XSD file has some limitations:

  • The XML file cannot be larger than 1 MB. If the XML file is more than 1 MB in size, try removing some of the data while maintaining the structure of the XML.
  • The data file will not be validated against the inferred schema.
Note: If the Spectrum Technology Platform server is running on Linux, remember that file names and paths are case sensitive.

Data file

Specifies the path to the XML data file. Click the ellipsis button (...) to locate the file you want.

Note: If the Spectrum Technology Platform server is running on Linux, remember that file names and paths are case sensitive.

Preview

Displays a preview of the schema or XML file. When you specify an XSD file, the tree structure reflects the selected XSD. Once you specify both a schema file and a data file, you can click the schema elements shown in bold to preview the data that each element contains.
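
The Schema file option accepts a sample XML file only up to 1 MB and suggests removing data while keeping the structure when a sample is too large. The following Python sketch shows one way to prepare such a sample: trim_sample keeps only the first occurrence of each repeated element, and element_paths summarizes the structure so you can confirm nothing was lost. Both function names are hypothetical, 1 MB is assumed to mean 1,048,576 bytes, and this is not how the stage itself infers a schema.

```python
import xml.etree.ElementTree as ET

# Assumption: "1 MB" is read as 1,048,576 bytes.
MAX_SAMPLE_BYTES = 1024 * 1024

def check_sample_size(xml_bytes):
    """Raise if the sample exceeds the assumed 1 MB limit."""
    if len(xml_bytes) > MAX_SAMPLE_BYTES:
        raise ValueError(f"sample is {len(xml_bytes)} bytes; limit is {MAX_SAMPLE_BYTES}")

def element_paths(root):
    """Map each element path to 'complex' (has child elements somewhere) or 'simple'."""
    paths = {}
    def walk(el, parent_path):
        path = f"{parent_path}/{el.tag}"
        if len(el) > 0 or paths.get(path) == "complex":
            paths[path] = "complex"
        else:
            paths[path] = "simple"
        for child in el:
            walk(child, path)
    walk(root, "")
    return paths

def trim_sample(xml_bytes, keep=1):
    """Keep only the first `keep` same-named siblings at every level."""
    root = ET.fromstring(xml_bytes)
    def prune(el):
        seen = {}
        for child in list(el):  # iterate over a copy so removal is safe
            seen[child.tag] = seen.get(child.tag, 0) + 1
            if seen[child.tag] > keep:
                el.remove(child)
            else:
                prune(child)
    prune(root)
    return ET.tostring(root)

sample = (b"<customers><customer><name>Sam</name><age>43</age></customer>"
          b"<customer><name>Jeff</name><age>32</age></customer></customers>")
trimmed = trim_sample(sample)
check_sample_size(trimmed)
# Confirm trimming kept every element path and its simple/complex kind.
assert element_paths(ET.fromstring(trimmed)) == element_paths(ET.fromstring(sample))
print(trimmed.decode())
```

Comparing element_paths before and after trimming matters because the first occurrence of a repeated element may omit optional children that later occurrences contain; if the summaries differ, increase keep or edit the sample by hand.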

Fields Tab

Table 2. Fields Tab

Option Name

Description

Filter

Filters the list of elements and attributes to make it easier to browse. The filter does not affect which fields are included in the output.

XPath

The XPath column displays the XPath expression for the element or attribute. It is displayed for information purposes only.

Field

The name that will be used in the dataflow for the element or attribute. To change the field name, double-click it and type the name you want.

Type

The data type to use for the field.

bigdecimal
A numeric data type that supports 38 decimal points of precision. Use this data type for data that will be used in mathematical calculations requiring a high degree of precision, especially those involving financial data. The bigdecimal data type supports more precise calculations than the double data type.
boolean
A logical type with two values: true and false.
date
A data type that contains a month, day, and year. Dates must be in the format yyyy-MM-dd. For example, 2012-01-30.
datetime
A data type that contains a month, day, year, and hours, minutes, and seconds. Datetime values must be in the format yyyy-MM-dd'T'HH:mm:ss. For example, 2012-01-30T06:15:30.
double
A numeric data type that contains both negative and positive double precision numbers between $2^{-1074}$ and $(2-2^{-52})\times 2^{1023}$. In E notation, the range of values is -1.79769313486232E+308 to 1.79769313486232E+308.
float
A numeric data type that contains both negative and positive single precision numbers between $2^{-149}$ and $(2-2^{-23})\times 2^{127}$. In E notation, the range of values is -3.402823E+38 to 3.402823E+38.
integer
A numeric data type that contains both negative and positive whole numbers between $-2^{31}$ (-2,147,483,648) and $2^{31}-1$ (2,147,483,647).
list
Strictly speaking, a list is not a data type. However, when a field contains hierarchical data, it is treated as a "list" field. In Spectrum Technology Platform, a list is a collection of data consisting of multiple values. For example, a field Names may contain a list of name values. This may be represented in an XML structure as:
<Names>
    <Name>John Smith</Name>
    <Name>Ann Fowler</Name>
</Names>
It is important to note that the Spectrum Technology Platform list data type is different from the XML schema list data type: the XML list data type is a simple data type consisting of multiple values, whereas the Spectrum Technology Platform list data type is similar to an XML complex data type.
long
A numeric data type that contains both negative and positive whole numbers between $-2^{63}$ (-9,223,372,036,854,775,808) and $2^{63}-1$ (9,223,372,036,854,775,807).
string
A sequence of characters.
time
A data type that contains the time of day. Time must be in the format HH:mm:ss. For example, 21:15:59.
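
The formats and ranges above can be checked mechanically. The following Python sketch converts an XML text value according to one of the listed type names. convert and its constants are hypothetical. The boolean check accepts only the literals true and false, as described above. bigdecimal is modeled as a decimal with 38 significant digits, which is an assumption about what "38 decimal points of precision" means. Python has no native single precision type, so float values are round-tripped through a 4-byte IEEE 754 encoding. list is omitted because it describes structure rather than a scalar value.

```python
import struct
from datetime import datetime
from decimal import Context

INTEGER_RANGE = (-2**31, 2**31 - 1)
LONG_RANGE = (-2**63, 2**63 - 1)
# Assumption: "38 decimal points of precision" is modeled as 38 significant digits.
BIGDECIMAL_CONTEXT = Context(prec=38)

def convert(text, data_type):
    """Convert one XML text value to a Python value for the named data type (sketch)."""
    value = text.strip()
    if data_type == "string":
        return text  # strings are passed through unchanged, whitespace included
    if data_type == "boolean":
        if value not in ("true", "false"):
            raise ValueError(f"not a boolean: {value!r}")
        return value == "true"
    if data_type in ("integer", "long"):
        low, high = INTEGER_RANGE if data_type == "integer" else LONG_RANGE
        number = int(value)
        if not low <= number <= high:
            raise ValueError(f"{number} is outside the {data_type} range")
        return number
    if data_type == "double":
        return float(value)
    if data_type == "float":
        try:
            # Round-trip through 4-byte IEEE 754 to mimic single precision; overflow raises.
            return struct.unpack("<f", struct.pack("<f", float(value)))[0]
        except OverflowError as exc:
            raise ValueError(f"{value} is outside the float range") from exc
    if data_type == "bigdecimal":
        return BIGDECIMAL_CONTEXT.create_decimal(value)
    if data_type == "date":
        return datetime.strptime(value, "%Y-%m-%d").date()      # yyyy-MM-dd
    if data_type == "datetime":
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")    # yyyy-MM-dd'T'HH:mm:ss
    if data_type == "time":
        return datetime.strptime(value, "%H:%M:%S").time()      # HH:mm:ss
    raise ValueError(f"unsupported data type: {data_type}")

print(convert("2012-01-30T06:15:30", "datetime"))
print(convert("43", "integer"))
```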

Include

Specifies whether to make this field available in the dataflow or to exclude it.
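
The four columns of the Fields tab can be thought of as a per-field mapping. The Python sketch below models that mapping with a hypothetical FieldSpec record: leaf_xpaths lists the kind of XPath expressions the XPath column shows, and read_fields applies the Field, Type, and Include settings to produce rows. The id attribute in the sample is added only to illustrate an attribute XPath, only the string and integer types are handled, and the exact XPath text the stage displays (for example, with namespaces) may differ.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Hypothetical sample: the id attribute exists only to show an attribute XPath.
SAMPLE = """<customers>
    <customer id="1"><name>Sam</name><gender>M</gender><age>43</age></customer>
    <customer id="2"><name>Mary</name><gender>F</gender><age>61</age></customer>
</customers>"""

def leaf_xpaths(root):
    """Absolute XPaths of distinct leaf elements and attributes, in document order."""
    found = []
    def walk(el, path):
        path = f"{path}/{el.tag}"
        candidates = [f"{path}/@{name}" for name in el.attrib]
        if len(el) == 0:
            candidates.append(path)
        for xpath in candidates:
            if xpath not in found:
                found.append(xpath)
        for child in el:
            walk(child, path)
    walk(root, "")
    return found

@dataclass
class FieldSpec:
    xpath: str       # XPath column (informational)
    field: str       # Field column: name used in the dataflow
    data_type: str   # Type column; only "string" and "integer" are handled here
    include: bool    # Include column

CASTS = {"string": str, "integer": int}

def read_fields(root, record_xpath, specs):
    """One row per record element; fields with include=False are dropped."""
    rel_record = "./" + record_xpath.split("/", 2)[2]  # "/customers/customer" -> "./customer"
    rows = []
    for record in root.findall(rel_record):
        row = {}
        for spec in specs:
            if not spec.include:
                continue
            rel = spec.xpath[len(record_xpath) + 1:]  # e.g. "name" or "@id"
            raw = record.get(rel[1:]) if rel.startswith("@") else record.findtext(rel)
            row[spec.field] = None if raw is None else CASTS[spec.data_type](raw.strip())
        rows.append(row)
    return rows

SPECS = [
    FieldSpec("/customers/customer/@id", "CustomerID", "integer", True),
    FieldSpec("/customers/customer/name", "Name", "string", True),
    FieldSpec("/customers/customer/gender", "Gender", "string", False),
    FieldSpec("/customers/customer/age", "Age", "integer", True),
]

root = ET.fromstring(SAMPLE)
print(leaf_xpaths(root))
print(read_fields(root, "/customers/customer", SPECS))
```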

Example: Simple XML File

In this example, you want to read this file into a dataflow:

<addresses>
    <address>
        <addressline1>One Global View</addressline1>
        <city>Troy</city>
        <state>NY</state>
        <postalcode>12128</postalcode>
    </address>
    <address>
        <addressline1>1825B Kramer Lane</addressline1>
        <city>Austin</city>
        <state>TX</state>
        <postalcode>78758</postalcode>
    </address>
</addresses>

In this example, you could choose to include the <addressline1>, <city>, <state>, and <postalcode> elements. This would result in one record being created for each <address> element, because <address> is the common parent element of <addressline1>, <city>, <state>, and <postalcode>.
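
The rule of one record per common parent element can be sketched in Python as follows. common_parent takes the deepest ancestor path shared by the selected element XPaths, and records_for emits one row per element at that path. Both names are hypothetical, and the sketch assumes the selected XPaths are absolute element paths; it is not the stage's actual implementation.

```python
import posixpath
import xml.etree.ElementTree as ET

ADDRESSES_XML = """
<addresses>
    <address>
        <addressline1>One Global View</addressline1>
        <city>Troy</city>
        <state>NY</state>
        <postalcode>12128</postalcode>
    </address>
    <address>
        <addressline1>1825B Kramer Lane</addressline1>
        <city>Austin</city>
        <state>TX</state>
        <postalcode>78758</postalcode>
    </address>
</addresses>
"""

SELECTED = [
    "/addresses/address/addressline1",
    "/addresses/address/city",
    "/addresses/address/state",
    "/addresses/address/postalcode",
]

def common_parent(xpaths):
    """Deepest ancestor path shared by all selected element paths."""
    parents = [xpath.rsplit("/", 1)[0] for xpath in xpaths]
    return posixpath.commonpath(parents)

def records_for(xml_text, xpaths):
    """One row per common-parent element, with one field per selected XPath."""
    root = ET.fromstring(xml_text.strip())
    parent = common_parent(xpaths)  # "/addresses/address" for SELECTED
    root_tag, _, rest = parent.lstrip("/").partition("/")
    if root_tag != root.tag:
        raise ValueError(f"paths do not start at the root element <{root.tag}>")
    record_elements = [root] if not rest else root.findall("./" + rest)
    rows = []
    for record in record_elements:
        row = {}
        for xpath in xpaths:
            rel = xpath[len(parent) + 1:]  # path below the common parent, e.g. "city"
            el = record.find(rel)
            row[rel] = None if el is None else el.text
        rows.append(row)
    return rows

for row in records_for(ADDRESSES_XML, SELECTED):
    print(row)
```

Values are kept as strings here, which preserves codes such as 12128 exactly; in the stage, the Type column controls conversion.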