XML Extract

This extract is used to read data from XML files and Web Services in a tabular form. An XmlFile, REST, or SOAP connection is needed.

The extract definition uses the XPath language, a standard query language for selecting nodes from an XML document. You can find more information on XPath at www.w3schools.com/xml/xpath_intro.asp.

Connection

XML, REST, or SOAP connection. For XML extracts that use a REST connection, only the HTTP methods GET and POST are supported. To use other methods, an XML load must be used.

Note: during a data preview of the extract, a POST request is actually executed, which may cause undesired changes on the service endpoint.

XPath loop expression

Dropdown list of suggested XPath expressions that specify the root or anchor elements of the XML document that will be looped through. The number of elements matching this expression determines the number of output rows.

Expressions can also be entered manually.
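The relationship between the loop expression and the number of output rows can be illustrated outside the product. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which supports only a subset of XPath. The sample document and the element names (orders, order) are assumptions made for illustration and do not come from the extract itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
SAMPLE_XML = """<orders>
  <order id="1"><customer>Alice</customer></order>
  <order id="2"><customer>Bob</customer></order>
  <order id="3"><customer>Carol</customer></order>
</orders>"""

root = ET.fromstring(SAMPLE_XML)

# Loop expression: "./order" is evaluated relative to the root element
# <orders>, so it corresponds to the absolute XPath /orders/order.
loop_nodes = root.findall("./order")

# Each matching element yields one output row.
row_count = len(loop_nodes)
order_ids = [node.get("id") for node in loop_nodes]

print(row_count)   # 3
print(order_ids)   # ['1', '2', '3']
```

Three order elements match the loop expression, so this sketch yields three rows.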

XPath expression

Dropdown list of expressions that define a column of the extract output, or the XPath from the XPath loop expression. For each column, a name and a default value can be defined. Values that consist of one or more spaces, as well as null values, are mapped to this default value.
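The default-value behavior can be sketched as follows, again with Python's xml.etree.ElementTree rather than the product itself. The column names, paths, and defaults are hypothetical, and the function extract_rows is an illustrative helper, not part of any product API. Treating an empty string like a whitespace-only value is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one city is blank, one is missing entirely.
SAMPLE_XML = """<orders>
  <order id="1"><customer>Alice</customer><city>Berlin</city></order>
  <order id="2"><customer>Bob</customer><city>   </city></order>
  <order id="3"><customer>Carol</customer></order>
</orders>"""

# Column name -> (path relative to the loop element, default value).
COLUMNS = {
    "customer": ("customer", "unknown"),
    "city": ("city", "n/a"),
}


def extract_rows(xml_text, loop_path, columns):
    """Return one dict per element matched by loop_path."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall(loop_path):
        row = {}
        for name, (path, default) in columns.items():
            value = node.findtext(path)  # None if the element is missing
            # Assumption: empty strings are handled like whitespace-only values.
            if value is None or value.strip() == "":
                value = default
            row[name] = value
        rows.append(row)
    return rows


rows = extract_rows(SAMPLE_XML, "./order", COLUMNS)
for row in rows:
    print(row)
```

In this sketch, the second and third rows receive the default "n/a" for the city column, because the value is whitespace-only in one case and missing in the other.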

Namespaces

XML documents may contain namespaces. The declaration of these namespaces in the XML extract is optional; it can be omitted if the XML structure has no naming collisions without namespaces. To declare a namespace, a prefix has to be defined for the URI of the namespace. This prefix is then used in the XPath expressions to identify the correct XML nodes.

Prefix	Identifies the correct XML node in the XPath expressions.
URI	The URI of the namespace
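How a declared prefix resolves to a namespace URI can be illustrated with a short Python sketch based on xml.etree.ElementTree. The namespace URIs, prefixes, and element names below are invented for the example. Note that the prefixes used in the expressions (i, c) differ from those in the document (inv, cust); only the URI has to match.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two namespaces that both define a <name> element.
SAMPLE_XML = """<inv:invoices xmlns:inv="http://example.com/invoice"
                              xmlns:cust="http://example.com/customer">
  <inv:invoice>
    <inv:name>INV-001</inv:name>
    <cust:name>Alice</cust:name>
  </inv:invoice>
</inv:invoices>"""

# Prefix -> URI declarations, analogous to the Prefix/URI table above.
NAMESPACES = {
    "i": "http://example.com/invoice",
    "c": "http://example.com/customer",
}

root = ET.fromstring(SAMPLE_XML)
invoice = root.find("i:invoice", NAMESPACES)

# The prefix selects which of the two <name> elements is meant.
invoice_name = invoice.findtext("i:name", namespaces=NAMESPACES)
customer_name = invoice.findtext("c:name", namespaces=NAMESPACES)

print(invoice_name)   # INV-001
print(customer_name)  # Alice
```

Without the prefixes, the two name elements could not be told apart, which is the kind of naming collision that makes a namespace declaration necessary.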

The suggested XPath expressions are limited to the first two array elements. If necessary, further array elements must be entered manually.

For example, /elem/sub-elem[1]/@attr and /elem/sub-elem[2]/@attr are displayed; /elem/sub-elem[3]/@attr has to be entered manually.

XPath indexes start at 1, so the two indexes offered in the suggestions are 1 and 2.
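The 1-based indexing can be checked with a small Python sketch. The sample document mirrors the element names of the example above, while the attribute values are made up. Python's xml.etree.ElementTree cannot select attributes with a trailing /@attr step, so this sketch selects the element by position and then reads the attribute with get().

```python
import xml.etree.ElementTree as ET

# Hypothetical document with three repeated sub-elements.
SAMPLE_XML = """<elem>
  <sub-elem attr="first"/>
  <sub-elem attr="second"/>
  <sub-elem attr="third"/>
</elem>"""

root = ET.fromstring(SAMPLE_XML)  # root is the <elem> element

# Positional predicates are 1-based: [1] is the first sub-elem.
first = root.find("./sub-elem[1]").get("attr")
second = root.find("./sub-elem[2]").get("attr")

# The third element is addressed the same way, with index 3.
third = root.find("./sub-elem[3]").get("attr")

print(first, second, third)  # first second third
```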

If caching is activated, the complete output of the extract is temporarily stored in an internal H2 database during the first call of the extract. Subsequent calls of the extract read directly from the cache without connecting to the underlying source system. If the extract or the underlying connection contains variables, a separate cache is built for each set of values of these variables.
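The idea of keeping a separate cache per set of variable values can be sketched in Python. This is only a conceptual illustration: it uses an in-memory dictionary instead of an H2 database, and the class ExtractCache, the function fake_source, and the variable name year are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


class ExtractCache:
    """Caches complete extract output, one entry per set of variable values."""

    def __init__(self, fetch: Callable[[Dict[str, str]], List[Row]]) -> None:
        self._fetch = fetch
        self._store: Dict[Tuple[Tuple[str, str], ...], List[Row]] = {}
        self.source_calls = 0  # counts how often the source was contacted

    def read(self, variables: Dict[str, str]) -> List[Row]:
        # The sorted variable items form the cache key, so each distinct
        # combination of values gets its own cache entry.
        key = tuple(sorted(variables.items()))
        if key not in self._store:
            self.source_calls += 1
            self._store[key] = self._fetch(variables)
        return self._store[key]


def fake_source(variables: Dict[str, str]) -> List[Row]:
    """Stand-in for the underlying source system (hypothetical)."""
    return [{"year": variables["year"], "customer": "Alice"}]


cache = ExtractCache(fake_source)
cache.read({"year": "2024"})  # first call: reads from the source
cache.read({"year": "2024"})  # second call: served from the cache
cache.read({"year": "2025"})  # new variable value: separate cache entry

print(cache.source_calls)  # 2
```

The source is contacted only twice for three reads, once per distinct value of the variable.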

See Caching in Extracts and Transforms for more information.

Updated July 3, 2025