XML Extract


This extract is used to read data from XML files and web services in a tabular form. An XmlFile or SOAP connection is needed.

The extract definition uses XML Path Language (XPath), a standard query language for selecting nodes from an XML document. You can find more information on XPath at www.w3schools.com/xml/xpath_intro.asp.

The extract is anchored by an absolute XPath expression that specifies the elements of the XML document to loop through. The number of elements matching this expression determines the number of output rows.
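The relationship between the anchor expression and the row count can be illustrated outside of Jedox Integrator. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, not the Integrator engine itself; the sample document, the element names, and the helper anchor_elements are assumptions made for illustration. Note that ElementTree evaluates paths relative to the root element, so the absolute expression /rates/rate is written here as ./rate.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not taken from the Jedox samples).
XML_DOC = """<rates>
  <rate currency="USD">1.08</rate>
  <rate currency="GBP">0.86</rate>
  <rate currency="JPY">161.2</rate>
</rates>"""


def anchor_elements(xml_text, anchor_path):
    """Return every element matched by the anchor path (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return root.findall(anchor_path)


# Each matching element corresponds to one output row.
anchors = anchor_elements(XML_DOC, "./rate")
row_count = len(anchors)
print(row_count)
```

With three rate elements in the document, the anchor expression matches three elements and the output would have three rows.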

You can then define as many XPath expressions as you want, each of which defines one column of the extract output. A name and a default value can be defined for each column. Values that consist of a single space, multiple spaces, or null are mapped to this default value.
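The column and default-value behavior described above might look like this in a minimal Python sketch based on the standard xml.etree.ElementTree module. It is only an approximation of the idea, not the Integrator implementation: the sample document, the column definitions, the helper extract_rows, and the choice to evaluate column paths relative to each anchor element are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with one blank and one missing value.
XML_DOC = """<rates>
  <rate><currency>USD</currency><value>1.08</value></rate>
  <rate><currency>GBP</currency><value>   </value></rate>
  <rate><currency>JPY</currency></rate>
</rates>"""

# Hypothetical column definitions: (column name, path, default value).
COLUMNS = [
    ("Currency", "currency", "n/a"),
    ("Value", "value", "0"),
]


def extract_rows(xml_text, anchor_path, columns):
    """Build one row per anchor element, one field per column definition."""
    root = ET.fromstring(xml_text)
    rows = []
    for anchor in root.findall(anchor_path):
        row = {}
        for name, path, default in columns:
            text = anchor.findtext(path)  # None when the node is missing
            # Missing, empty, or whitespace-only values fall back to the default.
            if text is None or text.strip() == "":
                text = default
            row[name] = text
        rows.append(row)
    return rows


rows = extract_rows(XML_DOC, "./rate", COLUMNS)
for row in rows:
    print(row)
```

In this sketch the GBP row (whitespace-only value) and the JPY row (no value element) both receive the default "0" in the Value column.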

XML documents may contain namespaces. With ETL version 6.0, declaring these namespaces in the XML extract is optional: the declaration can be omitted if the XML structure has no naming collisions when namespaces are ignored. To declare a namespace, define a prefix for the URI of the namespace. This prefix is then used in the XPath expressions to identify the correct XML nodes.
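The idea of binding a prefix to a namespace URI and using that prefix in XPath expressions can be sketched in Python with the standard xml.etree.ElementTree module. This is an illustration of the general XPath mechanism rather than of the Integrator dialog; the namespace URI, the prefixes, and the element names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced document; the URI is a placeholder.
XML_DOC = """<ex:rates xmlns:ex="http://example.com/exchange">
  <ex:rate><ex:currency>USD</ex:currency></ex:rate>
  <ex:rate><ex:currency>GBP</ex:currency></ex:rate>
</ex:rates>"""

# Prefix-to-URI declaration used by the query. The prefix ("fx") does not
# have to match the one used inside the document ("ex"); only the URI matters.
NAMESPACES = {"fx": "http://example.com/exchange"}

root = ET.fromstring(XML_DOC)
currencies = [
    element.text
    for element in root.findall("./fx:rate/fx:currency", NAMESPACES)
]
print(currencies)
```

Because nodes are identified by the namespace URI, the query prefix is free to differ from the prefix in the document, which is why the declaration maps a prefix to a URI rather than to another prefix.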

You can take a look at the supplied Jedox Integrator sample “sampleXML”. The XML extract “ExchangeRate_Extract_withNamespace” delivers the following screen:

Under “Advanced Settings” you can set caching to none (default), memory, or disk. See Caching in Extracts for more information.