Caching in Extracts and Transforms


Caching can be useful if an extract is accessed several times during the run time of a job, including all of its sub-jobs and loads. This can occur if the job is used in different transforms, lookup functions, or loads, or when looping over it in a TableLoop transform or a Loop job.

If caching is activated, the complete output of the extract is temporarily stored during the first call of the extract, using an internal H2 database. Subsequent calls of the extract read directly from the cache without connecting to the underlying source system of the extract. If the extract or the underlying connection contain variables, a separate cache is build for different values of these variables.

Caching can be enabled under the advanced settings of many extracts and transforms. For example, the screenshot below shows caching enabled for an Excel extract:

Some advantages to using caching include:

  • Performance gain if the retrieval of data from the source system is time consuming, e.g. in case of complex SQL queries or remote systems with a slow network connection
  • Improved consistency if the data in the source system can change during the execution of an Integrator job

One disadvantage to caching is increased overhead due to storage of the cache, especially in case of huge result sets for the extract.