Caching in Extracts and Transforms

Caching can be useful if an extract is accessed several times during the run time of a job, including all of its sub-jobs and loads. This can occur if the job is used in different transforms, lookup functions, or loads, or when looping over it in a TableLoop transform or a Loop job.

If caching is activated, the complete output of the extract is temporarily stored during the first call of the extract, using an internal H2 database. Subsequent calls of the extract read directly from the cache without connecting to the underlying source system of the extract. If the extract or the underlying connection contain variables, a separate cache is build for different values of these variables.

Caching can be enabled in settings of many extracts and transforms. For example, the screenshot below shows caching enabled at the bottom of Settings for an Excel extract:

Advanced settings screenshot

Some advantages to using caching include:

  • Performance gain if the retrieval of data from the source system is time consuming, e.g. in case of complex SQL queries or remote systems with a slow network connection
  • Improved consistency if the data in the source system can change during the execution of an Integrator job

One disadvantage to caching is increased overhead due to storage of the cache, especially in case of huge result sets for the extract.

Implicit caching of sources

In some cases, caching is required to calculate the result of the source. This implicit caching occurs independently of the manual setting "Use caching." In these cases, an additional manual setting has no effect.

TableAggregation transform
TableNormalization transform
TableDenormalization transform
If an aggregate is set which is different to "none" (e.g. "sum").
TableView transform If sorting is defined.
TableJoin transform If "Persisted Join" is set.
File extract If a SQL query is defined.

You can see in the execution log whenever a source is cached. An information message starting with "Explicit caching of source ..." or "Caching of source..." (in case of implicit caching) is raised in the log.

Updated March 27, 2024