Best Practices for "Use Caching" Option

When you have a large project, the ETL Monitor helps you find the most time-consuming tasks in the Integrator Job. You can then use caching on this task to quicken the process.

To find out the duration of each task, look at the "Execution Runtime Details" Table. To view this table, right-click on the project name in the Integrator Monitor and select Show execution runtime details, as shown below.

This table provides the complete detail of the current Job in a list format. You can sort the items by "Duration" or "Input calls".

If a long-running Transform or Extract has multiple Input calls and Output Calls, it is recommended to set up the use caching option for this item. For example, if in a project both the Input calls and Output calls of an item are larger or equal to five ( >=5) and its "Duration" is one of the most time-consuming items of the list, set up the "use caching" option for this item. With the next Job run, you can notice the optimization effects with the new figures.

Note that caching is beneficial if both the "Input calls" and "Output calls" numbers show great numbers. If the Input calls is showing 1 and the output calls is showing a large number, caching of the transform only reduces the multiple processing of the transform but not its underlying sources.

When to use caching

When you check the "Execution Runtime Results" Table and the output call number of a Job with an Extract or Transform is smaller or equal to one ( <=1), you do not need to select the "use caching" option.

Furthermore, it is important to know that the Extract/ Transform is always expected to output the same data (except when there are different variable values). If there is a load between those calls that alters that component's output, caching is not feasible.

Also, note that not every caching results in better performance. If the Transform/ Extract contains different project variables, the "use caching" flag can drastically slow down the execution time. However, you can immediately detect this in the "Execution Runtime Detail" Table. Furthermore, it is important to note that the Extract/Transform is expected to always output the same data (except when there are different variable values.) If there is a load between those calls that alters that component's output, caching is not feasible.

Factors that show you need to use caching:

  • High number of output calls

  • High duration of the component

  • A low number of processed output rows (if the previous two factors apply)

  • The output data is the same in all output calls (except for different variable values)

Further information

  • If the different output calls of the source are processed with different variable settings, then manual caching does not help. Even if you set the cache, those calls will not use it, as this may lead to different results.

  • For Extracts from a remote system, when the processing of the source system takes a long time, it is better to use manual caching. For example, use manual caching in a Relational database with complex SQL statements and/ or a slow network.

  • When an Extract is used in a Transform, and they both have the same number of output calls, it is better to set the manual caching on the Transform. If the Transform itself is used in another Transform with the same number of output calls as the Extract and the first Transform, then set the manual caching on the second Transform.

  • Sources that are already automatically cached should not additionally be cached manually.

  • After the end of each execution, all caches are cleared.

Updated July 8, 2024