TableJoin Transform
The TableJoin type joins two data sources via one or more key fields. It’s possible to join different types of sources with data coming from different source systems, e.g. flat files and relational databases.
Join type | full: joins all records from both left- and right-hand tables, regardless of whether the join condition is met or not. Fields that do not have matching records in either table are filled with NULL. inner: joins data sets from the left- and right-hand tables precisely when all the criteria entered are fulfilled. left outer: declares the left-hand table the superior table. Even lines with no counterpart in the joined table will appear in the results table. Fields that are not present in the right-hand table are filled with NULL. Join type right outer can be achieved by using "left outer" and inverting the order of sources. cross: the Cartesian product of both sources; equivalent to an inner join with no key columns. |
Data sources | The two data sources to be joined. Data sources consist of all extracts and transforms associated with the current Integrator project. |
Tree formats | When using a tree-based source, you must select a tree format. |
Join table
The mapping keys for the respective data sources are defined in the Join table. The tables are joined via these keys. The number of join keys in the left and right data sources must be identical. The order of the two data sources is only important for the join type "left outer".
You can choose fields in the two related sources directly, which is especially important for sources containing fields with identical names.
The join condition can be:
eq (default): equal (=)
ne: not equal (<>)
ge: greater equal (>=)
gt: greater (>)
le: lower equal (>=)
lt: lower (<)
Note for join operators le, lt, ge, gt: numerical sorting is only done if the value type of the sorted column is numeric, e.g. double/integer; otherwise, the default behavior is a lexical/string-based sorting. Example:
-
Lexical sorting: 12 < 5 < 5.0 < 6
-
Numerical sorting: 5 < 6 < 12
Note: when the input comes dynamically from the source or transform but the "Field name" is left empty, the name of the source column/ function is used for the Field name. When the target is manually defined as constant, then the name "constant" is chosen instead of the value from the "Input" field. Adding more than one constant column results in error, because there would be two columns with the name "constant."
Other settings
Persisted join | not set: the right join source is cached in memory during processing. This leads in general to a higher performance. For an inner join the source with less rows and columns should be chosen as right source. set: both join sources are stored in an embedded database during processing. It should only be used, if both sources have a huge number of columns and rows which would lead to memory problems. For join conditions different to "eq" the Persisted Join variant is used independent of this setting. |
Right source is unique | not set: the join is calculated completely. set: in case of a non-persisted join, this option can be set if the right join source only contains each key value only once. This should lead to an increase of performance. |
All output columns | not set (default): only the columns of the left source are put to the output structure. A warning message is raised in this case. set: both columns are put to the output structure, the corresponding columns of the right source are suffixed with an index. |
Use caching
If caching is activated, the complete output of the extract is temporarily stored during the first call of the extract, using an internal H2 database. Subsequent calls of the extract read directly from the cache without connecting to the underlying source system of the extract. If the extract or the underlying connection contains variables, a separate cache is build for different values of these variables.
See Caching in Extracts and Transforms for more information.
Updated April 14, 2025