TableAggregation Transform

This transform aggregates the data according to a set of key columns and different aggregation functions.

Settings

Data source Can be any extract or transform in the Integrator project. The data source will determine the input options for the key columns:

Tree format Only necessary when the source is tree-based. It defines how the tree is converted into a table.
Target Defines the columns to be used as keys for the aggregation. For example, if you choose "Regions" as a key column, all rows presenting the same value in this column in the source table (e.g. Germany) will be aggregated into a single row in the new table.
Field Name

Name by which the column can be referenced further in the Jedox Integrator process. If left blank, the original names of the source columns will be used. 1

You may choose more than one key column, e.g. "Regions" and "Years". In this case, all rows presenting the same values in both key columns will be aggregated into a single row in the new table (for example, Germany 2018).

Input Either a reference to a column of the source or to a function, or a constant value. To switch between the two, use the toggle button on the right side of the Input field:
Measures

In the Measure field, you can define a different name for the header of the aggregation columns. If this field is left empty, the function name will be the header of the aggregation column.2

Specify an aggregation function in the Aggregation field, and define the input value in the Input field. Type can be text, numeric (double), or integer.

For instance, you can choose the "sum" function and define "Value" as input. In this case, all rows having an identical value in the key column will be grouped into a single row, and their values will be summed and presented in a "sum" column in the new table. See the example below for a more detailed explanation.

Column include pattern Regular expression for column names to be included as output columns of the transform.
Column exclude pattern Regular expression for column names to be excluded as output columns of the transform.
Measure include pattern Regular expression for measures to be included as output of the transform.
Measure exclude pattern Regular expression for measures to be excluded as output of the transform.
Measure aggregation The aggregation function used for all measures that are added dynamically via the Measure include or exclude pattern.
Concatenation separator You can also create output columns through regular expressions. Concatenation options are
  • semicolon ;
  • comma (default) ,
  • Tab \t
  • Space #space
Keep source order If set, the order of output rows respects the order in the source with first appearance of a particular key. If not set, the order is arbitrary but the performance might be better.
Use caching If caching is activated, the complete output of the extract or transform is temporarily stored during the first call. Subsequent calls of the extract / transform read directly from the cache without connecting to the underlying source system.

Some advantages to using caching include:

  • Performance gain if the retrieval of data from the source system is time consuming, e.g. in case of complex SQL queries or remote systems with a slow network connection
  • Improved consistency if the data in the source system can change during the execution of an Integrator job

One disadvantage to caching is increased overhead due to storage of the cache, especially in case of huge result sets for the extract.

See Caching in Extracts and Transforms for more information.

Example

Consider you have the data below (just a sample here):

You wish to aggregate the data using the "Regions" and "Years" columns as keys, and sum their values. That means you need the "sum" function.

For this, select the key columns in the "Input" field in the Target table. Next, in the Measures table, select the "sum" function in the "Aggregation" field, and define "Value" as the "Input".

Save the changes.

In the data preview, you can see that all rows showing a same country and a same year in the source table were grouped into a single row in the new table, and their values were summed in the "Sum" column. Check "Germany", for example:

The same procedure is valid for all aggregation functions, which are presented in the table below.

Aggregation functions

The following aggregation functions are available for transforms TableAggregation, TableNormalization, and TableDenormalization:

sum Sum
min Minimum
max Maximum
avg Average
first The first data entry for one target position
last The last data entry for one target position
count Number of data entries written on one target position
count_distinct
Number of distinct data entries written on one target position
group_concat Concatenates all data entries with separator “,” while maintaining order of source rows
concat_distinct Concatenates all distinct data entries with separator “,” (i.e. remove duplicate values); results are in ascending order
selectivity Estimates the selectivity (0-100) of a value. The value is defined as (100 * distinctCount / rowCount).
var_samp The sample variance (square of the sample standard deviation)
stddev_samp The sample standard deviation
var_pop The population variance (square of the population standard deviation)
stddev_pop The population standard deviation
none No aggregation occurs; the last data set is written (overwriting).

Special case: distinct rows

The distinct rows of a source, i.e. removing of all duplicate rows, can be achieved with this transform by setting "Column include pattern" to "." (i.e. the regular expression for accept all) and no measures. In order to maintain the order of rows the option "Keep source order" has to be set.


Updated February 26, 2025