Relational Load

image_pdfimage_print

With the relational load, data is written to a relational database system. The relational load writes data with batch inserts, which groups statements to limit round trips to the database. The bulk size is by default 1000.

Main settings
Data source Extract or transform containing the data to be written.
Tree format If the data source is a tree, the hierarchy format must be entered. See Overview of Tree Formats.
Target connection Connection to relational database. Systems currently supported are Apache Derby, H2, HSQLDB, MS Access, MS SQL Server, My SQL, PostgreSQL, and Oracle.
Schema Not all relational database systems support schemas (e.g. My SQL).
Table name If the database table does not yet exist in the target system, it will automatically be created during the load.
Mode

 Note: in the table below, CREATE, INSERT, etc. refer to the instructions to the relational databases and not to the load modes. These were selected with a view to achieving the greatest possible analogy to the cube load.

add CREATE the database table if it is not yet present. INSERT source data.
create DROP the database table if already present and CREATE the database table. INSERT source data.
delete In the target table, DELETE the rows that are present in the source data. For columns not available in the source, no filter is applied for deletion in the target table (i.e., all rows are deleted independent of the value in this column. This makes the deletion faster).
insert CREATE the database table if it is not yet present. INSERT source data.
update CREATE the database table if not yet present. Relational DELETE database table. INSERT source data.
Advanced settings

The Aggregation parameter defines the aggregation behavior during the data load.

Checked: the source data and existing data in the target table of the relational database will be aggregated according to the defined aggregation settings (see below). Source rows with identical keys are aggregated, and source rows that are already present in the target table are aggregated and updated during the load. This may be required if either the source data is not unique with respect to the key columns or if the load is done in add/insert mode to an already filled table.

If the primary keys of the relational table match the key columns of the aggregation settings, no duplicate entries for a primary key can occur.
Note that in this case a temporary table is created in the relational database system. Loading is significantly slower in this case.

Not checked: the source data is simply inserted to the target table of the relational database without a subsequent aggregation step. This option is appropriate when no values are already in the target table for these key columns and the source data is aggregated according to these key columns. Otherwise, depending on the primary keys in the database table, the loading might fail due to an integrity constraint violation with duplicate entries for a primary key. Loading is significantly quicker in this case.

If aggregation is active, the aggregation settings define the aggregation level for the columns in the source.

Each field with update mode “key” is a key for the aggregation (not necessarily a primary key in the database table). For the non-key fields, the aggregation function can be set in the update mode. Possible values are: avg, count, first, last, max, min, sum. With update uode “last”, a relational UPDATE can be achieved.

There is a slight difference between the load modes “add” and “insert”: if a numerical column is not specified in the aggregation settings, the default update mode is different. In case of “add”, it is “sum”; in case of “insert”, it is “last”. This makes the relational load modes compatible with the behavior of Cube Load , especially with drillthrough.

Notes
  • In order to determine the data types of the columns and other settings of the database table, the table must first be created directly in the database system.
  • It is not possible to run several relational loads for the database table in parallel jobs.
  • Load type RelationalSQL should be used for stored procedures, which change data.
Example:

Source:

   

Existing relational Table:

Customer:
A
B
B
C

Value:
10
20
30
40

 

Customer:
C

Value:
5

Results:

Mode ADD, Aggregation=true
(Customer: Key, Update Mode: sum)

   


Mode ADD, Aggregation=false

 

Customer:
A
B
C

Value:
10
50
45

 

Customer:
A
B
B
C
C

Value:
10
20
30
40
5


Mode ADD, Aggregation=true
(Customer: Key, Update Mode: last)

   

 


Mode UPDATE, Aggregation=false

 

Customer:
A
B
C

Value:
10
30
40

 

Customer:
A
B
B
C

Value:
10
20
30
40


Mode UPDATE, Aggregation=true
(Customer: Key, Update Mode: sum)

   

Customer:
A
B
C

Value:
10
50
40

 
               
image_pdfimage_print
Was this post helpful?
NoYes (No Ratings Yet)
Loading...