Rules Engine in Jedox 2018.1

image_pdfimage_print

OLAP engine

Jedox OLAP in 7.1 and 2018.1 uses Data Driven Engine (DDE) architecture more extensively. OLAP has two built-in versions of DDE – one for CPU and one for GPU that support slightly fewer scenarios. DDEs don’t understand model metadata like dimensions or rules, and therefore they use a calculation planner to translate metadata into a plan.

Calculation planner

The calculation planner is a component that analyzes queries (Excel PALO.DATA / DATAC / DATAV formulas, DFILTER, Jedox View) and OLAP metadata (dimensions, cubes, and rules) and builds a calculation plan.

The calculation plan is a hierarchical graph of objects (nodes) that are later instantiated by the DDE to process source data stored in OLAP cubes in a uniform way. Each node has a simple specific function but can transform a large amount of data effectively. For optimal performance, it is important that the amount of data flowing between these nodes is not extremely large (effect of data sparsity).

While building the plan, the calculation planner does some optimization to reduce DDE workload. For example, it replaces a node doing the addition of two constants and replaces it with another constant or replaces aggregation of one billion constant rule-calculated cells with an aggregation of a single constant using multiplication. It also analyzes conditions of IF expressions and, in the that case a constant condition is used, the calculation planner uses only true or false expressions as an IF result.

OLAP then decides if the plan should be processed by the CPU or GPU engine, based on estimated execution time on each of them. Processing of a single calculation plan can also be distributed between the CPU and GPU engine.

Below is an example of a calculation plan for two rules:

  • ['EUR'] = ['LC'] * PALO.DATA("","XRATE","LC","EUR",!'Date')
  • ['Revenue'] = ['Price'] * ['Units']

Performance factors

Processing of each plan depends mainly on two factors:

  • Plan complexity (the number of plan nodes): based on the query (single cell or whole cube, cell row, or multidimensional slice), number of rules and consolidations involved in query evaluation, and level of subquery fragmentation (caused by recursion, complex IF expressions, and PALO.DATA formulas using mapping).
  • Amount of data flowing between plan nodes: based on values loaded into cubes (primary data) and the types of plan nodes being generated (secondary data)
Amount of processed data – primary and secondary

Primary data are those loaded into OLAP cubes using Integrator Cube Load or entered manually, e.g. during the planning process. Secondary data are those produced by plan nodes. The amount of secondary data can be several orders of magnitude higher and usually depends on structure sizes, i.e., the number of dimensions and their sizes.

To prevent the creation of large plans, the calculation planner won’t create plans for recursive/chained rules when the current plan’s size reaches 64 nodes. These unexpanded nodes are processed ad-hoc. This 64-node ceiling limits GPU usage and will be changed in future versions.

Value types

DDE implementation for CPU operates with these data types:

  • Null
  • Numeric
  • String
  • Errors

For GPU, only null and numeric nodes for primary and secondary data are supported.

Null

Usually cheap to process. Primarily, null represents an empty cell. Secondarily, it can be the product of

  • a consolidated cell with no source data
  • the result of multiplication/division when one of the operands is null
  • rule keyword null
  • a result of rule comparison operators that is not true; e.g. 1 > 2 (== null)
Numeric

Primary numeric data can come from base cells or from rule literals like 3.14159, -1 or  0.0. Secondary numeric data can be produced by math operations, consolidations, or functions like ['A']+['B'], LEN("Hello").

String

Primary data or string literals (e.g. Not a Value, "").
Secondary string data comes as a function of results or rule parameter values – CONCATENATE("Hello","World"), !'Year'.

String values are typically used in attribute cubes and rights cubes. String values produced by rules are not cached in OLAP. This can have a negative impact on performance when reports that use such values are repeatedly refreshed, or when Integrator jobs repeatedly extract such values.

Errors

Error values usually come from rules functions when they are called with invalid parameters. Errors typically occur when using invalid element names in recursive rules of PALO.DATA functions.c

 

image_pdfimage_print