RScript Transform

R is a free software programming language and software environment for statistical computing. It can be used for predictive analysis but also for a variety of other use cases. Jedox now includes an RScript transform type that executes an arbitrary script in R, based on the input data from one or several Jedox Integrator sources.

The RScript transform links Jedox Integrator with the open-source statistical software R. This makes it possible to run statistical calculations on one or several data sources within Jedox Integrator.

The RScript transform has four components:

  • Data source
  • External packages
  • Name of result set
  • RScript
Main Settings
Data Sources: An extract or transform from the corresponding Jedox Integrator project. The input is passed to the RScript as a variable with the same name as the data source.
External Packages: All external R packages used in the RScript must be declared here as a list. For more information, see R Installation of External Packages.
Name of result set: The result of the calculation within the RScript must be a vector or a data frame, i.e., a list of vectors, factors, and/or matrices that all have the same length. So that Jedox Integrator can locate the result, enter the name of the variable containing the result here.
RScript: The R code for the calculation is entered here. Variables created in the Jedox Integrator project can be used in the RScript as well. For further information about the R language, visit http://cran.r-project.org/doc/manuals/r-release/R-lang.html
Use caching
When checked (true), caching will be used; when not checked (false), caching will not be used. See Caching in Extracts and Transforms.
Example: calculating quantiles

Input data: E_Cubedata

RScript:
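A minimal sketch of what a quantile script of this kind might look like. It assumes that the source E_Cubedata arrives as a data frame with a numeric column named Value (a hypothetical column name) and that "Name of result set" is set to result. The first line only creates stand-in data so that the sketch can be run in a plain R session; inside the transform, the variable would be supplied by the data source.

```r
# Stand-in for the Integrator source E_Cubedata (hypothetical columns and values)
E_Cubedata <- data.frame(Product = c("A", "B", "C", "D", "E"), Value = c(10, 20, 30, 40, 50))
# Compute the quartiles of the Value column
q <- quantile(E_Cubedata$Value, probs = c(0.25, 0.5, 0.75))
# Return the result as a data frame so that all columns have the same length
result <- data.frame(Quantile = names(q), Value = as.numeric(q))
```

Each command is kept on a single line, in keeping with the line rules described in the notes of this article.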

Result:

Notes:
  • R libraries and commands that produce graphical output are not supported in RScript transforms.
  • For large data volumes, additional memory can be allocated to the R engine. The R command memory.limit(<size>) requests a new memory limit in MB; for example, memory.limit(4000) requests a limit of 4000 MB. Note that memory.limit() applies to R on Windows only.
  • Each line of the R script must be a complete command. If a command is continued over several lines, each subsequent line must start with the prefix “@”.
  • Automatic line completion (as in the R console) is not possible. This is especially relevant for if and for statements.

Examples:

while (i<=12) {ProductType[i] <- levels(data$Product)[1]; i<-i+1}

or

while (i<=12)
@{ProductType[i] <- levels(data$Product)[1];
@i<-i+1}
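To try the loop in a plain R session before pasting it into the transform, the variables it relies on have to exist first. The following sketch assumes a stand-in data frame named data with a factor column Product (the values are hypothetical) and omits the “@” prefixes, which are specific to the RScript transform and are not R syntax.

```r
# Stand-in for the Integrator source; in the transform this variable comes from the data source
data <- data.frame(Product = factor(c("Desktop", "Notebook", "Server")))
# Initialise the loop counter and the result vector before the loop
i <- 1
ProductType <- character(12)
# The loop written as one complete command on a single line
while (i <= 12) {ProductType[i] <- levels(data$Product)[1]; i <- i + 1}
```

After the loop, ProductType holds twelve copies of the first factor level of Product.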
 

Each RScript row should have only one R expression, which is generally the case in R. However, unlike the R console, there is no error returned if there are several expressions that are separated by spaces. For example, the expression xxxx a<-1 yyy returns no error.

If there are several valid expressions, only the first valid expression is executed. For example, there is no error for the expression x<-1 y<-1, but a value will only be assigned to x.
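Because the transform does not report such lines, one option is to check each script line in a regular R session first, where the parser does raise an error. The sketch below uses base R's parse(); the helper name parses_cleanly is hypothetical, and the function is meant to be run in an R console, not inside the transform.

```r
# Hypothetical helper: TRUE if the line is syntactically valid R, FALSE otherwise
parses_cleanly <- function(line) {
  parsed <- tryCatch(parse(text = line), error = function(e) NULL)
  !is.null(parsed)
}

parses_cleanly("x<-1")            # a single valid expression
parses_cleanly("x<-1 y<-1")       # two expressions separated by a space
parses_cleanly("xxxx a<-1 yyy")   # stray symbols around an assignment
```

The first call returns TRUE; the other two return FALSE, which flags exactly the kind of line that the transform would accept silently.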

Memory allocation for RScript transform

The evaluation of the RScript is done by the R engine as a separate Java process. To improve performance, you can change the maximum amount of heap memory available for RScript transform by defining the parameter “memory” in the component.xml file, located in <install_path>\tomcat\webapps\etlserver\config\standard (Windows) or <install_path>/opt/jedox/ps/tomcat-etl/webapps/etlserver/config/standard (Linux).

In the following example, the allocated memory is changed to 81920 kilobytes:

parameter “memory”
     <component name="RScript"
          class="com.jedox.etl.components.transform.RTransform">
          <parameter name="memory">81920k</parameter>
     </component>

Memory is defined in bytes. Append the letter K or k to the number to indicate kilobytes, M or m for megabytes, and G or g for gigabytes. For example:

80000000 = 80000000 bytes
81920k = 81920 kilobytes
80m = 80 megabytes
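To compare settings written with different suffixes, it can help to convert them to bytes. The sketch below assumes that the suffixes follow the JVM convention of binary multiples (1k = 1024 bytes), which this article does not state explicitly; the helper name memory_to_bytes is hypothetical.

```r
# Hypothetical helper: convert a memory setting such as "81920k" to bytes
# Assumption: k, m and g are binary multiples, as in JVM heap options
memory_to_bytes <- function(setting) {
  multipliers <- c(k = 1024, m = 1024^2, g = 1024^3)
  suffix <- tolower(substring(setting, nchar(setting)))
  if (suffix %in% names(multipliers)) {
    as.numeric(substring(setting, 1, nchar(setting) - 1)) * multipliers[[suffix]]
  } else {
    as.numeric(setting)
  }
}

memory_to_bytes("81920k")    # 83886080 bytes
memory_to_bytes("80m")       # 83886080 bytes
memory_to_bytes("80000000")  # 80000000 bytes
```

Under this assumption, 81920k and 80m describe the same amount of memory, while 80000000 is slightly less.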

Note: if no memory is set for RScript in component.xml, the Java default maximum heap size is used. This value is system-dependent and can be displayed with the following Java command (flag “MaxHeapSize”):

java -XX:+PrintFlagsFinal -version

For more information, see Changing the Maximum Memory of Tomcat Service.
