Development of Customer-Specific ETL Components

image_pdfimage_print

Jedox Integrator (ETL) can now be enhanced with customer-defined components, such as the connection and an extract to a specific ERP system, or a transform for complex data manipulation.

A significant development effort is required to implement a new component, so you should first explore all other possibilities for adding custom functionalities to Jedox Integrator, such as:

  • Script-based jobs or functions of type Groovy and JavaScript
  • Java classes for custom functions with function type Java
  • Integration of JDBC-based databases with connection type GenericDB
  • Retrieving data from web services with the generic connection type Soap
Some basics…

An ETL project consists of different components that are linked together. Each component belongs to one of these so-called scopes: connection, extract, transform, function, load, or job. For each of these scopes, the Jedox Integrator Server is shipped with a set of standard component types, such as a connection of type File.

A functional description of the different component types can be found in the  Jedox Integrator articles in the Jedox Knowledge Base. It is possible to create new components for any of the scopes.

Jedox Integrator Server is developed entirely in Java and runs as a servlet in the Apache Tomcat servlet container. The path to the servlet is …\Jedox Suite\tomcat\webapps\etlserver (= <etlserverpath>)

The definition of an ETL project is done in a specific XML notation. It can be downloaded from the Jedox Integrator component to a file. For each component type, an XSD (an XML Schema Definition) is validated against the project.

Each execution (an ETL job, a data preview, or a component test) is performed in one separate thread. No compiling of the components is done before or during the runtime. As a first step of an execution, the configuration of the required components is read out of the ETL project XML definition.

The list of available standard components is defined in several file locations:

  • <etlserverpath>\config\standard\component.xml 
  • <etlserverpath>\config\extension\component.xml
  • <etlserverpath>\config\sap\component.xml (if SAP connectivity is installed)

Each component type has at least the following 2 parameters:

  • Name: unique name of the component (note that extracts and transforms have a common namespace).
  • Class: the Java class that implements this component.

Examples:

A component type can have additional parameters, which are used each time this component is used in an ETL project. The component can also have an attribute status=”experimental” or status=”deprecated”. It is possible to use these components, but they are not guaranteed to work. Relational connections have 2 additional attributes, jdbc and driver, which define the JDBC connection string.

Additional components have to be registered in a file also named component.xml in an arbitrarily named subdirectory of <etlserverpath>\config\. You should not change the standard component.xml file, because it will be overwritten with a new setup. You can also rename the file <etlserverpath>\config\customer\component_template.xml.

Requirements

We describe here a well-suited development environment based on Eclipse, although of course it is also possible to use other development IDEs. It allows you not only to develop but also to test the components directly from the development framework.

Implementation of an ETL component

An ETL component is basically implemented in one Java class. Depending on the scope, the class for a component must implement the methods of a specific, rather simple interface of the same name:

Scope

Interface

Connections

IConnection

Extracts

IExtract

Transforms

ITransform

Loads

ILoad

Jobs

IJob

Functions

IFunction

Each component has to read its configuration by parsing the XML document of the ETL project. This is done in the configuration phase, before the execution is processed (Method init()). During the processing of an execution, the role of a component depends on its scope.

For the implementation, you need at least the standard ETL package etlcore.jar (com.jedox.etl.core), which contains the interfaces above. The standard ETL components are part of the package etlcomponents.jar (com.jedox.etl.components). They can both be found in directory <etlserverpath>\WEB-INF\lib.

Example package

As a very basic example of an ETL component package, you can download the Maven project customdemo from CustomPackageETL. It features two ready-to-run additional ETL components:

  • A connection of type SimpleMySQL, which establishes a connection to a My SQL database
  • An extract of type SimpleSQL which performs a simple SQL Query to this database

The implementation is done in two Java classes that can be used as a template:

  • Connection type SimpleMySQL: Class SimpleSQLConnection in package jedox.etl.customdemo.connection
  • Extract type SimpleSQL: Class SimpleSQLExtract in package jedox.etl.customdemo.extract

Note that the examples are made as simple as possible to give you the main ideas on how to develop your own components.

To use this package inside of Eclipse, the following steps have to be performed:

  • Upload the required Maven dependencies from the Jedox installation to the Maven Repository by starting mvnInstallFiles.bat, which is part of the CustomPackage. It can be found in the Jedox installation directory (e.g. C:\Program Files (x86)\Jedox\Jedox Suite”)
  • Open Eclipse Workspace.
  • Import existing Maven project
  • Configure the build path of the project: add (under libraries) the external class folder <etlserverpath>\WEB-INF\lib in order to add all dependent libraries.
  • Run as “Maven Install”.
Testing

You can perform a test of your ETL components directly out of Eclipse without deploying it in the Jedox Installation. Therefore, you can use the Standalone Client, which is part of the etlcore library. The commands are similar to those of the ETL Command Line Client.

  • Put an ETL project containing your custom project in your Eclipse project folder
  • Create a new Run Configuration with Main class jedox.etl.core.run.StandaloneClient
    • Help:
      Set as argument –h to get a list of available commands.
    • Data Preview:
      g. display first 20 rows of extract MySQL_Extract in project sampleSimpleMySQL:
      -p samples\sampleSimpleMySQL -d MySQL_Extract -n 20
    • Execution of Load/Job:
      g. execute the job default in Project sampleBiker
      –p samples/sampleBiker –j default
Deployment

Once the components work inside of the Eclipse Environment, you can deploy them on your Jedox installation by performing the following steps:

  • Copy the generated .jar file from the .\target folder of your Eclipse project to the directory <etlserverpath>\WEB-INF\lib_external. Note that you should not change the .\lib folder, because it will be overwritten with a new setup.
  • Copy the directory containing the xml file from the folder .\target\config to <etlserverpath>\config\.
  • Restart the service JedoxSuiteTomcatService.
Custom components in Jedox Integrator

All additional components can also be used within the Integrator module of Jedox Web., although the editor for such components is a simple XML editor. It is currently not possible to have a graphical editor for custom components.

Normally, components can be used from the UI. However, if the components have status “experimental” or “deprecated”, you cannot create new components of this type via the Integrator module of Jedox Web.

image_pdfimage_print
Was this post helpful?
NoYes (No Ratings Yet)
Loading...