Hadoop Package for Jedox Integrator


The Hadoop package is an add-on to Jedox Integrator that adds connection capabilities to Apache Hadoop, a framework for distributed storage and processing of very large data sets. It offers read and write access to the Hadoop Distributed File System (HDFS) and to Apache Hive, the data warehouse infrastructure built on top of Hadoop.

Connection type “Hdfs” parameters

This connection type can be used in extracts and loads of type “File”.

HDFS URL

The address of the HDFS NameNode server. It starts with the protocol prefix “hdfs://”, followed by the hostname or IP address and the port.
Example: hdfs://12.34.56.78:8020
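
As a rough illustration of the URL format described above, the following Python sketch checks that a value has the “hdfs://” prefix, a host, and a port. The function name parse_hdfs_url is hypothetical and not part of Jedox Integrator; it only mirrors the format rules stated here.

```python
from urllib.parse import urlsplit


def parse_hdfs_url(url):
    """Split an HDFS URL into (host, port); raise ValueError if it is malformed."""
    parts = urlsplit(url)
    if parts.scheme != "hdfs":
        raise ValueError("URL must start with hdfs://")
    # parts.port is None when no port is given in the URL.
    if not parts.hostname or parts.port is None:
        raise ValueError("URL must contain a host and a port")
    return parts.hostname, parts.port


host, port = parse_hdfs_url("hdfs://12.34.56.78:8020")
print(host, port)
```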

File name

The absolute path of the file on the corresponding HDFS.
Example: /user/hue/BikerCustomerRegions.csv

Header (true or false)

If set to true, the entries of the first line are used as column headers.
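
The effect of this setting can be sketched in Python as follows. This is only an illustration of the idea, not the Integrator's implementation: the function read_table, the sample data, and the generated names “column1”, “column2” for the headerless case are assumptions made for this example.

```python
import csv
import io


def read_table(text, header):
    """Return (column_names, data_rows) for comma-separated text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], []
    if header:
        # First line supplies the column names; the rest is data.
        return rows[0], rows[1:]
    # No header line: every line is data, names are generated (assumed scheme).
    names = [f"column{i + 1}" for i in range(len(rows[0]))]
    return names, rows


sample = "Customer,Region\nBikes Inc,North\n"
print(read_table(sample, header=True))
print(read_table(sample, header=False))
```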

Data delimiter

The separator between the columns in the text file, e.g. “,”. Two special values are available:

\t – tab as data delimiter
#space – a single space character (“ ”) as data delimiter
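
A minimal Python sketch of how such delimiter settings could be resolved to actual characters is shown below. The mapping table and the function names are assumptions for illustration; how Jedox Integrator resolves these tokens internally is not documented here.

```python
import csv
import io

# Assumed mapping from the special tokens typed into the field
# to the actual delimiter characters.
SPECIAL_DELIMITERS = {"\\t": "\t", "#space": " "}


def resolve_delimiter(setting):
    """Translate a special token; any other value is used literally."""
    return SPECIAL_DELIMITERS.get(setting, setting)


def split_line(line, delimiter_setting):
    """Split one line of text using the configured delimiter."""
    reader = csv.reader(io.StringIO(line), delimiter=resolve_delimiter(delimiter_setting))
    return next(reader)


print(split_line("Germany\tBerlin", "\\t"))
print(split_line("Germany Berlin", "#space"))
print(split_line("Germany,Berlin", ","))
```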

Enclosure character

The enclosure character of the columns. Possible enclosure characters are the double quote ("), the single quote ('), or none.
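
To show why the enclosure character matters, the following Python sketch reads the same kind of line with and without an enclosure. The function read_row and the sample values are hypothetical; the sketch uses Python's csv module and is not the parser used by Jedox Integrator.

```python
import csv
import io


def read_row(line, delimiter=",", enclosure='"'):
    """Parse one line; enclosure=None means values are not enclosed."""
    if enclosure is None:
        reader = csv.reader(io.StringIO(line), delimiter=delimiter,
                            quoting=csv.QUOTE_NONE)
    else:
        reader = csv.reader(io.StringIO(line), delimiter=delimiter,
                            quotechar=enclosure)
    return next(reader)


# With an enclosure, a delimiter inside a value does not split the column.
print(read_row('"Smith, John",Berlin'))
print(read_row("'Smith, John',Berlin", enclosure="'"))
# Without an enclosure, the quote characters are treated as ordinary data.
print(read_row('"Smith, John",Berlin', enclosure=None))
```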

Encoding

The most common character encodings are “UTF-8” (default), “ASCII”, and “latin1” (Windows standard). A list of all supported encodings can be found at http://docs.oracle.com/javase/6/docs/technotes/guides/intl/encoding.doc.html

You can also manually enter any encoding name from this list in the ‘Encoding’ field.
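
The following Python sketch illustrates what happens when the encoding setting does not match the file. The sample text “Müller” is made up for this example, and Python's codec names are used here, which may differ from the Java names in the list linked above.

```python
# Bytes as they would be stored in a latin1-encoded file.
raw_latin1 = "Müller".encode("latin1")

# Reading with the matching encoding restores the text.
print(raw_latin1.decode("latin1"))

# Reading latin1 bytes as UTF-8 fails, because the byte for "ü" is not valid UTF-8.
try:
    raw_latin1.decode("utf-8")
    utf8_read_failed = False
except UnicodeDecodeError:
    utf8_read_failed = True
print("UTF-8 read failed:", utf8_read_failed)

# Reading UTF-8 bytes as latin1 does not fail, but produces garbled characters.
garbled = "Müller".encode("utf-8").decode("latin1")
print(garbled)
```

A wrong encoding setting therefore shows up either as a read error or as garbled special characters in the extracted data.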

Connection type “Hive” parameters

This connection type can be used in extracts of type “Relational” and “RelationalTable” and in loads of type “RelationalSQL”. 

Host

The host name (DNS name) or the IP address of the server on which the database is located.

Port

The TCP/IP port number used by the database.

Username

User name for the connection to the database.

Password

Password for the connection to the database.

Database

Name, schema, or instance of the relational database.
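
For orientation, the Host, Port, and Database parameters correspond to the parts of a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database. The Python sketch below only assembles such a string. It is an assumption that the package builds its connection this way, and the function name and the sample values (including port 10000, the usual HiveServer2 default) are hypothetical.

```python
def hive_jdbc_url(host, port, database="default"):
    """Assemble a HiveServer2-style JDBC URL from the connection parameters."""
    return f"jdbc:hive2://{host}:{port}/{database}"


# Sample values for illustration only.
print(hive_jdbc_url("12.34.56.78", 10000, "default"))
```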

Supported Distributions

A package is available for the Hortonworks Data Platform (HDP) distribution of Hadoop.
http://hortonworks.com/hdp/
Supported version: HDP 2.3

It can be downloaded from here.

If you are interested in other distributions or versions, please contact Jedox Support.

Note: By default, HDFS runs in non-secure mode, in which no actual authentication is required. The secure mode, in which each user and service has to be authenticated by Kerberos, is currently not supported by the Hadoop package for Jedox Integrator.
