Hadoop Package for Jedox Integrator


The Hadoop package is an add-on to Jedox Integrator that adds connectivity to Apache Hadoop, a framework for distributed storage and distributed processing of very large data sets. It offers read and write access to the Hadoop Distributed File System (HDFS) and to Apache Hive, the data warehouse infrastructure built on top of Hadoop.

 

Connection type “Hdfs”

Parameters:

HDFS URL:

The address of the NameNode server of the HDFS. It starts with the protocol prefix “hdfs://”, followed by the host name and the port.
Example: hdfs://12.34.56.78:8020

File name:

The filename with absolute path on the corresponding HDFS.
Example: /user/hue/BikerCustomerRegions.csv
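The HDFS URL and the File name together identify a single file. As a minimal sketch, assuming the two values are simply joined into one hdfs:// URI (the helper name hdfs_file_uri is hypothetical and not part of Jedox Integrator), the checks implied by the two parameter descriptions could look like this in Python:

```python
from urllib.parse import urlparse


def hdfs_file_uri(hdfs_url: str, file_name: str) -> str:
    """Join the 'HDFS URL' and 'File name' parameters into one full URI.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(hdfs_url)
    # The HDFS URL must use the hdfs:// prefix and name both host and port.
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URL, got {hdfs_url!r}")
    if not parsed.hostname or parsed.port is None:
        raise ValueError(f"HDFS URL needs a host and a port: {hdfs_url!r}")
    # The file name must be an absolute path on that HDFS.
    if not file_name.startswith("/"):
        raise ValueError(f"file name must be an absolute path: {file_name!r}")
    return f"hdfs://{parsed.hostname}:{parsed.port}{file_name}"
```

With the example values above, hdfs_file_uri("hdfs://12.34.56.78:8020", "/user/hue/BikerCustomerRegions.csv") returns "hdfs://12.34.56.78:8020/user/hue/BikerCustomerRegions.csv".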

Header:
(true or false)

If set to true, the entries of the first line are used as column headers.
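To illustrate what the Header flag changes, here is a small Python sketch operating on rows that have already been split into fields. This page does not say how columns are named when Header is false, so the generated names column1, column2, and so on, as well as the helper split_header, are hypothetical:

```python
from typing import List, Tuple


def split_header(rows: List[List[str]], header: bool) -> Tuple[List[str], List[List[str]]]:
    """Return (column_names, data_rows) for already parsed rows.

    Hypothetical helper: the 'columnN' names used when header is False are
    an assumption for illustration, not Jedox Integrator's actual naming.
    """
    if not rows:
        return [], []
    if header:
        # The first line supplies the column headers; the rest is data.
        return rows[0], rows[1:]
    # No header line: every row is data, so invent positional names.
    width = len(rows[0])
    return [f"column{i}" for i in range(1, width + 1)], rows
```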

Data delimiter:

Separator between the columns in the text file, e.g. “,”.
To use a tab as the data delimiter, enter \t.
To use a blank (“ ”) as the data delimiter, enter #space.
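The two special entries can be read as tokens that are translated into the actual separator character. A minimal Python sketch of that translation, assuming any other entry is used literally (resolve_delimiter is a hypothetical name):

```python
def resolve_delimiter(entered: str) -> str:
    """Translate the value typed into 'Data delimiter' into the real separator."""
    special = {
        "\\t": "\t",    # the two characters backslash + t stand for a tab
        "#space": " ",  # the keyword #space stands for a single blank
    }
    # Anything else (e.g. "," or ";") is assumed to be used as-is.
    return special.get(entered, entered)
```

For example, "North\t12".split(resolve_delimiter("\\t")) yields ["North", "12"].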

Enclosure character:

The character used to enclose column values.
Possible enclosure characters are the double quote ("), the single quote (') or none.
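The enclosure character matters when a value contains the delimiter itself. As a sketch only, assuming the file behaves like ordinary CSV, Python's standard csv module shows the difference between using an enclosure and using none (read_rows is a hypothetical helper):

```python
import csv
import io
from typing import List, Optional


def read_rows(text: str, delimiter: str, enclosure: Optional[str]) -> List[List[str]]:
    """Split delimited text into rows, honouring an optional enclosure character."""
    source = io.StringIO(text)
    if enclosure:
        # Values wrapped in the enclosure may contain the delimiter.
        reader = csv.reader(source, delimiter=delimiter, quotechar=enclosure)
    else:
        # 'none': quote characters get no special treatment and stay in the data.
        reader = csv.reader(source, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    return list(reader)
```

With the double quote as enclosure, the line "Smith, John",North (value quoted) yields two fields, Smith, John and North; with no enclosure, the same line splits into three fields at every comma.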

Encoding:

The most common character encodings are:
“UTF-8” (default), “ASCII”, “latin1” (Windows standard)
A list of all supported character encodings can be found at:
http://docs.oracle.com/javase/6/docs/technotes/guides/intl/encoding.doc.html
You can also manually enter any encoding name from this list into the ‘Encoding’ field.
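A wrongly chosen encoding usually shows up as garbled or rejected characters. The following Python sketch (decode_text is a hypothetical helper) decodes raw file bytes with the name entered in the Encoding field. Note that Python uses its own codec registry rather than the Java list linked above, although it also accepts “UTF-8”, “ASCII” and “latin1”:

```python
def decode_text(data: bytes, encoding: str = "UTF-8") -> str:
    """Decode raw file bytes using the name from the 'Encoding' parameter.

    Unknown encoding names raise LookupError; bytes that are invalid in the
    chosen encoding are reported with a hint to check the parameter.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        raise ValueError(
            f"file is not valid {encoding}; check the 'Encoding' parameter"
        ) from err
```

For example, "Zürich".encode("latin1") decodes correctly with "latin1" but is rejected with the default "UTF-8", because the single byte used for ü in latin1 is not valid UTF-8.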

This connection type can be used in extracts of type File and loads of type File.

 

Connection type “Hive”

Parameters:

Host:

The host name (DNS name) or the IP address of the server on which the database is located.

Port:

The TCP/IP port number used by the database.

Username:

User name for the connection to the database.

Password:

Password for the connection to the database.

Database:

Name, schema or instance of the relational database.

This connection type can be used in extracts of type Relational and RelationalTable and in loads of type RelationalSQL. 
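Host, Port and Database are the pieces from which a HiveServer2 JDBC URL is built. Assuming the Hive connection uses the standard jdbc:hive2:// URL format (an assumption; this page does not state how Jedox Integrator connects internally), a Python sketch that assembles it could look like this. Username and Password are not part of the URL and would be passed separately. The name hive_jdbc_url and the database name "sales" are hypothetical, and 10000 is only HiveServer2's default port, which a given cluster may have changed.

```python
def hive_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a HiveServer2-style JDBC URL from the Hive connection parameters.

    Assumption: the standard jdbc:hive2://host:port/database format; username
    and password are supplied separately, not embedded in the URL.
    """
    if not host:
        raise ValueError("host must not be empty")
    if not 0 < port < 65536:
        raise ValueError(f"invalid TCP/IP port: {port}")
    return f"jdbc:hive2://{host}:{port}/{database}"
```

For example, hive_jdbc_url("12.34.56.78", 10000, "sales") returns "jdbc:hive2://12.34.56.78:10000/sales".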

 

Supported Distributions

A package is available for the Hortonworks Data Platform (HDP) Hadoop distribution.
http://hortonworks.com/hdp/
Supported version: HDP 2.3

It can be downloaded from here.

If you are interested in other distributions or versions, please contact Jedox support.

Note:
– Secure mode in HDFS is not supported.
By default, HDFS runs in non-secure mode, in which no actual authentication is required. Secure mode, in which each user and service must be authenticated via Kerberos, is currently not supported by the Hadoop package for Jedox Integrator.
