Hadoop client

From MetaCentrum


This page describes how to set up a local Hadoop client.

The primary way to use the Hadoop cluster is through the frontend hador.ics.muni.cz; see the Hadoop documentation.

Manual installation

The configuration files can be obtained from the Hadoop frontend hador.ics.muni.cz. They are generated automatically by the Puppet tool, so they may differ slightly from time to time.

Kerberos client

A correctly configured Kerberos client for the realms META and ICS.MUNI.CZ is necessary:

  1. /etc/krb5.conf (for example from hador.ics.muni.cz; see also this page)
  2. correct local time (using NTP is recommended)
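A quick way to verify the Kerberos client works (replace `username` with your MetaCentrum login; the principal name is an example):

```shell
# obtain a Kerberos ticket in the META realm
kinit username@META

# list current tickets; the default principal should be username@META
# and the ticket must not be expired
klist
```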


We recommend installing identical software versions (use the Cloudera distribution and the local mirror with versions tested for MetaCentrum). It is also possible to use the original Cloudera repository (http://archive.cloudera.com/cdh5/).

Debian 7/wheezy:

echo 'deb [arch=amd64] http://scientific.zcu.cz/repos/hadoop/cdh5/debian/wheezy/amd64/cdh wheezy-cdh5 contrib' > /etc/apt/sources.list.d/cloudera.list
apt-key adv --fetch-key http://scientific.zcu.cz/repos/hadoop/archive.key
apt-get update

Debian 8/jessie:

echo 'deb [arch=amd64] http://scientific.zcu.cz/repos/hadoop/cdh5/debian/jessie/amd64/cdh jessie-cdh5 contrib' > /etc/apt/sources.list.d/cloudera.list
apt-key adv --fetch-key http://scientific.zcu.cz/repos/hadoop/archive.key
apt-get update


Supported Java versions are OpenJDK 7, OpenJDK 8, Oracle Java 7, and Oracle Java 8.

Example of Java installation for Debian 7/wheezy and Debian 8/jessie:

apt-get install openjdk-7-jre-headless
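Afterwards you can check which Java is active:

```shell
# print the active Java version; it should report a supported version,
# e.g. 1.7 for OpenJDK 7
java -version
```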


Hadoop

  • installation:
apt-get install hadoop-client
  • copy configurations from frontend:
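For example, the configuration can be copied over scp (the path /etc/hadoop/conf is the standard CDH location; adjust it if your layout differs):

```shell
# copy the Hadoop client configuration from the frontend
# (the glob is quoted so it expands on the remote side)
scp -r 'hador.ics.muni.cz:/etc/hadoop/conf/*' /etc/hadoop/conf/
```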


Hive

  • installation:
apt-get install hive
  • copy configurations from frontend:
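As with Hadoop, one way is to copy the configuration over scp (assuming the standard CDH location /etc/hive/conf):

```shell
# copy the Hive client configuration from the frontend
scp -r 'hador.ics.muni.cz:/etc/hive/conf/*' /etc/hive/conf/
```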


HBase

  • installation:
apt-get install hbase
mkdir -p /var/lib/hbase/local/jars
chown -R hbase:hbase /var/lib/hbase/local
  • copy configurations from frontend:
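Again, the configuration can be copied over scp (assuming the standard CDH location /etc/hbase/conf):

```shell
# copy the HBase client configuration from the frontend
scp -r 'hador.ics.muni.cz:/etc/hbase/conf/*' /etc/hbase/conf/
```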


Spark

It is also necessary to set up #Hadoop, because Spark runs in YARN mode.

  • installation:
apt-get install spark-python
  • copy configurations from frontend:
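The Spark configuration can be copied the same way (assuming the standard CDH location /etc/spark/conf):

```shell
# copy the Spark client configuration from the frontend
scp -r 'hador.ics.muni.cz:/etc/spark/conf/*' /etc/spark/conf/
```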

Instead of using files in /etc/profile.d/, the variables can be set in ~/.bashrc:

export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
export LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:$LD_LIBRARY_PATH"
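With the configuration and variables in place, a job submission can be tested. A minimal sketch (requires a valid Kerberos ticket from kinit; the file name and master spec follow CDH 5 conventions):

```shell
# write a trivial PySpark job that sums 0..99 on the cluster
cat > /tmp/smoke_test.py <<'EOF'
from pyspark import SparkContext

sc = SparkContext(appName="smoke-test")
# parallelize a small range and sum it on the executors
print(sc.parallelize(range(100)).sum())
sc.stop()
EOF

# submit it to YARN in client mode
spark-submit --master yarn-client /tmp/smoke_test.py
```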


Pig

  • installation:
apt-get install pig
apt-get install pig-udf-datafu
  • copy configurations from frontend:
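The Pig configuration can be copied over scp as well (assuming the standard CDH location /etc/pig/conf):

```shell
# copy the Pig client configuration from the frontend
scp -r 'hador.ics.muni.cz:/etc/pig/conf/*' /etc/pig/conf/
```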

Instead of using files in /etc/profile.d/, the variable can be set in ~/.bashrc:

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce


Docker image

A pre-configured environment for using the hador cluster in MetaCentrum is available as a Docker image (see Docker Hub).

# update image
docker pull valtri/docker-hadoop-frontend-debian

# launch with a login shell (so that the environment variables from /etc/profile.d are loaded)
docker run -it --name hadoop_frontend valtri/docker-hadoop-frontend-debian /bin/bash -l
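Inside the container the environment matches the frontend, so a session typically starts with Kerberos authentication (replace `username` with your MetaCentrum login):

```shell
# inside the container: authenticate and check HDFS access
kinit username@META
hdfs dfs -ls /
```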