Hadoop client


Introduction

This page describes how to set up a local Hadoop client.

The primary way to use the Hadoop cluster is through the frontend hador.ics.muni.cz; see the Hadoop documentation.

Manual installation

Configuration files can be obtained from the Hadoop frontend hador.ics.muni.cz. They are generated automatically by Puppet, so they may differ slightly from time to time.

Kerberos client

A correctly configured Kerberos client for the realms META and ICS.MUNI.CZ is required:

  1. /etc/krb5.conf (for example from hador.ics.muni.cz; see also this page)
  2. correct local time (using NTP is recommended)
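As a rough illustration, a minimal /etc/krb5.conf covering the two realms might look like the sketch below. The KDC hostnames are placeholders, not real values; take the actual server list from the file on hador.ics.muni.cz.

```
[libdefaults]
    default_realm = META

[realms]
    META = {
        kdc = kdc.example.org          ; placeholder - use the real KDC list from hador.ics.muni.cz
        admin_server = kdc.example.org ; placeholder
    }
    ICS.MUNI.CZ = {
        kdc = kdc2.example.org         ; placeholder
    }
```

After installing the file, `kinit username@META` followed by `klist` is a quick way to check that the client and the local clock are set up correctly.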

Repository

We recommend installing identical software versions (use the Cloudera distribution and the local mirror with versions tested for MetaCentrum). It is also possible to use the original Cloudera repository (http://archive.cloudera.com/cdh5/).

Debian 7/wheezy:

echo 'deb [arch=amd64] http://scientific.zcu.cz/repos/hadoop/cdh5/debian/wheezy/amd64/cdh wheezy-cdh5 contrib' > /etc/apt/sources.list.d/cloudera.list
apt-key adv --fetch-key http://scientific.zcu.cz/repos/hadoop/archive.key
apt-get update

Debian 8/jessie:

echo 'deb [arch=amd64] http://scientific.zcu.cz/repos/hadoop/cdh5/debian/jessie/amd64/cdh jessie-cdh5 contrib' > /etc/apt/sources.list.d/cloudera.list
apt-key adv --fetch-key http://scientific.zcu.cz/repos/hadoop/archive.key
apt-get update

Java

Supported Java versions are OpenJDK 7, OpenJDK 8, Oracle JDK 7, and Oracle JDK 8.

Example of Java installation for Debian 7/wheezy and Debian 8/jessie:

apt-get install openjdk-7-jre-headless

Hadoop

  • installation:
apt-get install hadoop-client
  • copy the configuration files from the frontend:
/etc/hadoop/conf/core-site.xml
/etc/hadoop/conf/hdfs-site.xml
/etc/hadoop/conf/yarn-site.xml
/etc/hadoop/conf/mapred-site.xml
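One way to fetch the configuration files listed above is a simple loop over scp; this is a sketch, assuming you have SSH access to hador.ics.muni.cz and write access to /etc/hadoop/conf locally.

```shell
# Copy the Puppet-generated Hadoop configuration from the frontend
# (assumes SSH access to hador.ics.muni.cz and local root privileges)
for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml; do
    scp "hador.ics.muni.cz:/etc/hadoop/conf/$f" "/etc/hadoop/conf/$f"
done
```

After copying the files and obtaining a Kerberos ticket with kinit, `hdfs dfs -ls /` is a quick check that the client can reach the cluster.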

Hive

  • installation:
apt-get install hive
  • copy the configuration files from the frontend:
/etc/hive/conf/hive-site.xml
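A quick smoke test after copying the configuration (assumes a valid Kerberos ticket and that the hive command is on PATH):

```shell
# List databases to verify that the Hive client can reach the metastore
hive -e 'SHOW DATABASES;'
```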

HBase

  • installation:
apt-get install hbase
mkdir -p /var/lib/hbase/local/jars
chown -R hbase:hbase /var/lib/hbase/local
  • copy the configuration files from the frontend:
/etc/hbase/conf/hbase-site.xml
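The HBase client can be smoke-tested from the interactive shell (assumes a valid Kerberos ticket):

```shell
# Print the cluster status to verify connectivity to HBase
echo "status" | hbase shell
```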

Spark

It is also necessary to set up #Hadoop, because Spark runs in YARN mode.

  • installation:
apt-get install spark-python
  • copy the configuration files from the frontend:
/etc/spark/conf/hive-site.xml
/etc/spark/conf/spark-defaults.conf
/etc/profile.d/hadoop-spark.csh
/etc/profile.d/hadoop-spark.sh
/etc/profile.d/hadoop-spark2.csh
/etc/profile.d/hadoop-spark2.sh

Instead of the files in /etc/profile.d/, the variables can be set in ~/.bashrc:

export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
export LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:$LD_LIBRARY_PATH"
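With the configuration and variables in place, the installation can be verified by submitting the bundled SparkPi example to YARN. The path to the examples jar below assumes the CDH 5 package layout and may differ in other distributions.

```shell
# Submit the SparkPi example to the cluster via YARN
# (jar path assumes the CDH 5 package layout; adjust if needed)
spark-submit --master yarn --deploy-mode client \
    --class org.apache.spark.examples.SparkPi \
    /usr/lib/spark/lib/spark-examples.jar 10
```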

Pig

  • installation:
apt-get install pig
#optionally:
apt-get install pig-udf-datafu
  • copy the configuration files from the frontend:
/etc/profile.d/hadoop-pig.sh
/etc/profile.d/hadoop-pig.csh

Instead of the files in /etc/profile.d/, the variables can be set in ~/.bashrc:

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
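A quick way to check the Pig installation without touching the cluster is to run it in local mode; the input file name below is arbitrary.

```shell
# Smoke test in local mode (no cluster needed): load and dump a small file
printf 'hadoop\npig\n' > /tmp/pig-test.txt
pig -x local -e "lines = LOAD '/tmp/pig-test.txt' AS (line:chararray); DUMP lines;"
```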

Docker

A preconfigured environment for using the hador cluster in MetaCentrum is also available as a Docker image (see Docker Hub).

# update image
docker pull valtri/docker-hadoop-frontend-debian

# launching with login shell (because of environment variables)
docker run -it --name hadoop_frontend valtri/docker-hadoop-frontend-debian /bin/bash -l
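Once the container exists, it can be re-entered later without creating a new one:

```shell
# Restart the stopped container and open a login shell in it
docker start hadoop_frontend
docker exec -it hadoop_frontend /bin/bash -l
```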