The primary goal of this tutorial is to install Cloudera Manager in a virtual machine in order to administer a Hadoop pseudo-cluster on your own machine.
Some Hadoop bioinformatics tools (e.g. Pydoop, Seal) are then installed in the virtual cluster, and a few test runs are executed.
The tutorial is divided into the following steps:
Download and install Oracle's VirtualBox.
In our case study, we installed the VirtualBox build for Ubuntu 12.04 LTS on a Sony Vaio laptop.
To install Cloudera Manager for Hadoop, follow these simple steps:
If VirtualBox complains about:
VT-x features locked or unavailable in MSR. (VERR_VMX_MSR_LOCKED_OR_DISABLED)
then you have to enable the virtualization settings in the BIOS.
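Before rebooting into the BIOS, it can help to confirm that the CPU advertises hardware virtualization at all. The sketch below assumes a Linux host that exposes /proc/cpuinfo; the function name count_virt_flags is only illustrative. A nonzero count means the CPU supports VT-x (vmx) or AMD-V (svm). It does not prove that the feature is enabled in the BIOS, which is what the error above is about.

```shell
# Count the cpuinfo lines that advertise hardware virtualization.
# vmx = Intel VT-x, svm = AMD-V. Defaults to /proc/cpuinfo (Linux only).
count_virt_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -cEw 'vmx|svm' "$cpuinfo" || true
}

# Example: run against the host, if the file is readable.
if [ -r /proc/cpuinfo ]; then
    echo "CPU entries with virtualization flags: $(count_virt_flags)"
fi
```

If the count is 0, the CPU most likely lacks the feature and changing the BIOS settings will not help.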
Test the Hadoop installation with the famous WordCount example.
Create WordCount.java by copying the source code.
mkdir -p wordcount_classes
javac -classpath /usr/lib/hadoop/hadoop-common-2.0.0-cdh4.0.0.jar:/usr/lib/hadoop/client/hadoop-mapreduce-client-core-2.0.0-cdh4.0.0.jar -d wordcount_classes WordCount.java
jar -cvf wordcount.jar -C wordcount_classes/ .
hadoop fs -put big.txt /user/cloudera/wordcount/input
hadoop jar wordcount.jar org.myorg.WordCount /user/cloudera/wordcount/input /user/cloudera/wordcount/output
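To sanity-check the job output, the same counts can be computed locally with standard tools. This is a rough sketch that assumes the job splits words on whitespace, as the classic WordCount example does; the function name local_wordcount is only illustrative.

```shell
# Print "word<TAB>count" for every whitespace-separated word in a file,
# sorted by word (byte order).
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" |
        grep -v '^$' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $2 "\t" $1 }'
}

# Example: count the words of the input file, if it is in the current directory.
if [ -f big.txt ]; then
    local_wordcount big.txt | head
fi
```

The result can then be compared with the job output, for example the files printed by hadoop fs -cat /user/cloudera/wordcount/output/part-* (the part-* naming is an assumption about the output layout), sorted the same way if the ordering differs.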
Check the Java version with java -version. The Oracle (Sun) version is recommended; it should be included in the VirtualBox image.
Check the Python version with python --version (version 2.7 is recommended).
In case you need to use Python 2.6, you'll have to install the backported importlib and argparse modules.
Install them with:
pip install importlib --user
pip install argparse --user
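Whether the backports are needed depends only on the interpreter version, so the check can be scripted. The following is a minimal sketch; the helper names major_minor and needs_backports are made up for illustration, and it relies on Python 2 printing its version banner to stderr.

```shell
# Extract "MAJOR.MINOR" from a version banner such as "Python 2.6.6".
major_minor() {
    printf '%s\n' "$1" | sed -n 's/^[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed only when the banner reports Python 2.6.
needs_backports() {
    [ "$(major_minor "$1")" = "2.6" ]
}

# Example: inspect the interpreter on the PATH, if there is one.
if command -v python >/dev/null 2>&1; then
    banner="$(python --version 2>&1)"   # Python 2 writes the banner to stderr
    if needs_backports "$banner"; then
        echo "Python 2.6 detected: install importlib and argparse first"
    else
        echo "$banner: no backports needed"
    fi
fi
```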
git clone https://github.com/crs4/pydoop.git
python setup.py build
pip install pydoop
Install both Apache Ant packages:
sudo yum install ant
sudo yum install ant-apache-regexp
./configure
make
make check
sudo make install
./bootstrap.sh
./b2 --with-python
git clone http://git.code.sf.net/p/hadoop-bam/code hadoop-bam-code
git clone git://git.code.sf.net/p/biodoop-seal/code biodoop-seal-code
export HADOOP_BAM=<path to hadoop bam directory>
python setup.py build
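A wrong or unset HADOOP_BAM is an easy mistake to make at this step. A small guard like the following can catch it before the build starts. It only checks that the variable names an existing directory, since what the build looks for inside that directory is not covered here. The function name check_hadoop_bam is hypothetical.

```shell
# Succeed only if HADOOP_BAM is set and points to an existing directory.
check_hadoop_bam() {
    if [ -z "${HADOOP_BAM:-}" ]; then
        echo "HADOOP_BAM is not set" >&2
        return 1
    fi
    if [ ! -d "$HADOOP_BAM" ]; then
        echo "HADOOP_BAM is not a directory: $HADOOP_BAM" >&2
        return 1
    fi
}

# Example (run from the Seal source directory):
#   check_hadoop_bam && python setup.py build
```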
Now you are finally ready to explore the potential of Hadoop!