May 16, 2010

Install Hadoop and Hive on Ubuntu Lucid Lynx

If you've got a need to do some MapReduce work and decide to go with Hadoop and Hive, here's a brief tutorial on how to get them installed. This is geared more towards local development work than a standalone server, so be careful to use best practices if you decide to deploy this live. This tutorial assumes you're running Ubuntu Lucid Lynx, but it should work for other Debian-based distros as well. Read on to get started!


Step 1: Enable multiverse repo and get packages
The first thing we need to do is make sure we've got the multiverse repos enabled. Using your favorite editor (vi) add these lines to your /etc/apt/sources.list:

deb http://us.archive.ubuntu.com/ubuntu/ lucid multiverse
deb-src http://us.archive.ubuntu.com/ubuntu/ lucid multiverse
deb http://us.archive.ubuntu.com/ubuntu/ lucid-updates multiverse
deb-src http://us.archive.ubuntu.com/ubuntu/ lucid-updates multiverse

With that done, go ahead and update your package lists and install the Java, Ant, and Subversion packages you'll need to do the install. Note that ant needs the full JDK (javac), not just the JRE, or the build will fail:

sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install openjdk-6-jdk ant subversion
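Before moving on, it's worth a quick sanity check that the build tools actually landed. This is just a sketch: ant needs javac from the full JDK package, not just the JRE, or the hadoop build below will die with a 'missing javac' error.

```shell
# Sanity check: ant needs a full JDK (javac), not just the JRE.
if command -v javac >/dev/null 2>&1; then
  jdk_ok="yes"
else
  jdk_ok="no (install openjdk-6-jdk)"
fi
echo "javac present: $jdk_ok"
```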


Step 2: Get Hadoop
The next thing we'll do is grab Hadoop. Be sure to get the latest version; for this tutorial we're using 0.20.2.

wget http://mirror.its.uidaho.edu/pub/apache/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
We'll move this to /usr/local, untar it, and then rename it. Use any alternate technique you like here (e.g. symlinks, different directories, etc.); there's no magic in this step.

sudo mv hadoop-0.20.2.tar.gz /usr/local
cd /usr/local
sudo tar xvzf hadoop-0.20.2.tar.gz
sudo mv hadoop-0.20.2 hadoop
cd hadoop

Once you've extracted it and moved into the directory, find the JAVA_HOME line in the environment script and uncomment it, like so:

sudo vi conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk/
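If you'd rather script that edit than open vi, a sed one-liner does the same job. The block below is just a sketch demoed against a scratch file in /tmp (the commented placeholder path mimics the one that ships in the 0.20.2 tarball); on a real install you'd point the sed at conf/hadoop-env.sh with sudo.

```shell
# Demo on a scratch copy; run the same sed against conf/hadoop-env.sh
# (with sudo) on a real install.
demo=/tmp/hadoop-env-demo.sh
printf '# export JAVA_HOME=/usr/lib/j2sdk1.5-sun\n' > "$demo"
# Uncomment the line and point it at OpenJDK:
sed -i 's|^# export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-6-openjdk/|' "$demo"
grep JAVA_HOME "$demo"
```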

Then type

sudo ant
Finally, when ant is done doing its thing, remove the build directory:

sudo rm -rf /usr/local/hadoop/build


Step 3: Get Hive

From /usr/local let's go ahead and checkout hive using subversion and then build it:

sudo svn co http://svn.apache.org/repos/asf/hive/trunk hive
cd hive
sudo ant package

By default hive uses a directory called /user/hive/warehouse. You can change that if you like, but for simplicity we'll just go ahead and create it instead.

sudo mkdir -p /user/hive/warehouse
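Hive's getting-started docs also suggest making the warehouse directory group-writable so non-root users can create tables. The sketch below demos the idea against a scratch path under /tmp (so it's safe to run anywhere); on your box you'd run the same chmod on /user/hive/warehouse with sudo.

```shell
# Demoed under /tmp; on a real install run the same mkdir/chmod
# against /user/hive/warehouse with sudo.
wh=/tmp/demo-user/hive/warehouse
mkdir -p "$wh"
chmod g+w "$wh"
ls -ld "$wh"
```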


Step 4: Add the ingredients to your PATH
I'm running hive as root in development, but you can add these PATH statements to whatever user has the right permissions.

export PATH=$PATH:/usr/local/hive/build/dist/bin/
export PATH=$PATH:/usr/local/hive/build/dist/lib/
export PATH=$PATH:/usr/local/hadoop/bin
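Rather than retyping those exports every session, you'll probably want them in your shell profile. Here's a sketch that appends them to a scratch file so it's safe to run as-is; in practice you'd append to ~/.bashrc (or /root/.bashrc, since I'm running hive as root). The /usr/local/hive path assumes the checkout from Step 3.

```shell
# Demoed against a scratch file; in practice append to ~/.bashrc.
profile=/tmp/demo-bashrc
: > "$profile"
cat >> "$profile" <<'EOF'
export PATH=$PATH:/usr/local/hive/build/dist/bin/
export PATH=$PATH:/usr/local/hive/build/dist/lib/
export PATH=$PATH:/usr/local/hadoop/bin
EOF
grep -c '^export PATH' "$profile"
```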

Once done, log out and log back in (so your path takes hold) and then as root you can launch hive using this command:

hive --service hiveserver

If you get an error about hadoop not being found, make sure you've renamed your hadoop-0.20.2 folder to just hadoop (or used symlinks or whatever).

6 comments:

  1. This is awesome! Thanks a lot! I wish I had found your post earlier -- it would've saved me so much time!

  2. apt-get install openjdk-6-jdk for proper use of tools.jar. otherwise build fails:
    'missing javac'

  3. On step 3 I have a problem.
    When I type: "sudo svn co http://svn.apache.org/repos/asf/hadoop/hive/trunk hive"
    it tells me: svn: URL '.........../trunk' doesn't exist.

    Any solution guys or girls ?
    Please...
    Thanks

  4. http://mirror.its.uidaho.edu/pub/apache/hive/stable/

  5. Try without 'hadoop' in the path and it should work:
    sudo svn co http://svn.apache.org/repos/asf/hive/trunk hive

  6. For those who receive a 'JAVA_HOME is not defined correctly' error when running 'sudo ant package': (1) check you have JDK installed (using apt-get), (2) ensure JAVA_HOME is set correctly to the JDK and your PATH has $JAVA_HOME/bin in it, and (3) you may need to pass your $JAVA_HOME to sudo using this command: 'sudo env JAVA_HOME=$JAVA_HOME ant package'
