Download and install Cloudera -> link

1. Download Hadoop Libraries

You need the Hadoop libraries to compile and run WordCount.

First, check your Hadoop version. Open a terminal and run one of the following commands

$ hdfs version

Or

$ hadoop version

In this case, the Hadoop version is 2.6.0, so we need to download the libraries for exactly this version.
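If you want to pick out the version number programmatically, the small sketch below extracts the base version from the first line printed by the version command. It assumes that line has the form "Hadoop <major>.<minor>.<patch>", optionally followed by a vendor suffix; the class name `HadoopVersionParser` and the sample line are hypothetical and only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts the base Hadoop version (for example "2.6.0") from a version line. */
final class HadoopVersionParser {
    // Assumed format: "Hadoop <major>.<minor>.<patch>", optionally followed by a vendor suffix.
    private static final Pattern VERSION_LINE =
            Pattern.compile("^Hadoop\\s+(\\d+\\.\\d+\\.\\d+)");

    static String baseVersion(String firstLine) {
        Matcher m = VERSION_LINE.matcher(firstLine.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized version line: " + firstLine);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        // Hypothetical sample line; paste the first line your own terminal prints.
        System.out.println(baseVersion("Hadoop 2.6.0-cdh5.4.2"));
    }
}
```

The returned string is the release to look for on the download site.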

You can download the Hadoop libraries from this website: http://mirrors.ibiblio.org/apache/hadoop/common/

Choose the release and file that match your Hadoop version.

After that, extract the file with this command

$ tar -xvzf filename.tar.gz

This produces an extracted Hadoop directory, which you will import into Eclipse later.

2. Eclipse Project Setup

Eclipse is preinstalled on the VM.

Run the Eclipse IDE and create a Java Project.

Set the project name and JRE.

Create a package.

Create the WordCount class.

3. WordCount Source Code and Library Setup

WordCount maps (extracts) words from an input source and reduces the results, returning a count of each word. The source code is readily available on the internet, for example in the Apache Hadoop MapReduce tutorial.
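To make the map and reduce steps concrete, here is a minimal plain-Java sketch of the same logic. It is not the Hadoop WordCount class and uses no Hadoop API: the class `WordCountSketch` and its methods are hypothetical names, and words are assumed to be whitespace-separated and case-sensitive.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Plain-Java illustration of the map and reduce phases of a word count. */
final class WordCountSketch {

    /** Map phase: emit a (word, 1) pair for every whitespace-separated token. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<String, Integer>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    /** Reduce phase: sum the values emitted for each distinct word. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>(); // keys kept in sorted order
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    /** Runs the map phase over every line, then reduces all emitted pairs. */
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Hadoop is an elephant",
                "Hadoop is as yellow as can be");
        countWords(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

In a real Hadoop job the framework distributes the map calls across input splits and groups the emitted pairs by key before the reduce step; the sketch does both in a single process.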

Add the following libraries from the extracted Hadoop directory to the project's build path

share/hadoop/common/hadoop-common-*.jar

share/hadoop/mapreduce/hadoop-mapreduce-client-core-*.jar

share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*.jar

share/hadoop/common/lib/*.jar (all files)
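One way to confirm that the libraries were added correctly is to check whether a few well-known classes can be loaded from the project's classpath. The sketch below is only an illustration: `HadoopClasspathCheck` is a hypothetical helper, and the three class names are examples of classes expected to come from the jars listed above.

```java
/** Reports whether selected classes are visible on the current classpath. */
final class HadoopClasspathCheck {

    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.hadoop.conf.Configuration",   // expected from hadoop-common
            "org.apache.hadoop.mapreduce.Job",        // expected from hadoop-mapreduce-client-core
            "org.apache.commons.logging.LogFactory"   // expected from the common lib directory
        };
        for (String name : required) {
            System.out.println((isOnClasspath(name) ? "found    " : "MISSING  ") + name);
        }
    }
}
```

Run it inside the Eclipse project; any line marked MISSING suggests the corresponding jar is not on the build path yet.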

4. Running WordCount

Before you run the job, you must create the input location in HDFS. The output directory is created by the job itself and must not exist beforehand. Use the following commands to create the input directory /user/cloudera/wordcount/input in HDFS:

$ sudo su hdfs

$ hadoop fs -mkdir /user/cloudera

$ hadoop fs -chown cloudera /user/cloudera

$ exit

$ sudo su cloudera

$ hadoop fs -mkdir /user/cloudera/wordcount /user/cloudera/wordcount/input

Create sample text files to use as input, and move them to the /user/cloudera/wordcount/input directory in HDFS. You can use any files you choose; for convenience, the following shell commands create a few small input files for illustrative purposes.

$ echo "Hadoop is an elephant" > file0

$ echo "Hadoop is as yellow as can be" > file1

$ echo "Oh what a yellow fellow is Hadoop" > file2

$ hadoop fs -put file* /user/cloudera/wordcount/input

Compile the WordCount class.

Check your project's Run Configuration.

Make sure the project name and main class are correct.

In Eclipse, right-click your class file and choose Export.

Select Java -> Runnable JAR file.

Select your launch configuration and export destination. After that, click Finish.

Run the WordCount application from the JAR file, passing the paths to the input and output directories in HDFS:

$ hadoop jar WordCount.jar /user/cloudera/wordcount/input /user/cloudera/wordcount/output

The job's progress and result are shown in the terminal.

You can also browse the output files in the web browser.
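If you copy the output to a local file, it can be read back into a map of counts. The sketch below assumes the job writes one `word<TAB>count` pair per line, which is what Hadoop's default text output produces for a typical WordCount; the WordCount source for this post is not shown here, so treat that format as an assumption. `WordCountOutputParser` and the sample lines are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Parses "word<TAB>count" lines into a map, keeping the file's order. */
final class WordCountOutputParser {

    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("Expected word<TAB>count but got: " + line);
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a hypothetical local copy of the job output.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)
                : Arrays.asList("Hadoop\t3", "yellow\t2"); // illustrative sample lines
        parse(lines).forEach((word, count) -> System.out.println(word + " -> " + count));
    }
}
```

Pass the path of the local copy as the first argument; without an argument the program falls back to the two sample lines.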

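To download the output to a local text file such as Output.txt, you can use a command like the following

$ hadoop fs -getmerge /user/cloudera/wordcount/output Output.txt

To clear the output directory before running the job again (Hadoop will not write to an output directory that already exists), you can use a command like the following

$ hadoop fs -rm -r /user/cloudera/wordcount/output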

Author: Sovichea Cheth

Remark: This post was made during the Big Data Project (CS522) at MUM

Test Run WordCount on Cloudera