Before you start, download and install the Cloudera VM from the Cloudera website.
1. Download Hadoop Libraries
You need the Hadoop libraries to compile and run WordCount.
First, check your Hadoop version. Open a terminal and run either of these commands:
$ hdfs version
$ hadoop version
In this example, the Hadoop version is 2.6.0, so you need to download the libraries for exactly that version.
You can download the Hadoop libraries from this Apache mirror: http://mirrors.ibiblio.org/apache/hadoop/common/
Choose the directory that matches your version and download the .tar.gz archive.
Then extract the archive with this command:
$ tar -xvzf filename.tar.gz
This gives you the extracted Hadoop directory. You will import its JAR files into Eclipse later.
2. Eclipse Project Setup
Eclipse is installed on the VM by default.
Start Eclipse and create a new Java project.
Set the project name and JRE.
Create a package.
Create a class named WordCount.
3. WordCount Source Code and Library Setup
WordCount maps (extracts) the words from an input source, then reduces the results to a count for each word. The source code for this classic example is widely available online, for example in the Apache Hadoop MapReduce tutorial.
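To make the map and reduce steps concrete, here is a minimal plain-Java sketch that simulates the same flow without Hadoop. It does not use the Hadoop API. The class `LocalWordCount` and its methods `map`, `shuffle`, `reduce` and `countWords` are hypothetical names chosen for illustration. The sketch assumes whitespace tokenization, as in the classic WordCount.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch: simulates the map -> shuffle -> reduce
// flow of WordCount in plain Java so the logic can run without a cluster.
class LocalWordCount {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group the emitted values by key, sorted by key
    // (Hadoop likewise sorts keys before they reach the reducer).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one word.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs all three phases over a list of input lines.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(List.of("to be or not to be"));
        // Prints one "word<TAB>count" pair per line: be 2, not 1, or 1, to 2
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

In a real Hadoop job, the map and reduce steps run as Mapper and Reducer classes on the cluster. The framework performs the shuffle and sort between them.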
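Add the required JAR files from the extracted Hadoop directory to the project's build path:
share/hadoop/common/hadoop-common-*.jar
share/hadoop/common/lib/*.jar (all files)
share/hadoop/mapreduce/hadoop-mapreduce-client-*.jar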
4. Running WordCount
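Before you run the job, you must create the input location in HDFS. Do not create the output directory yourself, because the job creates it and fails if it already exists. Use the following commands to create the input directory /user/cloudera/wordcount/input in HDFS: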
$ sudo su hdfs
$ hadoop fs -mkdir /user/cloudera
$ hadoop fs -chown cloudera /user/cloudera
$ exit
$ sudo su cloudera
$ hadoop fs -mkdir /user/cloudera/wordcount /user/cloudera/wordcount/input
Create sample text files to use as input, and move them to the /user/cloudera/wordcount/input directory in HDFS. You can use any files you choose; for convenience, the following shell commands create a few small input files for illustrative purposes.
$ echo "Hadoop is an elephant" > file0
$ echo "Hadoop is as yellow as can be" > file1
$ echo "Oh what a yellow fellow is Hadoop" > file2
$ hadoop fs -put file* /user/cloudera/wordcount/input
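To know what the job should report for these three files, you can compute the expected counts locally. This is a minimal sketch that assumes the job splits on whitespace and is case-sensitive, like the classic WordCount. The class `ExpectedSampleCounts` and the method `expectedCounts` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper: computes locally what a whitespace-splitting,
// case-sensitive WordCount should report for the three sample files.
class ExpectedSampleCounts {

    static Map<String, Long> expectedCounts(List<String> fileContents) {
        return fileContents.stream()
                .flatMap(text -> Arrays.stream(text.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // TreeMap keeps the words sorted, like the reducer's sorted input.
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> samples = List.of(
                "Hadoop is an elephant",            // file0
                "Hadoop is as yellow as can be",    // file1
                "Oh what a yellow fellow is Hadoop" // file2
        );
        expectedCounts(samples).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Running it prints 12 distinct words. Hadoop and is appear 3 times each, as and yellow twice each, and the rest once. Capitalized words such as Hadoop and Oh sort before the lowercase ones.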
Compile the WordCount class.
Check your project's Run Configuration.
Make sure the project name and main class are correct.
In Eclipse, right-click your class file -> choose Export.
Select Java -> Runnable JAR file.
Select your launch configuration and export destination, then click Finish.
Run the WordCount application from the JAR file, passing the paths to the input and output directories in HDFS:
$ hadoop jar WordCount.jar /user/cloudera/wordcount/input /user/cloudera/wordcount/output
You can view the result in the terminal, for example with:
$ hadoop fs -cat /user/cloudera/wordcount/output/*
You can also browse the output file in a web browser.
To download the output to a local text file, you can run, for example:
$ hadoop fs -getmerge /user/cloudera/wordcount/output output.txt
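Once the result is saved as a local text file, you may want to process it further. With Hadoop's default text output, each line holds one word and its count separated by a tab. The sketch below assumes that layout. The class `ReadWordCountOutput`, the method `parse` and the file name `output.txt` are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reads a WordCount result file in Hadoop's default
// text output layout (one "word<TAB>count" pair per line).
class ReadWordCountOutput {

    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines, e.g. between merged part files
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("Expected word<TAB>count but got: " + line);
            }
            counts.put(line.substring(0, tab), Long.parseLong(line.substring(tab + 1).trim()));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // "output.txt" is only a placeholder; pass the real path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "output.txt");
        Map<String, Long> counts = parse(Files.readAllLines(file));
        counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .ifPresent(e -> System.out.println("Most frequent: " + e.getKey() + " (" + e.getValue() + ")"));
    }
}
```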
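The job fails if the output directory already exists. Before re-running it, clear the output directory:
$ hadoop fs -rm -r /user/cloudera/wordcount/output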
Author: Sovichea Cheth
Remark: This post was written during the Big Data Project (CS522) at MUM.