This project demonstrates a basic MapReduce program using the Apache Hadoop framework on a Windows computer. Hadoop is installed in pseudo-distributed (single-node) mode.
This tutorial covers running the MapReduce program on Windows; a single-node Hadoop setup and the JDK with Eclipse are assumed to be already installed.
Java JDK version "1.8.0_291" and Hadoop version "3.3.0" are used here; the steps are similar for other versions.
First, open the Eclipse IDE and create a new project. Here the project name is map_reduce_example.
Now open Configure Build Path for the project and add the external Hadoop jar files shown below (the first jar is in hadoop-3.3.0 -> share -> hadoop -> common, the second is in hadoop-3.3.0 -> share -> hadoop -> mapreduce).
After the jar files are added successfully, Eclipse lists them under Referenced Libraries, as shown below.
Next, add the source code for the word-count program. Copy the code below into the WordCount.java class:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every whitespace-separated token in a line.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as combiner): sums the counts for each word.
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
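Before packaging the jar, the map and reduce logic can be sanity-checked outside Hadoop with plain Java. The sketch below (class name LocalWordCount is my own, not part of the tutorial) tokenizes lines the same way as TokenizerMapper (StringTokenizer on whitespace) and sums the per-word counts in a HashMap, which is what IntSumReducer does per key:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class LocalWordCount {

    // Mimics TokenizerMapper + IntSumReducer on in-memory lines:
    // tokenize each line on whitespace, then sum a count of 1 per token.
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = { "hello hadoop", "hello mapreduce" };
        // "hello" appears twice, "hadoop" and "mapreduce" once each.
        System.out.println(count(lines));
    }
}
```

If this prints counts of 2 for "hello" and 1 for the other words, the core logic matches what the Hadoop job will compute at scale.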
After adding the source code, create a jar file of map_reduce_example using the Export option in the Eclipse IDE.
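With the jar exported, the job can be submitted to the running single-node cluster from a command prompt. The commands below are a sketch; the HDFS paths (/input, /output) and local file names are placeholder assumptions, not fixed by this tutorial:

```shell
# Copy a local text file into HDFS (paths are examples)
hdfs dfs -mkdir -p /input
hdfs dfs -put input.txt /input

# Run the job: main class WordCount, args[0] = input dir, args[1] = output dir
# Note: the output directory must not already exist.
hadoop jar WordCount.jar WordCount /input /output

# Inspect the word counts produced by the reducer
hdfs dfs -cat /output/part-r-00000
```

The output directory is created by the job itself; delete it (hdfs dfs -rm -r /output) before re-running, or Hadoop will fail with an "output directory already exists" error.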
Download mapreduce.zip attached here to get the WordCount.jar and input files used in this tutorial.
Submitted by Lokesh Madhav S (2208loki)