Hot questions for Using Ubuntu in mapreduce

Question:

I am trying to work through the Hadoop MapReduce Word Count example given in the book Data Analytics with Hadoop, which had me set up a Hadoop pseudo-distributed development environment. Now I am trying to run the Word Count example itself. I downloaded the .java files (the WordCount folder) from Hadoop Fundamentals. The command given in the book to start this process is:

hostname $ hadoop com.sun.tools.javac.Main WordCount.java

I run this and receive the following errors:

hadoop@gh0st-VirtualBox:/home/gh0st$ hadoop com.sun.tools.javac.Main Downloads/WordCount/WordCount.java
Downloads/WordCount/WordCount.java:32: error: cannot find symbol
        job.setMapperClass(WordMapper.class);
                           ^
  symbol:   class WordMapper
  location: class WordCount
Downloads/WordCount/WordCount.java:33: error: cannot find symbol
        job.setReducerClass(SumReducer.class);
                            ^
  symbol:   class SumReducer
  location: class WordCount
Note: Downloads/WordCount/WordCount.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2 errors

The WordMapper.java and SumReducer.java files are located in the same WordCount folder as the WordCount.java file I'm compiling. Despite everything I've read about this, I'm not sure where to start. My $JAVA_HOME is /usr/lib/jvm/java-7-openjdk-amd64/ and my $CLASS_PATH is $HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.3.jar. I'm not sure what other information is needed to fix this issue; I'll add whatever else is needed. These are the links I have already looked at and tried:

Hadoop Problems

Compilation Problems

I am using Ubuntu 14.04 within VirtualBox.


Answer:

After some deep digging, I found the answer. It is in a comment on the following link: Driver Class Compilation Error

I had to compile all the files together. The new command looks like this:

hadoop com.sun.tools.javac.Main Downloads/WordCount/WordCount.java Downloads/WordCount/WordMapper.java Downloads/WordCount/SumReducer.java
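
Once all three classes compile, the usual next steps are to package the generated .class files into a jar and submit the job. A minimal sketch of those commands, assuming the class files end up in Downloads/WordCount/, the classes are in the default package, and the jar name wc.jar and the HDFS input/output paths are placeholders you adjust for your own setup:

hostname $ jar cf wc.jar -C Downloads/WordCount/ .
hostname $ hadoop jar wc.jar WordCount /user/hadoop/input /user/hadoop/output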

Hope this helps someone!

Question:

I am running a simple count program on Hadoop. My input file is 4 GB in size. For some reason the job keeps failing with the errors:

However, if I try the same code with a small input file, say 100 MB, it works perfectly fine. I am new to this and can't seem to find a viable solution. My setup is pseudo-distributed.

Do I need to make any configuration changes? I have made the standard configuration for a pseudo-distributed setup as described in the Hadoop documentation.

Any help will be highly appreciated.


Answer:

From the error stack trace you have posted, the ConnectionRefused exception is for the JobHistoryServer.

For a pseudo-distributed setup, no configuration changes are required to start the JobHistoryServer. Use this command to start it:

$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
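
Once it is running, you can confirm the daemon is up with jps; a quick check (the PID shown is just illustrative):

jps
# among others, the output should include a line like:
# 12345 JobHistoryServer

By default the history server listens for job-client connections on port 10020 (the mapreduce.jobhistory.address property in mapred-site.xml), so that is most likely the port the ConnectionRefused error refers to.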

Question:

I went through many questions regarding this error but couldn't find a solution to my problem. I am implementing sentiment analysis on Twitter data using Hadoop.

Main Class:

public class SentimentAnalysis extends Configured implements Tool{
private static File file;

public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    Classify classify = new Classify();

    /**
     * Mapper which reads Tweets text file Store 
     * as <"Positive",1> or <"Negative",1>
     */
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();//streaming each tweet from the text file
        if (line != null) {
            word.set(classify.classify(line)); //invoke classify class to get tweet group of each text
            output.collect(word, one);
        } else {
            word.set("Error");
            output.collect(word, one);//Key,value for Mapper
        }
    }
}
public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    /**
     * Count the frequency of each classified text group
     */

    @Override
    public void reduce(Text key, Iterator<IntWritable> classifiedText,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (classifiedText.hasNext()) {
            sum += classifiedText.next().get(); //Sum the frequency
        }
        output.collect(key, new IntWritable(sum));
    }
}
public static class Classify {
    String[] categories;
    @SuppressWarnings("rawtypes")
    LMClassifier lmc;

    /**
     * Constructor loading serialized object created by Model class to local
     * LMClassifer of this class
     */
    @SuppressWarnings("rawtypes")
    public Classify() {
        try {

            lmc = (LMClassifier) AbstractExternalizable.readObject(file);
            categories = lmc.categories();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Classify whether the text is positive or negative based on Model object
     * 
     * @param text
     * @return classified group i.e either positive or negative
     */
    public String classify(String text) {
        ConditionalClassification classification = lmc.classify(text);
        return classification.bestCategory();
    }
}

public static void main(String[] args) throws Exception {
    int ret = ToolRunner.run(new SentimentAnalysis(), args);
    System.exit(ret);
}

@Override
public int run(String[] args) throws Exception {
    if(args.length < 2) {
        System.out.println("Invalid input and output directories");
        return -1;
    }
    JobConf conf = new JobConf(getConf(), SentimentAnalysis.class);
    conf.setJobName("sentimentanalysis");
    conf.setJarByClass(SentimentAnalysis.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    //conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    file = new File(args[2]);
    JobClient.runJob(conf);
    return 0;
}
}

Error:

I run the job with:

[cloudera@localhost ~]$ hadoop jar Sentiment.jar SentimentAnalysis test.txt SentimentOutput classifier.txt

test.txt contains a few tweets whose sentiment needs to be analysed, and classifier.txt is an encoded model file that the Classify class (via LMClassifier) uses to analyse the tweets in test.txt. The job fails with the log below:

14/10/05 20:59:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/10/05 20:59:24 INFO mapred.FileInputFormat: Total input paths to process : 1
14/10/05 20:59:24 INFO mapred.JobClient: Running job: job_201410041909_0035
14/10/05 20:59:25 INFO mapred.JobClient:  map 0% reduce 0%
14/10/05 20:59:41 INFO mapred.JobClient: Task Id : attempt_201410041909_0035_m_000000_0, Status : FAILED
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja
14/10/05 20:59:41 INFO mapred.JobClient: Task Id : attempt_201410041909_0035_m_000001_0, Status : FAILED
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja
14/10/05 20:59:52 INFO mapred.JobClient: Task Id : attempt_201410041909_0035_m_000000_1, Status : FAILED
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja
14/10/05 20:59:53 INFO mapred.JobClient: Task Id : attempt_201410041909_0035_m_000001_1, Status : FAILED
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja
14/10/05 21:00:04 INFO mapred.JobClient: Task Id : attempt_201410041909_0035_m_000000_2, Status : FAILED
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja
14/10/05 21:00:04 INFO mapred.JobClient: Task Id : attempt_201410041909_0035_m_000001_2, Status : FAILED
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja
14/10/05 21:00:19 INFO mapred.JobClient: Job complete: job_201410041909_0035
14/10/05 21:00:19 INFO mapred.JobClient: Counters: 7
14/10/05 21:00:19 INFO mapred.JobClient:   Job Counters 
14/10/05 21:00:19 INFO mapred.JobClient:     Failed map tasks=1
14/10/05 21:00:19 INFO mapred.JobClient:     Launched map tasks=8
14/10/05 21:00:19 INFO mapred.JobClient:     Data-local map tasks=8
14/10/05 21:00:19 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=98236
14/10/05 21:00:19 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/10/05 21:00:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/10/05 21:00:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/10/05 21:00:19 INFO mapred.JobClient: Job Failed: NA
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1416)
at SentimentAnalysis.run(SentimentAnalysis.java:124)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at SentimentAnalysis.main(SentimentAnalysis.java:101)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Answer:

I personally have no experience using Hadoop; however, if you look at the stack trace, it appears to be a RuntimeException thrown in org.apache.hadoop.util.ReflectionUtils.setJobConf...

private static void setJobConf(Object theObject, Configuration conf) {
75     //If JobConf and JobConfigurable are in classpath, AND
76     //theObject is of type JobConfigurable AND
77     //conf is of type JobConf then
78     //invoke configure on theObject
79     try {
80       Class<?> jobConfClass = 
81         conf.getClassByName("org.apache.hadoop.mapred.JobConf");
82       Class<?> jobConfigurableClass = 
83         conf.getClassByName("org.apache.hadoop.mapred.JobConfigurable");
84       if (jobConfClass.isAssignableFrom(conf.getClass()) &&
85             jobConfigurableClass.isAssignableFrom(theObject.getClass())) {
86         Method configureMethod = 
87           jobConfigurableClass.getMethod("configure", jobConfClass);
88         configureMethod.invoke(theObject, conf);
89       }
90     } catch (ClassNotFoundException e) {
91       //JobConf/JobConfigurable not in classpath. no need to configure
92     } catch (Exception e) {
93       throw new RuntimeException("Error in configuring object", e);
94     }
95   }

Clearly, both the JobConf and JobConfigurable classes are on the classpath (otherwise execution would have fallen through to the ClassNotFoundException catch block), so another exception has occurred. The nested exception is a java.lang.reflect.InvocationTargetException, which suggests there was a problem with the invoke() call at line 88 above.

In other words, invoking the configure method on the target object with the JobConf passed in is failing.

The InvocationTargetException has probably wrapped the actual root-cause exception, so you need to catch the RuntimeException at the top level and call e.getCause().getCause().printStackTrace() (or inspect the full task attempt logs) to find out why the invocation of the configure method failed.
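
If the console output keeps truncating the trace, a small helper that walks the whole cause chain makes the root cause visible. This is only a minimal sketch (the CauseDumper/dumpCauses names are illustrative, not part of Hadoop), intended for example for a local, single-JVM test of the mapper code:

public class CauseDumper {
    // Print every level of a throwable's cause chain so the exception
    // wrapped inside the InvocationTargetException is not lost.
    public static void dumpCauses(Throwable t) {
        int depth = 0;
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            System.err.println("depth " + depth++ + ": " + cur);
            for (StackTraceElement el : cur.getStackTrace()) {
                System.err.println("    at " + el);
            }
        }
    }
}

Alternatively, the per-attempt task logs (stderr/syslog, reachable through the JobTracker web UI in this MRv1 setup) usually contain the complete nested trace without any code changes.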