Hot questions for using Ubuntu with Hadoop

Question:

I have installed Java (openjdk version "10.0.2") and Hadoop 2.9.0 successfully. All processes are running well:

hadoopusr@amalendu:~$ jps
19888 NameNode
20388 DataNode
20898 NodeManager
20343 SecondaryNameNode
20539 ResourceManager
21118 Jps

But whenever I try to execute any command, such as hdfs dfs -ls /, I get these warnings:

hadoopusr@amalendu:~$ hdfs dfs -ls /
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/usr/local/hadoop/share/hadoop/common/lib/hadoop-auth-2.9.0.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
18/09/04 00:29:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Please help me fix this. This is my ~/.bashrc configuration:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

Answer:

There is nothing you can do about these warnings; they are related to Project Jigsaw and stronger module encapsulation.

Basically, there is a class called sun.security.krb5.Config that is part of a module called java.security.jgss. The module defines what it exports (what others may use from it) and to whom. In plain English, that class is not meant for public use, yet Hadoop accesses it reflectively. You can report this to the Hadoop maintainers or try upgrading Hadoop; it may already be fixed in a newer release.
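If you just want to silence the reflective-access warnings until an upgrade is possible, one possible workaround (a sketch, assuming your hadoop/hdfs launcher scripts pass HADOOP_OPTS through to the JVM, as they do in Hadoop 2.9) is to open the offending package explicitly so the access becomes legal:

# in hadoop-env.sh or ~/.bashrc; the flag only exists on Java 9 and later
export HADOOP_OPTS="$HADOOP_OPTS --add-opens java.security.jgss/sun.security.krb5=ALL-UNNAMED"

This does not change what Hadoop does; it only tells the JVM that the reflective access is allowed, so it stops printing the warnings. The separate NativeCodeLoader warning is unrelated and harmless: it just means no native libhadoop build was found for your platform, so the built-in Java implementations are used.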

Question:

I'm using perf tools on Ubuntu 12.04 to profile the system-level performance of Hadoop 2.4.1 map/reduce jobs by doing:

perf record -ag -F 100 sleep 60

My goal is to determine which processes/routines are consuming the cpu and identify candidates for optimization.

Kernel symbols are decoded correctly, but Java code is not. The perf report looks something like this:

Samples: 39K of event 'cycles', Event count (approx.): 11326629675790000f9f0
+  10.64%           java  perf-9201.map               [.] 0x00007eff6c188127                                                                                                                                   
+  10.57%           java  perf-8988.map               [.] 0x00007f71ac7b9a29                                                                                                                                   
+   9.91%           java  perf-9077.map               [.] 0x00007fa9e92073e0                                                                                                                                   
+   9.77%           java  perf-9025.map               [.] 0x00007f849cdf41a9                                                                                                                                   
+   9.26%           java  perf-8747.map               [.] 0x00007f078c6bda82                                                                                                                                   
+   7.85%           java  perf-31343.map              [.] 0x00007f6671041cb4                                                                                                                                   
+   5.81%           java  perf-8835.map               [.] 0x00007f5df0d5afc4                                                                                                                                   
+   5.78%           java  liblzo2.so.2.0.0            [.] lzo1x_decompress                                                                                                                                     
+   2.61%           java  [kernel.kallsyms]           [k] copy_user_generic_string                                                                                                                             
+   1.58%           java  libc-2.15.so                [.] 0x000000000008ce40                                                                                                                                   
+   0.93%           java  perf-9677.map               [.] 0x00007f7c81012887                                                                                                                                   
+   0.86%        swapper  [kernel.kallsyms]           [k] intel_idle                                                                                                                                           
+   0.69%           java  libjvm.so                   [.] SpinPause                                                                                                                                            

So the question is, how do I get perf to decode the symbols for the java code?


Answer:

It turns out that the perf-$pid.map objects that show up in the perf output are associated with JIT-compiled Java code. For perf to be able to decode those symbols, the Java process needs to produce /tmp/perf-$pid.map symbol map files.

There is a perf-map-agent library on GitHub that can be used to generate the symbol map files for JIT-compiled code. With the library in place, add -agentpath:<dir>/libperfmap.so to the java command line.

To make hadoop jobs generate the symbol maps, add lines such as the following to hadoop-env.sh:

export HADOOP_JAVA_PLATFORM_OPTS="-agentpath:/usr/lib/oprofile/libjvmti_oprofile.so $HADOOP_JAVA_PLATFORM_OPTS"
export JAVA_TOOL_OPTIONS="-agentpath:/usr/lib/libperfmap.so $JAVA_TOOL_OPTIONS"
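If you build perf-map-agent from source, a minimal sketch of the setup looks like this (the clone URL and install location are assumptions; the important part is that the -agentpath entries above point at the libperfmap.so you actually built, not at another profiler's agent):

git clone https://github.com/jvm-profiling-tools/perf-map-agent.git
cd perf-map-agent
cmake . && make                      # produces out/libperfmap.so
sudo cp out/libperfmap.so /usr/lib/

After restarting the Hadoop daemons and re-running perf record, the /tmp/perf-$pid.map files should exist, and perf report will resolve the JIT-compiled Java frames instead of showing raw addresses.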

Question:

I installed Hive and Hadoop on my Ubuntu VM.

When I launch hive on the terminal I get this:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.5-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and java.net.URLClassLoader are in module java.base of loader 'bootstrap')
    at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:394)
    at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:370)
    at org.apache.hadoop.hive.cli.CliSessionState.<init>(CliSessionState.java:60)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:708)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:236)

And when I launch hiveserver2, http://localhost:10002/ (the Hive Web UI) stays inaccessible.

I already tried this.


Answer:

As @mazaneicha suggested, it is easier to do this with JDK 8. I was on JDK 11, so I just switched back to JDK 8.
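A sketch of the switch on Ubuntu, assuming OpenJDK 8 from the standard repositories (the JAVA_HOME path may differ on your system):

sudo apt-get install openjdk-8-jdk
sudo update-alternatives --config java                 # select the java-8 entry
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64     # e.g. in ~/.bashrc and hadoop-env.sh

Hive 2.3 targets Java 8; on JDK 9 and later the application class loader is no longer a java.net.URLClassLoader, which is exactly the cast that fails in SessionState above.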

Question:

ls: Call From java.net.UnknownHostException: ubuntu: ubuntu: unknown error to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

My Hadoop configuration is as follows.

/etc/hosts

127.0.0.1   localhost
# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

core-site.xml

<property>  
<name>hadoop.tmp.dir</name>
<value>/Public/hadoop-2.7.1/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

hdfs-site.xml

<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permission</name>
<value>false</value>
</property>

mapred-site.xml

<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:9001</value>
</property>

Does anyone see what the problem is? I have been browsing the web for a whole day. Please help.


Answer:

Try this:

1. Open /etc/hosts in terminal using command:

sudo nano /etc/hosts

2. Add this line below 127.0.0.1 line:

your-ip-address    hadoopmaster

NOTE: To find your IP address, run the command ifconfig | grep inet in a terminal.

3. Change localhost to hadoopmaster in core-site.xml and mapred-site.xml

4. Stop all Hadoop processes and then start them again (a sketch of steps 3 and 4 follows below).
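A minimal sketch of steps 3 and 4, assuming a Hadoop 2.7.1 layout with the configuration files under $HADOOP_HOME/etc/hadoop (adjust the paths to your install):

grep hadoopmaster $HADOOP_HOME/etc/hadoop/core-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml    # verify the edits took effect
stop-all.sh      # deprecated in 2.x but still present; or use stop-dfs.sh and stop-yarn.sh
start-all.sh     # likewise, or start-dfs.sh and start-yarn.sh

If the NameNode still refuses connections on port 9000 after the restart, check its log under $HADOOP_HOME/logs; a NameNode that was never formatted (hdfs namenode -format) is another common cause of this exact ConnectionRefused error.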

Question:

I am using Hadoop 1.0.3 on a cluster of 10 desktop machines, each running 32-bit Ubuntu 12.04 LTS. The JDK is 7u75. Each machine has 2 GB of RAM and a Core 2 Duo processor.

For a research project, I need to run a Hadoop job similar to "Word Count", and I need to run it on a large dataset, at least 1 GB in size.

I am using Hadoop's example jar, hadoop-examples-1.0.3.jar, to count the words of an input dataset. Unfortunately, I cannot run any experiment with more than 5-6 MB of input data.

For input I am using plain-text story books from https://www.gutenberg.org, and I also used some RFCs from https://www.ietf.org. All inputs are English-language .txt files.

My system gives proper output for a single .txt document. However, as soon as there is more than one .txt file, it starts continuously giving the error:

INFO mapred.JobClient: Task Id :      attempt_XXXX, Status : FAILED
Too many fetch-failures

The same dataset works fine when I use a single-node cluster. I found some solutions in previous Stack Overflow posts, for example this one and this one, and a few more, but none of them worked in my case. Following their suggestions, I updated my /usr/local/hadoop/conf/mapred-site.xml file as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
<property>
  <name>mapred.task.timeout</name>
  <value>1800000</value> 
</property>
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.9</value> 
</property>
<property>
  <name>tasktracker.http.threads</name>
  <value>90</value> 
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>10</value> 
</property>
<property>
  <name>mapred.map.tasks</name>
  <value>100</value> 
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>7</value> 
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/home/user/localdir</value> 
</property>

</configuration>

In this file I took the values for the properties "mapred.local.dir", "mapred.map.tasks" and "mapred.reduce.tasks" from Michael Noll's blog. I have also set

export HADOOP_HEAPSIZE=4000

in the conf/hadoop-env.sh file.

Since I have already set up all 10 machines with hadoop-1.0.3, it would be most helpful if someone could suggest a solution that does not require changing the Hadoop version.

I also want to mention that I am a newbie with Hadoop. I have found many articles about Hadoop, but I could not settle on any of them as a standard reference for this topic. If anybody knows an informative and authoritative article on Hadoop, please feel free to share it with me.

Thanks everyone in advance.


Answer:

My problem is now solved. The problem was actually in my network settings: because of them, Hadoop could not locate the right machine during the reduce (shuffle) phase, which is what caused the fetch failures.

The correct network settings should be:

The /etc/hosts file should contain entries like the following (IP address first, then hostname):

127.0.0.1   localhost

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

192.168.x.x master
192.168.x.y slave1
....

And in the file /etc/hostname

we should put only the hostname that appears in the hosts file. For example, on the master machine the hostname file should contain just one word:

master

For the machine slave1 the file should contain:

slave1
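A quick way to sanity-check the settings after editing the files on every node (the hostnames are illustrative):

sudo hostname master       # or reboot so the new /etc/hostname takes effect
hostname                   # should print the name you expect
ping -c 1 slave1           # every node should be able to resolve every other node
ssh slave1 hostname        # should print "slave1"

Once name resolution is consistent across the cluster, the reducers can fetch the map outputs from the other TaskTrackers and the "Too many fetch-failures" errors stop.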

Question:

cat /etc/hosts

127.0.0.1 localhost.localdomain localhost
#192.168.0.105 UG-BLR-L030.example.com UG-BLR-L030 localhost 

192.168.0.105 UG-BLR-L030 localhost.localdomain localhost

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadoop-data</value>
    <description>A base for other temporary directories.</description>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://UG-BLR-L030:54310</value>
    <description>The name of the default file system.  A URI whose
    scheme and authority determine the FileSystem implementation.  The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class.  The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>

Whenever I try to start Hadoop with the command start-dfs.sh, I get the following error:

2015-05-03 15:59:45,189 INFO org.apache.hadoop.hdfs.server.namenode.DecommissionManager: Interrupted Monitor
java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.server.namenode.DecommissionManager$Monitor.run(DecommissionManager.java:65)
    at java.lang.Thread.run(Thread.java:745)
2015-05-03 15:59:45,195 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.net.BindException: Problem binding to UG-BLR-L030/192.168.0.105:54310 : Cannot assign requested address
    at org.apache.hadoop.ipc.Server.bind(Server.java:227)
    at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:301)
    at org.apache.hadoop.ipc.Server.<init>(Server.java:1483)
    at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:545)
    at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:294)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
Caused by: java.net.BindException: Cannot assign requested address
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:463)
    at sun.nio.ch.Net.bind(Net.java:455)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.apache.hadoop.ipc.Server.bind(Server.java:225)
    ... 8 more

2015-05-03 15:59:45,196 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at UG-BLR-L030/192.168.0.105
************************************************************/

ifconfig

eth0      Link encap:Ethernet  HWaddr f0:1f:af:4a:6b:fa  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:340842 errors:0 dropped:0 overruns:0 frame:0
          TX packets:197054 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:410705701 (410.7 MB)  TX bytes:18456910 (18.4 MB)
          Interrupt:20 Memory:f7e00000-f7e20000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:1085723 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1085723 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:136152053 (136.1 MB)  TX bytes:136152053 (136.1 MB)

wlan0     Link encap:Ethernet  HWaddr 0c:8b:fd:1d:14:ba  
          inet addr:192.168.0.105  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:873934 errors:0 dropped:0 overruns:0 frame:0
          TX packets:630943 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:919721448 (919.7 MB)  TX bytes:92919940 (92.9 MB)

Error:

ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.net.BindException: Problem binding to UG-BLR-L030/192.168.0.105:54310 : Cannot assign requested address

Why does Hadoop try to bind to UG-BLR-L030/192.168.0.105:54310 instead of UG-BLR-L030:54310 or 192.168.0.105:54310?


Answer:

I managed to get this to work by editing my hosts file to look like this:

127.0.0.1 UG-BLR-L030.example.com UG-BLR-L030 localhost
192.168.0.105 UG-BLR-L030.example.com UG-BLR-L030 
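As a sanity check (the commands are illustrative), the hostname used in fs.default.name should resolve to an address that is actually assigned to an interface at the moment the NameNode starts:

getent hosts UG-BLR-L030        # what the NameNode will try to bind to
ip addr | grep 192.168.0.105    # confirm that address is currently assigned

A bind to 192.168.0.105 fails with "Cannot assign requested address" whenever that address is not assigned to any interface, for example before the Wi-Fi connection comes up. Mapping the hostname to 127.0.0.1 as above removes that dependency on the wireless interface.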

Question:

I am using Ubuntu 14.04 and CDH 4.7, which I am installing as per the procedure given in this link: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Quick-Start/cdh4qs_topic_3_2.html The problem is that I am not able to start the DataNode. I am getting the following error:

naveensrikanthd@ubuntu:/$ for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
[sudo] password for naveensrikanthd: 
 * Starting Hadoop datanode: 
starting datanode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-datanode-ubuntu.out
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
 * Starting Hadoop namenode: 
namenode running as process 15437. Stop it first.
 * Starting Hadoop secondarynamenode: 
secondarynamenode running as process 3061. Stop it first.
naveensrikanthd@ubuntu:/$ jps
7467 RunJar
8048 RunJar
18363 Jps

No Hadoop processes are running, and the three SLF4J statements shown above keep repeating for the namenode and datanode:

Below is the content of the log file at /var/log/hadoop-hdfs/hadoop-hdfs-datanode-ubuntu.out:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
ulimit -a for user hdfs

What should I do to get rid of this error? Can anyone please help me get past it?


Answer:

The output shows that, in fact, the namenode and secondarynamenode are already running ("Stop it first"). You should double-check where you think they are supposed to run and what your configuration says, because the startup script is telling you they are already up.

The SLF4J logging warning has nothing to do with Hadoop functionality.
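A sketch of how to confirm what is actually running (the init scripts and service names are the ones from the CDH4 quick-start guide; adjust if yours differ):

ps -ef | grep -i '[n]amenode'    # the daemons run as the hdfs user, so your own jps does not list them
sudo -u hdfs jps                 # run jps as the hdfs user instead, if jps is on that user's PATH
for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x status ; done

If the DataNode really is failing, its actual errors will be in the .log file (not the .out file) under /var/log/hadoop-hdfs/.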

Question:

Hi, I'm trying to execute Hadoop commands like "hadoop fs -ls" from a Java app, remotely. My Java app is on my local machine and Hadoop runs in a VM.

First I make an SSH connection, and that works. I can also execute Linux commands through the Java code and they work, but Hadoop commands do not; they throw the error below. Any idea how to execute Hadoop commands?

This is my JSch program:

package com.jsch.test;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Properties;

import com.jcraft.jsch.Channel;
import com.jcraft.jsch.ChannelExec;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class Jschtest {

public static void main(String[] args){

String command="hadoop fs -ls /";
try{         
     String host = "192.168.3.197"; //IP address of the remote server
     String user = "user";        // Username of the remote server
     String password = "HDP123!";  // Password of the remote server

     JSch jsch = new JSch();
     Session session = jsch.getSession(user, host, 22);
     Properties config = new Properties();
     config.put("StrictHostKeyChecking", "no");
     session.setConfig(config);
     session.setPassword(password);
     session.connect();

     Channel channel = session.openChannel("exec");
     ((ChannelExec)channel).setCommand(command);
     channel.setInputStream(null);
     ((ChannelExec)channel).setErrStream(System.err);

     InputStream input = channel.getInputStream();
     channel.connect();

     System.out.println("Channel Connected to machine " + host + " server    
with command: " + command ); 

     try{
         InputStreamReader inputReader = new InputStreamReader(input);
         BufferedReader bufferedReader = new BufferedReader(inputReader);
         String line = null;

         while((line = bufferedReader.readLine()) != null){
             System.out.println(line);
         }
         bufferedReader.close();
         inputReader.close();
     }catch(IOException ex){
         ex.printStackTrace();
     }

     channel.disconnect();
     session.disconnect();
 }catch(Exception ex){
     ex.printStackTrace();
 }




}
}

This is my Error Message

Channel Connected to machine 192.168.3.197 server with command: hadoop fs -ls /

bash: hadoop: command not found


Answer:

Open your ~/.bashrc on the remote machine, add the Hadoop bin folder to the PATH variable in that file, and then run the source ~/.bashrc command.
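A sketch of the ~/.bashrc additions on the remote VM (the /usr/local/hadoop path is illustrative; use wherever Hadoop is actually installed):

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Keep in mind that the shell started by a JSch "exec" channel is non-interactive, and the default Ubuntu ~/.bashrc returns early for non-interactive shells, so exports added at the bottom of that file may still not be visible to the remote command; the alternatives below avoid that problem.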

Alternatively, you can change the command variable to use the full path to the hadoop executable (the /usr/local/hadoop location is an example; adjust it to your installation):

String command = "/usr/local/hadoop/bin/hadoop fs -ls /";
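Another option, since the remote shell spawned by the exec channel is non-interactive, is to wrap the command in a login shell so that PATH entries exported from the profile files are picked up. That is, set the command variable to the string below (a sketch; note the single quotes, since the whole thing is passed to the remote shell as one command):

bash -lc 'hadoop fs -ls /'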