Hadoop reduce-phase "Failed to fetch" error and how to fix it

http://wiki.apache.org/hadoop/ConnectionRefused

Connection Refused

You get a ConnectionRefused Exception when there is a machine at the address specified, but there is no program listening on the specific TCP port the client is using, and there is no firewall in the way silently dropping TCP connection requests. If you do not know what a TCP connection request is, please consult the specification.

Unless there is a configuration error at either end, a common cause for this is that the Hadoop service isn't running.

  1. Check that the hostname the client is using is correct.
  2. Check that the IP address the client gets for the hostname is correct.
  3. Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this).
  4. Check that the port the client is using matches the one the server is offering its service on.
  5. On the server, try a telnet localhost <port> to see if the port is open there.
  6. On the client, try a telnet <server> <port> to see if the port is accessible remotely.
  7. Try connecting to the server/port from a different machine, to see if it is just this one client misbehaving.

None of these are Hadoop problems; they are host, network and firewall configuration issues. As it is your cluster, only you can find out and track down the problem.
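The checks above boil down to a few shell commands. A quick sketch, assuming the namenode host is named hadoop01 and its RPC port is 9000 (substitute the hostname and port from your own configuration):

# On the client: see what name this machine has and how the server's name resolves
hostname
getent hosts hadoop01

# On the server: confirm a process is actually listening on the port
telnet localhost 9000

# On the client: confirm the port is reachable over the network
telnet hadoop01 9000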

One of the key points here: each node's hostname must be bound to the IP address (or domain name, as listed in the slaves or masters file) used in the Hadoop configuration. For example:

192.168.1.101  hadoop01
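The same name should then be the one the Hadoop configuration refers to. A minimal sketch of the matching core-site.xml entry, assuming the default port 9000 (on Hadoop 1.x the property is fs.default.name, on 2.x it is fs.defaultFS):

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop01:9000</value>
</property>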

Also, if the node's hostname has never been changed, the hosts file will contain a binding of that hostname to 127.0.0.1:

127.0.0.1  localhost localhost.localdomain

Run the hostname command to check the local host name; it may come back as localhost.localdomain or localhost, and that mapping needs to be commented out. With that, the problem was solved and jobs ran normally. However, files on HDFS still could not be viewed at hostname:50070 (the "Browse the filesystem" link would not open).
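For reference, a corrected /etc/hosts on such a node might end up looking like the following (a sketch reusing the example address and hostname above; adjust to your own cluster):

# 127.0.0.1  localhost localhost.localdomain   <- old line commented out: the hostname must not resolve to loopback
127.0.0.1      localhost
192.168.1.101  hadoop01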


Reposted from www.linuxidc.com/Linux/2017-11/148344.htm