Overview
- Download a Hadoop distribution. Each release ships three packages:
  - hadoop-x.y.z-site.tar.gz (the generated documentation site)
  - hadoop-x.y.z-src.tar.gz (the source code)
  - hadoop-x.y.z.tar.gz (the binary distribution; the one to deploy)
- Hadoop consists of several components, each component has its own daemons, and every daemon runs as a separate Java process. A daemon's startup options are configured through environment variables.
HDFS
Configured in etc/hadoop/hadoop-env.sh:
- NameNode daemon: HDFS_NAMENODE_OPTS
- DataNode daemon: HDFS_DATANODE_OPTS
- Secondary NameNode daemon: HDFS_SECONDARYNAMENODE_OPTS

YARN
Configured in etc/hadoop/yarn-env.sh:
- ResourceManager daemon: YARN_RESOURCEMANAGER_OPTS
- NodeManager daemon: YARN_NODEMANAGER_OPTS
- WebAppProxy daemon: YARN_PROXYSERVER_OPTS

MapReduce
Configured in etc/hadoop/mapred-env.sh:
- MapReduce Job History Server daemon: MAPRED_HISTORYSERVER_OPTS

Hadoop-wide settings are configured in the shell profile (~/.bashrc):
- HADOOP_HOME: home directory of the Hadoop distribution; at minimum this one must be set
- HADOOP_PID_DIR
- HADOOP_LOG_DIR
- HADOOP_HEAPSIZE_MAX
A sketch of both layers of configuration follows this list.
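As a concrete illustration, a minimal sketch of both files; the heap sizes are illustrative assumptions, not values from this deployment:

# etc/hadoop/hadoop-env.sh -- per-daemon JVM options (heap sizes are assumptions)
export HDFS_NAMENODE_OPTS="-Xmx4g"    # applies to the NameNode process only
export HDFS_DATANODE_OPTS="-Xmx2g"    # applies to the DataNode process only

# ~/.bashrc -- Hadoop-wide settings (paths are examples; see the deployment record below)
export HADOOP_HOME="$HOME/installed/hadoop/hadoop-3.2.0"
export HADOOP_PID_DIR="$HOME/installed/hadoop/hadoop_pid_dir"
export HADOOP_LOG_DIR="$HOME/installed/hadoop/hadoop_log_dir"
export HADOOP_HEAPSIZE_MAX=4g         # default max heap for any daemon without an explicit -Xmx

After editing ~/.bashrc, re-source it (source ~/.bashrc) so the current shell sees the variables.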
Important configuration parameters and how to choose them

Configure on all nodes
- etc/hadoop/core-site.xml
  Defaults are documented in ./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml
  - fs.defaultFS: the URI of the HDFS NameNode
  - io.file.buffer.size
Configure on the NameNode
- etc/hadoop/hdfs-site.xml
  Defaults are documented in ./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
  - dfs.namenode.name.dir
  - dfs.hosts / dfs.hosts.exclude
  - dfs.blocksize
  - dfs.namenode.handler.count
Configure on the DataNodes
- etc/hadoop/hdfs-site.xml
  - dfs.datanode.data.dir
The sketch below shows how to check the effective value of any of these keys.
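Once the XML files are in place, hdfs getconf prints the value a daemon will actually see; a quick sketch (the outputs shown are the stock defaults: 128 MB blocks, 10 handler threads):

# Print the effective value of a configuration key
$ $HADOOP_HOME/bin/hdfs getconf -confKey dfs.blocksize
134217728
$ $HADOOP_HOME/bin/hdfs getconf -confKey dfs.namenode.handler.count
10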
Deployment practice: record of configuration changes

local machine, NameNode
- System environment variables
  export HADOOP_HOME="/home/jng/installed/hadoop/hadoop-3.2.0"
  export HADOOP_PID_DIR="/home/jng/installed/hadoop/hadoop_pid_dir"
  export HADOOP_LOG_DIR="/home/jng/installed/hadoop/hadoop_log_dir"
- etc/hadoop/core-site.xml
  - fs.defaultFS
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://195.90.3.212:9988/</value>
      <description>The name of the default file system. A URI whose
      scheme and authority determine the FileSystem implementation. The
      uri's scheme determines the config property (fs.SCHEME.impl) naming
      the FileSystem implementation class. The uri's authority is used to
      determine the host, port, etc. for a filesystem.</description>
    </property>
  - io.file.buffer.size
    <property>
      <name>io.file.buffer.size</name>
      <value>4096</value>
      <description>The size of buffer for use in sequence files.
      The size of this buffer should probably be a multiple of hardware
      page size (4096 on Intel x86), and it determines how much data is
      buffered during read and write operations.</description>
    </property>
- etc/hadoop/hdfs-site.xml
  - dfs.namenode.name.dir
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///home/jng/installed/hadoop/dfs_namenode_name_dir</value>
      <description>Determines where on the local filesystem the DFS name node
      should store the name table (fsimage). If this is a comma-delimited list
      of directories then the name table is replicated in all of the
      directories, for redundancy.</description>
    </property>
local machine, DataNode
- etc/hadoop/hdfs-site.xml
  - dfs.datanode.data.dir
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///home/jng/installed/hadoop/dfs_datanode_data_dir</value>
      <description>Determines where on the local filesystem a DFS data node
      should store its blocks. If this is a comma-delimited
      list of directories, then data will be stored in all named
      directories, typically on different devices. The directories should be tagged
      with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
      storage policies. The default storage type will be DISK if the directory does
      not have a storage type tagged explicitly. Directories that do not exist will
      be created if local filesystem permission allows.</description>
    </property>
192.168.1.101, DataNode
- System environment variables
  export HADOOP_HOME="/home/mhb/installed/hadoop/hadoop-3.2.0"
  export HADOOP_PID_DIR="/home/mhb/installed/hadoop/hadoop_pid_dir"
  export HADOOP_LOG_DIR="/home/mhb/installed/hadoop/hadoop_log_dir"
- etc/hadoop/core-site.xml
  - fs.defaultFS
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://195.90.3.212:9988/</value>
      <description>The name of the default file system. A URI whose
      scheme and authority determine the FileSystem implementation. The
      uri's scheme determines the config property (fs.SCHEME.impl) naming
      the FileSystem implementation class. The uri's authority is used to
      determine the host, port, etc. for a filesystem.</description>
    </property>
  - io.file.buffer.size
    <property>
      <name>io.file.buffer.size</name>
      <value>4096</value>
      <description>The size of buffer for use in sequence files.
      The size of this buffer should probably be a multiple of hardware
      page size (4096 on Intel x86), and it determines how much data is
      buffered during read and write operations.</description>
    </property>
- etc/hadoop/hdfs-site.xml
  - dfs.datanode.data.dir
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///home/mhb/installed/hadoop/dfs_datanode_data_dir</value>
      <description>Determines where on the local filesystem a DFS data node
      should store its blocks. If this is a comma-delimited
      list of directories, then data will be stored in all named
      directories, typically on different devices. The directories should be tagged
      with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
      storage policies. The default storage type will be DISK if the directory does
      not have a storage type tagged explicitly. Directories that do not exist will
      be created if local filesystem permission allows.</description>
    </property>
Start the HDFS cluster
- The first start of a new HDFS cluster requires formatting it
  # run on the NameNode machine
  $ $HADOOP_HOME/bin/hdfs namenode -format <cluster_name>
- Start the NameNode
  # run on the NameNode machine
  $ $HADOOP_HOME/bin/hdfs --daemon start namenode
- Start the DataNode (on each DataNode machine)
  $ $HADOOP_HOME/bin/hdfs --daemon start datanode
- Optional one-command start (prerequisites sketched after this list)
  # Both prerequisites must hold: 1) etc/hadoop/workers is configured correctly;
  # 2) passwordless ssh from the NameNode machine to the DataNode machines is set up
  $ $HADOOP_HOME/sbin/start-dfs.sh
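A sketch of the two prerequisites for start-dfs.sh, using the hosts and user names that appear in this deployment (adapt as needed):

# 1) etc/hadoop/workers must list every DataNode host, one per line
$ cat $HADOOP_HOME/etc/hadoop/workers
195.90.3.212
192.168.1.101

# 2) passwordless ssh from the NameNode machine to each worker
$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa    # skip if a key already exists
$ ssh-copy-id mhb@192.168.1.101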
Verify the start
- NameNode web UI: http://ip:port (default port: 9870)
- DataNode web UI: http://ip:port (default port: 9864)
The daemons can also be checked from the command line, as sketched below.
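A quick command-line check (the PIDs shown are illustrative):

# Each machine should list its expected daemon(s)
$ jps
21314 NameNode
21467 DataNode

# Cluster-wide view, run from the NameNode machine
$ $HADOOP_HOME/bin/hdfs dfsadmin -report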
Create a file with the HDFS shell
Move a large file from the local filesystem into HDFS and watch how the NameNode and DataNode storage directories change in size.
- Before the copy
  - dfs.namenode.name.dir on the local machine as NameNode
    [j@j dfs_namenode_name_dir]$ pwd
    /home/jng/installed/hadoop/dfs_namenode_name_dir
    [j@j dfs_namenode_name_dir]$ du -hs
    2.1M    .
  - dfs.datanode.data.dir on the local machine as DataNode
    [j@j dfs_datanode_data_dir]$ pwd
    /home/jng/installed/hadoop/dfs_datanode_data_dir
    [j@j dfs_datanode_data_dir]$ du -hs
    44K    .
  - dfs.datanode.data.dir on 192.168.1.101 as DataNode
    m@m:~/installed/hadoop/dfs_datanode_data_dir$ pwd
    /home/mhb/installed/hadoop/dfs_datanode_data_dir
    m@m:~/installed/hadoop/dfs_datanode_data_dir$ du -hs
    44K    .
- The copy
  Note that -moveFromLocal deletes the local source after the upload; use -put or -copyFromLocal to keep it.
  # run on the NameNode
  [j@j hadoop-3.2.0]$ pwd
  /home/jng/installed/hadoop/hadoop-3.2.0
  [j@j hadoop-3.2.0]$ ls -lh ~/software/hadoop/hadoop-3.2.0.tar.gz
  -rw-r--r-- 1 jng jng 330M Feb 25 14:21 /home/jng/software/hadoop/hadoop-3.2.0.tar.gz
  [j@j hadoop-3.2.0]$ ./bin/hdfs dfs -moveFromLocal ~/software/hadoop/hadoop-3.2.0.tar.gz /tmp/
- After the copy
  - dfs.namenode.name.dir on the local machine as NameNode
    [j@j dfs_namenode_name_dir]$ pwd
    /home/jng/installed/hadoop/dfs_namenode_name_dir
    [j@j dfs_namenode_name_dir]$ du -hs
    2.1M    .
  - dfs.datanode.data.dir on the local machine as DataNode
    [j@j dfs_datanode_data_dir]$ pwd
    /home/jng/installed/hadoop/dfs_datanode_data_dir
    [j@j dfs_datanode_data_dir]$ du -hs
    333M    .
  - dfs.datanode.data.dir on 192.168.1.101 as DataNode
    m@m:~/installed/hadoop/dfs_datanode_data_dir$ pwd
    /home/mhb/installed/hadoop/dfs_datanode_data_dir
    m@m:~/installed/hadoop/dfs_datanode_data_dir$ du -hs
    333M    .
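Both DataNode directories grew by roughly the full file size, which indicates every block was replicated to both nodes; with dfs.replication left at its default of 3 and only two live DataNodes, HDFS stores two replicas and should report the blocks as under-replicated. A sketch of how to confirm this:

# Confirm the upload and inspect block and replication state
$ $HADOOP_HOME/bin/hdfs dfs -ls -h /tmp/hadoop-3.2.0.tar.gz
$ $HADOOP_HOME/bin/hdfs fsck /tmp/hadoop-3.2.0.tar.gz -files -blocks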
Problems and fixes
- The namenode log (viewable in the NameNode web UI) may contain a WARN like:
  "WARN org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Unresolved datanode registration: hostname cannot be resolved (ip=192.168.1.101, hostname=192.168.1.101)"
  ref: https://blog.csdn.net/qqpy789/article/details/78189335
  Fix: in etc/hadoop/hdfs-site.xml, set dfs.namenode.datanode.registration.ip-hostname-check to false:
  <property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
    <description>
    If true (the default), then the namenode requires that a connecting
    datanode's address must be resolved to a hostname. If necessary, a reverse
    DNS lookup is performed. All attempts to register a datanode from an
    unresolvable address are rejected. It is recommended that this setting be
    left on to prevent accidental registration of datanodes listed by hostname
    in the excludes file during a DNS outage. Only set this to false in
    environments where there is no infrastructure to support reverse DNS
    lookup.</description>
  </property>
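This setting is read when the NameNode starts, so restarting the NameNode is the simple way to make the change take effect (a minimal sketch):

# on the NameNode machine
$ $HADOOP_HOME/bin/hdfs --daemon stop namenode
$ $HADOOP_HOME/bin/hdfs --daemon start namenode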
Shut down the HDFS cluster
- Stop the NameNode
  # run on the NameNode machine
  $ $HADOOP_HOME/bin/hdfs --daemon stop namenode
- Stop the DataNode (on each DataNode machine)
  $ $HADOOP_HOME/bin/hdfs --daemon stop datanode
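If the cluster was brought up with start-dfs.sh, there is a matching one-command shutdown with the same workers/ssh prerequisites:

$ $HADOOP_HOME/sbin/stop-dfs.sh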
Conclusions
- HDFS can exist and run independently of YARN
  That is, HDFS works without YARN being started, at least as far as the HDFS shell is concerned.
- The NameNode machine can also run a DataNode at the same time