Overview
- Download a Hadoop distribution. Each release ships three packages:
  - hadoop-x.y.z-site.tar.gz (the generated documentation site)
  - hadoop-x.y.z-src.tar.gz (the source code)
  - hadoop-x.y.z.tar.gz (the binary distribution; the one to deploy)
- Hadoop consists of several components, each component has its own daemons, and every daemon runs as a separate Java process. A daemon's startup options are configured through environment variables.
HDFS
Configured in etc/hadoop/hadoop-env.sh:
- NameNode daemon: HDFS_NAMENODE_OPTS
- DataNode daemon: HDFS_DATANODE_OPTS
- Secondary NameNode daemon: HDFS_SECONDARYNAMENODE_OPTS

YARN
Configured in etc/hadoop/yarn-env.sh:
- ResourceManager daemon: YARN_RESOURCEMANAGER_OPTS
- NodeManager daemon: YARN_NODEMANAGER_OPTS
- WebAppProxy daemon: YARN_PROXYSERVER_OPTS

MapReduce
Configured in etc/hadoop/mapred-env.sh:
- MapReduce Job History Server daemon: MAPRED_HISTORYSERVER_OPTS

Hadoop-wide settings are configured in the shell profile (~/.bashrc):
- HADOOP_HOME: home directory of the Hadoop distribution; at minimum this one must be set
- HADOOP_PID_DIR
- HADOOP_LOG_DIR
- HADOOP_HEAPSIZE_MAX
A sketch of both layers of configuration follows this list.
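As a concrete illustration, a minimal sketch of both files; the heap sizes are illustrative assumptions, not values from this deployment:

# etc/hadoop/hadoop-env.sh -- per-daemon JVM options (heap sizes are assumptions)
export HDFS_NAMENODE_OPTS="-Xmx4g"    # applies to the NameNode process only
export HDFS_DATANODE_OPTS="-Xmx2g"    # applies to the DataNode process only

# ~/.bashrc -- Hadoop-wide settings (paths are examples; see the deployment record below)
export HADOOP_HOME="$HOME/installed/hadoop/hadoop-3.2.0"
export HADOOP_PID_DIR="$HOME/installed/hadoop/hadoop_pid_dir"
export HADOOP_LOG_DIR="$HOME/installed/hadoop/hadoop_log_dir"
export HADOOP_HEAPSIZE_MAX=4g         # default max heap for any daemon without an explicit -Xmx

After editing ~/.bashrc, re-source it (source ~/.bashrc) so the current shell sees the variables.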
Important configuration parameters and how to choose them

Configure on all nodes
- etc/hadoop/core-site.xml
  Defaults are documented in ./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml
  - fs.defaultFS: the URI of the HDFS NameNode
  - io.file.buffer.size
Configure on the NameNode
- etc/hadoop/hdfs-site.xml
  Defaults are documented in ./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
  - dfs.namenode.name.dir
  - dfs.hosts / dfs.hosts.exclude
  - dfs.blocksize
  - dfs.namenode.handler.count
Configure on the DataNodes
- etc/hadoop/hdfs-site.xml
  - dfs.datanode.data.dir
The sketch below shows how to check the effective value of any of these keys.
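Once the XML files are in place, hdfs getconf prints the value a daemon will actually see; a quick sketch (the outputs shown are the stock defaults: 128 MB blocks, 10 handler threads):

# Print the effective value of a configuration key
$ $HADOOP_HOME/bin/hdfs getconf -confKey dfs.blocksize
134217728
$ $HADOOP_HOME/bin/hdfs getconf -confKey dfs.namenode.handler.count
10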
Deployment practice: record of configuration changes

local machine, NameNode
- System environment variables
  export HADOOP_HOME="/home/jng/installed/hadoop/hadoop-3.2.0"
  export HADOOP_PID_DIR="/home/jng/installed/hadoop/hadoop_pid_dir"
  export HADOOP_LOG_DIR="/home/jng/installed/hadoop/hadoop_log_dir"
- etc/hadoop/core-site.xml
  - fs.defaultFS
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://195.90.3.212:9988/</value>
      <description>The name of the default file system. A URI whose
      scheme and authority determine the FileSystem implementation. The
      uri's scheme determines the config property (fs.SCHEME.impl) naming
      the FileSystem implementation class. The uri's authority is used to
      determine the host, port, etc. for a filesystem.</description>
    </property>
  - io.file.buffer.size
    <property>
      <name>io.file.buffer.size</name>
      <value>4096</value>
      <description>The size of buffer for use in sequence files.
      The size of this buffer should probably be a multiple of hardware
      page size (4096 on Intel x86), and it determines how much data is
      buffered during read and write operations.</description>
    </property>
- etc/hadoop/hdfs-site.xml
  - dfs.namenode.name.dir
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///home/jng/installed/hadoop/dfs_namenode_name_dir</value>
      <description>Determines where on the local filesystem the DFS name node
      should store the name table (fsimage). If this is a comma-delimited list
      of directories then the name table is replicated in all of the
      directories, for redundancy.</description>
    </property>
local machine, DataNode
- etc/hadoop/hdfs-site.xml
  - dfs.datanode.data.dir
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///home/jng/installed/hadoop/dfs_datanode_data_dir</value>
      <description>Determines where on the local filesystem a DFS data node
      should store its blocks. If this is a comma-delimited
      list of directories, then data will be stored in all named
      directories, typically on different devices. The directories should be tagged
      with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
      storage policies. The default storage type will be DISK if the directory does
      not have a storage type tagged explicitly. Directories that do not exist will
      be created if local filesystem permission allows.</description>
    </property>
192.168.1.101, DataNode
- System environment variables
  export HADOOP_HOME="/home/mhb/installed/hadoop/hadoop-3.2.0"
  export HADOOP_PID_DIR="/home/mhb/installed/hadoop/hadoop_pid_dir"
  export HADOOP_LOG_DIR="/home/mhb/installed/hadoop/hadoop_log_dir"
- etc/hadoop/core-site.xml
  - fs.defaultFS
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://195.90.3.212:9988/</value>
      <description>The name of the default file system. A URI whose
      scheme and authority determine the FileSystem implementation. The
      uri's scheme determines the config property (fs.SCHEME.impl) naming
      the FileSystem implementation class. The uri's authority is used to
      determine the host, port, etc. for a filesystem.</description>
    </property>
  - io.file.buffer.size
    <property>
      <name>io.file.buffer.size</name>
      <value>4096</value>
      <description>The size of buffer for use in sequence files.
      The size of this buffer should probably be a multiple of hardware
      page size (4096 on Intel x86), and it determines how much data is
      buffered during read and write operations.</description>
    </property>
- etc/hadoop/hdfs-site.xml
  - dfs.datanode.data.dir
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///home/mhb/installed/hadoop/dfs_datanode_data_dir</value>
      <description>Determines where on the local filesystem a DFS data node
      should store its blocks. If this is a comma-delimited
      list of directories, then data will be stored in all named
      directories, typically on different devices. The directories should be tagged
      with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS
      storage policies. The default storage type will be DISK if the directory does
      not have a storage type tagged explicitly. Directories that do not exist will
      be created if local filesystem permission allows.</description>
    </property>
Start the HDFS cluster
- The first start of a new HDFS cluster requires formatting it
  # run on the NameNode machine
  $ $HADOOP_HOME/bin/hdfs namenode -format <cluster_name>
- Start the NameNode
  # run on the NameNode machine
  $ $HADOOP_HOME/bin/hdfs --daemon start namenode
- Start the DataNode (on each DataNode machine)
  $ $HADOOP_HOME/bin/hdfs --daemon start datanode
- Optional one-command start (prerequisites sketched after this list)
  # Both prerequisites must hold: 1) etc/hadoop/workers is configured correctly;
  # 2) passwordless ssh from the NameNode machine to the DataNode machines is set up
  $ $HADOOP_HOME/sbin/start-dfs.sh
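A sketch of the two prerequisites for start-dfs.sh, using the hosts and user names that appear in this deployment (adapt as needed):

# 1) etc/hadoop/workers must list every DataNode host, one per line
$ cat $HADOOP_HOME/etc/hadoop/workers
195.90.3.212
192.168.1.101

# 2) passwordless ssh from the NameNode machine to each worker
$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa    # skip if a key already exists
$ ssh-copy-id mhb@192.168.1.101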
Verify the start
- NameNode web UI: http://ip:port (default port: 9870)
- DataNode web UI: http://ip:port (default port: 9864)
The daemons can also be checked from the command line, as sketched below.
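A quick command-line check (the PIDs shown are illustrative):

# Each machine should list its expected daemon(s)
$ jps
21314 NameNode
21467 DataNode

# Cluster-wide view, run from the NameNode machine
$ $HADOOP_HOME/bin/hdfs dfsadmin -report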
Create a file with the HDFS shell
Move a large file from the local filesystem into HDFS and watch how the NameNode and DataNode storage directories change in size.
- Before the copy
  - dfs.namenode.name.dir on the local machine as NameNode
    [j@j dfs_namenode_name_dir]$ pwd
    /home/jng/installed/hadoop/dfs_namenode_name_dir
    [j@j dfs_namenode_name_dir]$ du -hs
    2.1M    .
  - dfs.datanode.data.dir on the local machine as DataNode
    [j@j dfs_datanode_data_dir]$ pwd
    /home/jng/installed/hadoop/dfs_datanode_data_dir
    [j@j dfs_datanode_data_dir]$ du -hs
    44K    .
  - dfs.datanode.data.dir on 192.168.1.101 as DataNode
    m@m:~/installed/hadoop/dfs_datanode_data_dir$ pwd
    /home/mhb/installed/hadoop/dfs_datanode_data_dir
    m@m:~/installed/hadoop/dfs_datanode_data_dir$ du -hs
    44K    .
- The copy
  Note that -moveFromLocal deletes the local source after the upload; use -put or -copyFromLocal to keep it.
  # run on the NameNode
  [j@j hadoop-3.2.0]$ pwd
  /home/jng/installed/hadoop/hadoop-3.2.0
  [j@j hadoop-3.2.0]$ ls -lh ~/software/hadoop/hadoop-3.2.0.tar.gz
  -rw-r--r-- 1 jng jng 330M Feb 25 14:21 /home/jng/software/hadoop/hadoop-3.2.0.tar.gz
  [j@j hadoop-3.2.0]$ ./bin/hdfs dfs -moveFromLocal ~/software/hadoop/hadoop-3.2.0.tar.gz /tmp/
- After the copy
  - dfs.namenode.name.dir on the local machine as NameNode
    [j@j dfs_namenode_name_dir]$ pwd
    /home/jng/installed/hadoop/dfs_namenode_name_dir
    [j@j dfs_namenode_name_dir]$ du -hs
    2.1M    .
  - dfs.datanode.data.dir on the local machine as DataNode
    [j@j dfs_datanode_data_dir]$ pwd
    /home/jng/installed/hadoop/dfs_datanode_data_dir
    [j@j dfs_datanode_data_dir]$ du -hs
    333M    .
  - dfs.datanode.data.dir on 192.168.1.101 as DataNode
    m@m:~/installed/hadoop/dfs_datanode_data_dir$ pwd
    /home/mhb/installed/hadoop/dfs_datanode_data_dir
    m@m:~/installed/hadoop/dfs_datanode_data_dir$ du -hs
    333M    .
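Both DataNode directories grew by roughly the full file size, which indicates every block was replicated to both nodes; with dfs.replication left at its default of 3 and only two live DataNodes, HDFS stores two replicas and should report the blocks as under-replicated. A sketch of how to confirm this:

# Confirm the upload and inspect block and replication state
$ $HADOOP_HOME/bin/hdfs dfs -ls -h /tmp/hadoop-3.2.0.tar.gz
$ $HADOOP_HOME/bin/hdfs fsck /tmp/hadoop-3.2.0.tar.gz -files -blocks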
Problems and fixes
- The namenode log (viewable in the NameNode web UI) may contain a WARN like:
  "WARN org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Unresolved datanode registration: hostname cannot be resolved (ip=192.168.1.101, hostname=192.168.1.101)"
  ref: https://blog.csdn.net/qqpy789/article/details/78189335
  Fix: in etc/hadoop/hdfs-site.xml, set dfs.namenode.datanode.registration.ip-hostname-check to false:
  <property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
    <description>
    If true (the default), then the namenode requires that a connecting
    datanode's address must be resolved to a hostname. If necessary, a reverse
    DNS lookup is performed. All attempts to register a datanode from an
    unresolvable address are rejected. It is recommended that this setting be
    left on to prevent accidental registration of datanodes listed by hostname
    in the excludes file during a DNS outage. Only set this to false in
    environments where there is no infrastructure to support reverse DNS
    lookup.</description>
  </property>
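This setting is read when the NameNode starts, so restarting the NameNode is the simple way to make the change take effect (a minimal sketch):

# on the NameNode machine
$ $HADOOP_HOME/bin/hdfs --daemon stop namenode
$ $HADOOP_HOME/bin/hdfs --daemon start namenode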
Shut down the HDFS cluster
- Stop the NameNode
  # run on the NameNode machine
  $ $HADOOP_HOME/bin/hdfs --daemon stop namenode
- Stop the DataNode (on each DataNode machine)
  $ $HADOOP_HOME/bin/hdfs --daemon stop datanode
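If the cluster was brought up with start-dfs.sh, there is a matching one-command shutdown with the same workers/ssh prerequisites:

$ $HADOOP_HOME/sbin/stop-dfs.sh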
Conclusions
- HDFS can exist and run independently of YARN
  That is, HDFS works without YARN being started, at least as far as the HDFS shell is concerned.
- The NameNode machine can also run a DataNode at the same time