Background
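As we all know, Apache Hadoop depends on many libraries, and nobody wants to spend a lot of time on project configuration. Ideally, we could generate an Apache Hadoop MapReduce project out of the box and start coding right away. Wouldn't an archetype that quickly generates Apache Hadoop MapReduce projects be convenient?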
Notes
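This archetype is intended for the following development environment:
- Java Development Kit 8; both OpenJDK and Oracle JDK are compatible.
- Apache Hadoop v2.7.1; compatibility with other versions is currently unknown.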
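The archetype is only stated to work on JDK 8, so it can help to confirm which Java version is actually in use before generating a project. The following is a minimal sketch. The class name JdkCheck is hypothetical and not part of the archetype. It reads the standard java.specification.version system property, which is "1.8" on JDK 8 and "9", "11", and so on for later releases.

```java
// Hypothetical helper (not part of the archetype): warns if the JVM is not Java 8.
class JdkCheck {
    // JDK 8 reports "1.8"; JDK 9 and later report "9", "11", "17", ...
    static boolean isJava8(String specVersion) {
        return "1.8".equals(specVersion);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (isJava8(spec)) {
            System.out.println("OK: running on Java " + spec);
        } else {
            System.out.println("Warning: expected Java 8 (1.8), found " + spec);
        }
    }
}
```

Run it with the same java executable that Maven uses. The output of mvn -v shows which Java version that is.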
Usage Guide
Installing the Archetype
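- Install Apache Maven.
- Download the *.pom and *.jar files from GitHub Packages.
- Place the downloaded *.jar and *.pom in the directory $LOCAL_REPO/io/github/dragon1573/hadoop-quickstart-archetype/1.0-mapr271-jdk8/, where $LOCAL_REPO is the Apache Maven local repository (by default ~/.m2/repository after installation).
- Edit $LOCAL_REPO/archetype-catalog.xml (create the file if it does not exist) and add the downloaded Maven Archetype to the catalog, following exactly the format below.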
<?xml version="1.0" encoding="UTF-8" ?>
<archetype-catalog xsi:schemaLocation="http://maven.apache.org/plugins/maven-archetype-plugin/archetype-catalog/1.0.0 http://maven.apache.org/xsd/archetype-catalog-1.0.0.xsd" xmlns="http://maven.apache.org/plugins/maven-archetype-plugin/archetype-catalog/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <archetypes>
    <!-- Add the following entry -->
    <archetype>
      <groupId>io.github.dragon1573</groupId>
      <artifactId>hadoop-quickstart-archetype</artifactId>
      <version>1.0-mapr271-jdk8</version>
      <description>Immediately generate an Apache Hadoop MapReduce Job</description>
      <repository>https://maven.pkg.github.com/Dragon1573/Maven-Hadoop</repository>
    </archetype>
    <!-- End of added entry -->
  </archetypes>
</archetype-catalog>
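The directory used in the installation steps follows Maven's standard local-repository layout. The groupId has its dots turned into directories, followed by the artifactId and the version, and Maven looks for files in that directory named <artifactId>-<version>.<extension>. The following is a minimal sketch that computes these paths, so you can check where the downloaded jar and pom must go. The class name RepoPath is hypothetical, and passing $LOCAL_REPO as the first argument is only a convention of this sketch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: Maven's default local-repository layout is
//   <groupId with '.' -> '/'>/<artifactId>/<version>/<artifactId>-<version>.<extension>
class RepoPath {
    // Directory that holds every file of one artifact version.
    static String artifactDir(String groupId, String artifactId, String version) {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/";
    }

    // Relative path of one file (e.g. "jar" or "pom") inside the local repository.
    static String artifactFile(String groupId, String artifactId, String version, String extension) {
        return artifactDir(groupId, artifactId, version) + artifactId + "-" + version + "." + extension;
    }

    public static void main(String[] args) {
        // First argument = $LOCAL_REPO; otherwise Maven's default ~/.m2/repository.
        Path localRepo = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("user.home"), ".m2", "repository");

        String groupId = "io.github.dragon1573";
        String artifactId = "hadoop-quickstart-archetype";
        String version = "1.0-mapr271-jdk8";

        for (String ext : new String[] {"jar", "pom"}) {
            // resolve() accepts '/'-separated relative paths on both Unix and Windows.
            System.out.println(localRepo.resolve(artifactFile(groupId, artifactId, version, ext)));
        }
    }
}
```

If your settings.xml sets a custom localRepository, pass that directory as the argument instead of relying on the default.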
Using the Archetype
- In any directory, run the following command to create a project from the archetype (example):
$ mvn -DarchetypeCatalog=local archetype:generate
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------< org.apache.maven:standalone-pom >-------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] >>> maven-archetype-plugin:3.1.2:generate (default-cli) > generate-sources @ standalone-pom >>>
[INFO]
[INFO] <<< maven-archetype-plugin:3.1.2:generate (default-cli) < generate-sources @ standalone-pom <<<
[INFO]
[INFO]
[INFO] --- maven-archetype-plugin:3.1.2:generate (default-cli) @ standalone-pom ---
[INFO] Generating project in Interactive mode
[INFO] No archetype defined. Using maven-archetype-quickstart (org.apache.maven.archetypes:maven-archetype-quickstart:1.0)
Choose archetype:
1: local -> io.github.dragon1573:hadoop-quickstart-archetype (Immediately generate an Apache Hadoop MapReduce Job)
Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): : 1
Define value for property 'groupId': com.example
Define value for property 'artifactId': hadoop-quickstart
Define value for property 'version' 1.0-SNAPSHOT: : 1.0
Define value for property 'package' com.example: : main
Confirm properties configuration:
groupId: com.example
artifactId: hadoop-quickstart
version: 1.0
package: main
Y: : Y
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Archetype: hadoop-quickstart-archetype:1.0-mapr271-jdk8
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: com.example
[INFO] Parameter: artifactId, Value: hadoop-quickstart
[INFO] Parameter: version, Value: 1.0
[INFO] Parameter: package, Value: main
[INFO] Parameter: packageInPathFormat, Value: main
[INFO] Parameter: package, Value: main
[INFO] Parameter: groupId, Value: com.example
[INFO] Parameter: artifactId, Value: hadoop-quickstart
[INFO] Parameter: version, Value: 1.0
[WARNING] Don't override file D:\Program_Files_(x64)\Git\Repos\hadoop-quickstart\.idea\codeStyles\codeStyleConfig.xml
[WARNING] Don't override file D:\Program_Files_(x64)\Git\Repos\hadoop-quickstart\.idea\codeStyles\Project.xml
[WARNING] Don't override file D:\Program_Files_(x64)\Git\Repos\hadoop-quickstart\.idea\inspectionProfiles\Project_Default.xml
[WARNING] Don't override file D:\Program_Files_(x64)\Git\Repos\hadoop-quickstart\.idea\copyright\Apache_v2_0.xml
[INFO] Project created from Archetype in dir: D:\Program_Files_(x64)\Git\Repos\hadoop-quickstart
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 35.209 s
[INFO] Finished at: 2020-03-28T15:26:57+08:00
[INFO] ------------------------------------------------------------------------
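- Check the directory structure of the generated project. If the generated project has the structure shown below, the archetype is installed correctly and ready to use.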
$ cd hadoop-quickstart/
legen@Legend1949 MINGW64 /Repos/hadoop-quickstart
$ tree
.
├── LICENSE
├── README.md
├── pom.xml
└── src
    └── main
        ├── java
        │   ├── main
        │   │   └── DailyAccessCount.java
        │   └── mapreduce
        │       ├── MyMapper.java
        │       └── MyReducer.java
        └── resources
            └── user_login.txt
6 directories, 7 files
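The same check can be automated. The following is a minimal sketch that looks for the seven files listed in the expected tree under a given project root. The class name StructureCheck is hypothetical. The file list is copied from the tree above and is not read from the archetype itself.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reports which expected files are missing from a generated project.
class StructureCheck {
    // Files from the expected tree, relative to the project root.
    static final List<String> EXPECTED = Arrays.asList(
            "LICENSE",
            "README.md",
            "pom.xml",
            "src/main/java/main/DailyAccessCount.java",
            "src/main/java/mapreduce/MyMapper.java",
            "src/main/java/mapreduce/MyReducer.java",
            "src/main/resources/user_login.txt");

    // Returns the expected files that are absent under root; an empty list means the structure is complete.
    static List<String> missingFiles(Path root) {
        List<String> missing = new ArrayList<>();
        for (String rel : EXPECTED) {
            if (!Files.isRegularFile(root.resolve(rel))) {
                missing.add(rel);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Project root: first argument, or the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> missing = missingFiles(root);
        if (missing.isEmpty()) {
            System.out.println("All expected files are present.");
        } else {
            System.out.println("Missing files: " + missing);
        }
    }
}
```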
About the Archetype
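The archetype provides a simple Apache Hadoop MapReduce project that counts accesses per date. The project's task is as follows:
- The goal of the project is to count the total number of user accesses on each calendar day of 2016.
- The raw data file src/main/resources/user_login.txt provides user names and access dates.
- In essence, the task is to sum the access counts of all users for each calendar day.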
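The counting logic can be illustrated without a cluster. The sketch below is a plain-Java, single-process simulation. It is not the archetype's MyMapper/MyReducer code, whose contents are not shown here. Each line is mapped to a (date, 1) pair, the pairs are grouped by date in a TreeMap so keys come out sorted as they do after the MapReduce shuffle, and the ones are summed per date. The record format is an assumption: the date is taken to be the last comma- or whitespace-separated field of each line, so adjust dateOf to match the real user_login.txt. The class name DailyCountSimulation and the sample records are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation of the job (no Hadoop involved):
//   map:     line -> (date, 1)
//   shuffle: group by date; a TreeMap keeps keys sorted like the sort phase
//   reduce:  sum the 1s for each date
class DailyCountSimulation {
    // ASSUMED record format: the date is the last field, separated by commas or whitespace.
    static String dateOf(String line) {
        String[] fields = line.trim().split("[,\\s]+");
        return fields[fields.length - 1];
    }

    static Map<String, Integer> countByDay(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            counts.merge(dateOf(line), 1, Integer::sum); // emit (date, 1) and add it up
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample records, not the contents of the real user_login.txt.
        List<String> sample = Arrays.asList(
                "Jack,2016-01-01",
                "Rose,2016-01-01",
                "Jack,2016-01-02");
        for (Map.Entry<String, Integer> e : countByDay(sample).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

With the three sample records it prints two tab-separated lines: 2016-01-01 with a count of 2, then 2016-01-02 with a count of 1.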
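To build and run the project:
- In the directory containing the project configuration file pom.xml, package the project into a *.jar with the following command (example):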
$ mvn package
[INFO] Scanning for projects...
[INFO]
[INFO] -------------------< com.example:hadoop-quickstart >--------------------
[INFO] Building hadoop-quickstart 1.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hadoop-quickstart ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hadoop-quickstart ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 3 source files to D:\Program_Files_(x64)\Git\Repos\hadoop-quickstart\target\classes
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ hadoop-quickstart ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory D:\Program_Files_(x64)\Git\Repos\hadoop-quickstart\src\test\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hadoop-quickstart ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ hadoop-quickstart ---
[INFO] No tests to run.
[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ hadoop-quickstart ---
[INFO] Building jar: D:\Program_Files_(x64)\Git\Repos\hadoop-quickstart\target\hadoop-quickstart-1.0.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.744 s
[INFO] Finished at: 2020-03-28T15:44:42+08:00
[INFO] ------------------------------------------------------------------------
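- Upload the provided src/main/resources/user_login.txt to the Apache Hadoop HDFS distributed file system, and refer to the directory containing the file as $INPUT_DIR.
- Submit the MapReduce job package to the Apache Hadoop cluster with hadoop jar target/hadoop-quickstart-1.0.jar main.DailyAccessCount $INPUT_DIR $OUTPUT_DIR, where $OUTPUT_DIR is the HDFS directory that receives the job's output when it finishes. Unless the driver deletes it first, Hadoop normally refuses to start a job whose output directory already exists, so choose a path that does not exist yet.
- View the first 15 sorted results with hdfs dfs -cat "$OUTPUT_DIR/*" | head -n 15.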
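With Hadoop's default TextOutputFormat, each line of the reducer output (files named like part-r-00000) is a key and a value separated by a tab. Whether DailyAccessCount keeps that default is an assumption. If you copy the output to the local disk, for example with hdfs dfs -get, the following sketch parses it and prints the first 15 records, mirroring the head -n 15 step. The class name OutputPreview and the default file name are placeholders.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: previews reducer output copied from HDFS to the local disk.
class OutputPreview {
    // Splits a line at the first tab (ASSUMED: TextOutputFormat's default separator).
    static String[] parse(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("no tab separator in: " + line);
        }
        return new String[] {line.substring(0, tab), line.substring(tab + 1)};
    }

    // Returns at most n parsed (key, value) records, skipping empty lines.
    static List<String[]> firstN(List<String> lines, int n) {
        List<String[]> result = new ArrayList<>();
        for (String line : lines) {
            if (result.size() == n) {
                break;
            }
            if (!line.isEmpty()) {
                result.add(parse(line));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Local copy of one output file; the default name is only a placeholder.
        Path file = Paths.get(args.length > 0 ? args[0] : "part-r-00000");
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        for (String[] kv : firstN(lines, 15)) {
            System.out.println(kv[0] + " -> " + kv[1]);
        }
    }
}
```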
Afterword
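If you run into any problems while using this archetype, please report them in the comments or at Issues · Dragon1573/Maven-Hadoop, and I will do my best to fix them.
Thank you for downloading, installing, and using this archetype!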