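A Detailed Guide to Using Sqoop

Sqoop Overview

Sqoop is an open-source Apache tool for transferring data between Hadoop and relational databases. It has two core functions: import and export. Import moves data from relational databases such as MySQL and Oracle into Hadoop storage systems such as HDFS, Hive, and HBase; export moves data from the Hadoop file system into relational databases such as MySQL and Oracle. Sqoop is essentially a command-line tool and is commonly used together with HDFS, Hive, and MySQL.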

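How Sqoop Works

Sqoop translates an import or export command into a MapReduce program. The generated MapReduce job mainly customizes the InputFormat and OutputFormat.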

Installing Sqoop

Upload sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz to the /usr/local directory on the CentOS 7 host.

Note: the sqoop-1.4.7 package used here is built against Hadoop 2.6.0 and is compatible with Hadoop 2.6 and later.


Use cd to switch to /usr/local, then extract the archive with tar -xvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz.

[root@Hadoop3-master ~]# cd /usr/local
[root@Hadoop3-master local]# tar -xvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

Use mv to rename the extracted directory sqoop-1.4.7.bin__hadoop-2.6.0 to sqoop:

[root@Hadoop3-master local]# mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop


Configure the environment variables by adding the following to /etc/profile:

[root@Hadoop3-master local]# cat /etc/profile
# /etc/profile
export JAVA_HOME=/usr/local/jdk
export HADOOP_HOME=/usr/local/hadoop
export SQOOP_HOME=/usr/local/sqoop

After editing the file, run source so that the new environment variables take effect:

[root@Hadoop3-master local]# source /etc/profile
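
Later steps call sqoop without a full path, which presumably means $SQOOP_HOME/bin is on PATH even though the profile excerpt above doesn't show such a line. A minimal sketch of what that addition could look like, assuming the /usr/local/sqoop install location used in this guide:

```shell
# Assumed addition to /etc/profile; not shown in the original excerpt.
export SQOOP_HOME=/usr/local/sqoop
# Append Sqoop's bin directory so the sqoop launcher can be found by name.
export PATH="$PATH:$SQOOP_HOME/bin"

# Confirm the directory is now part of PATH.
echo "$PATH" | tr ':' '\n' | grep -x "$SQOOP_HOME/bin"
```

If the grep line prints /usr/local/sqoop/bin, the directory is on PATH for the current shell.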


Copy the MySQL 8 JDBC driver into Sqoop's lib directory:

[root@Hadoop3-master local]# cp mysql-connector-java-8.0.12.jar /usr/local/sqoop/lib

Editing the Sqoop Configuration Files

Use cd to switch to /usr/local/sqoop/conf and create the sqoop-env.sh configuration script from the sqoop-env-template.sh template (here by renaming it).

[root@Hadoop3-master local]# cd /usr/local/sqoop/conf
[root@Hadoop3-master conf]# ll
总用量 28
-rw-rw-r-- 1 1000 1000 3895 12月 19 2017 oraoop-site-template.xml
-rw-rw-r-- 1 1000 1000 1404 12月 19 2017 sqoop-env-template.cmd
-rwxr-xr-x 1 1000 1000 1345 12月 19 2017 sqoop-env-template.sh
-rw-rw-r-- 1 1000 1000 6044 12月 19 2017 sqoop-site-template.xml
-rw-rw-r-- 1 1000 1000 6044 12月 19 2017 sqoop-site.xml
[root@Hadoop3-master conf]# mv sqoop-env-template.sh sqoop-env.sh
[root@Hadoop3-master conf]# ll
总用量 28
-rw-rw-r-- 1 1000 1000 3895 12月 19 2017 oraoop-site-template.xml
-rwxr-xr-x 1 1000 1000 1345 12月 19 2017 sqoop-env.sh
-rw-rw-r-- 1 1000 1000 1404 12月 19 2017 sqoop-env-template.cmd
-rw-rw-r-- 1 1000 1000 6044 12月 19 2017 sqoop-site-template.xml
-rw-rw-r-- 1 1000 1000 6044 12月 19 2017 sqoop-site.xml

Open sqoop-env.sh and edit the following lines (note: set the Hadoop installation directory first):

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/local/hadoop/

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/local/hadoop/

Starting Sqoop

Running Sqoop without any arguments does nothing useful. Use sqoop version to check the installed Sqoop version.

[root@Hadoop3-master local]# sqoop version
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2023-02-12 14:44:49,745 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017

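Importing Data with Sqoop

Sqoop import: the import tool imports a single table from an RDBMS into HDFS. Each row of the table is treated as a record in HDFS, and all records are stored either as text data in text files or as binary data in Avro or SequenceFile format.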


$ sqoop import (generic-args) (import-args) 

Hands-on: import the entire MySQL table base_house into HDFS.

MySQL table structure and initial data

-- ----------------------------
-- Table structure for `base_house`
-- ----------------------------
DROP TABLE IF EXISTS `base_house`;
CREATE TABLE `base_house` (
  `id` varchar(64) NOT NULL,
  `project_no` varchar(128) DEFAULT NULL,
  `project_name` varchar(256) DEFAULT NULL,
  `project_address` varchar(256) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

-- ----------------------------
-- Records of base_house
-- ----------------------------
INSERT INTO `base_house` VALUES ('1', '20230301', '龙岗区安居房', '深圳市龙岗区布吉街道1120号');
INSERT INTO `base_house` VALUES ('2', '20230302', '罗湖区安居房', '深圳市罗湖区黄贝岭街道1100号');



 sqoop import --connect "jdbc:mysql://" --username root --password 123456 --table base_house --target-dir '/sqoop/base-house' --fields-terminated-by ',' -m 1;


2023-03-01 15:23:13,302 WARN fs.FileUtil: Command 'ln -s /tmp/hadoop-root/mapred/local/job_local755394790_0001_58e4f0dd-e37b-41eb-9676-cf81b9c11936/libjars /usr/local/sqoop/bin/libjars/*' failed 1 with: ln: 无法创建符号链接"/usr/local/sqoop/bin/libjars/*": 没有那个文件或目录

# Show the details of the target directory
[root@Hadoop3-master bin]# hadoop fs -ls -R /sqoop/base-house
# Show the contents of the output file
[root@Hadoop3-master bin]# hadoop fs -cat /sqoop/base-house/part-m-00000
[root@Hadoop3-master compile]# cd /usr/local/hadoop/bin
[root@Hadoop3-master bin]# hadoop fs -ls -R /sqoop/base-house
-rw-r--r--   3 root supergroup          0 2023-03-01 15:23 /sqoop/base-house/_SUCCESS
-rw-r--r--   3 root supergroup        139 2023-03-01 15:23 /sqoop/base-house/part-m-00000
[root@Hadoop3-master bin]# hadoop fs -cat /sqoop/base-house/part-m-00000
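
Passing --password on the command line leaves the password in the shell history and the process list. Sqoop also accepts -P (prompt on the console) and --password-file (read the password from a file). The sketch below only assembles and prints such a command rather than running it; the JDBC host, port, database name, and password-file path are hypothetical placeholders, since the original connection string is not shown.

```shell
# Hypothetical placeholders; replace with your own values.
JDBC_URL="jdbc:mysql://db-host.example:3306/house_db"
PASSWORD_FILE="/user/root/mysql.password"   # file containing only the password

# Assemble the arguments of the full-table import, using --password-file
# in place of --password.
set -- import \
  --connect "$JDBC_URL" \
  --username root \
  --password-file "$PASSWORD_FILE" \
  --table base_house \
  --target-dir /sqoop/base-house \
  --fields-terminated-by ',' \
  -m 1

# Dry run: print the command instead of executing it.
CMD="sqoop $*"
echo "$CMD"
# To execute on a machine with Sqoop installed: sqoop "$@"
```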

Hands-on: import rows of the MySQL table base_house selected by a query into HDFS.

Add one more record to the base_house table:

INSERT INTO `base_house` VALUES ('3', '20230301', '南山区安居房', '深圳市南山区高新科技园1001号');



sqoop import --connect "jdbc:mysql://" --username root --password 123456  --target-dir '/sqoop/base-house-query' --fields-terminated-by ',' -m 1 --query 'select * from base_house where id=3 and $CONDITIONS';


2023-03-01 16:23:36,559 WARN fs.FileUtil: Command 'ln -s /tmp/hadoop-root/mapred/local/job_local791858248_0001_84c3a417-8c91-4c3c-a754-2792257a32fd/libjars /usr/local/sqoop/libjars/*' failed 1 with: ln: 无法创建符号链接"/usr/local/sqoop/libjars/*": 没有那个文件或目录

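Note: when --query is used, the where clause must contain $CONDITIONS. Sqoop replaces this placeholder with the range condition of each map task, so that the query result can be split across mappers.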
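
Because $CONDITIONS looks like a shell variable, the quoting of the --query argument matters. The following sketch, which runs in plain shell without Sqoop, shows how the three quoting styles reach the program; the variable names are only for illustration.

```shell
# Make sure no variable named CONDITIONS happens to be set in this shell.
unset CONDITIONS

# Single quotes: the shell passes $CONDITIONS through untouched.
QUERY_SINGLE='select * from base_house where id=3 and $CONDITIONS'

# Double quotes: the dollar sign must be escaped to survive.
QUERY_DOUBLE="select * from base_house where id=3 and \$CONDITIONS"

# Double quotes without the escape: the shell expands the (unset) variable
# and the placeholder disappears before Sqoop ever sees it.
QUERY_BROKEN="select * from base_house where id=3 and $CONDITIONS"

echo "$QUERY_SINGLE"
echo "$QUERY_DOUBLE"
echo "$QUERY_BROKEN"
```

The first two lines printed still end in the literal text $CONDITIONS; the third ends after "and", which is why the commands in this guide wrap the query in single quotes.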


# Show the details of the target directory
[root@Hadoop3-master bin]# hadoop fs -ls -R /sqoop/base-house-query
# Show the contents of the output file
[root@Hadoop3-master bin]# hadoop fs -cat /sqoop/base-house-query/part-m-00000
[root@Hadoop3-master sqoop]# hadoop fs -ls -R /sqoop/base-house-query
-rw-r--r--   3 root supergroup          0 2023-03-01 16:23 /sqoop/base-house-query/_SUCCESS
-rw-r--r--   3 root supergroup         71 2023-03-01 16:23 /sqoop/base-house-query/part-m-00000
[root@Hadoop3-master sqoop]# hadoop fs -cat /sqoop/base-house-query/part-m-0000
cat: `/sqoop/base-house-query/part-m-0000': No such file or directory
[root@Hadoop3-master sqoop]# hadoop fs -cat /sqoop/base-house-query/part-m-00000


sqoop import --connect "jdbc:mysql://" --username root --password 123456 --table base_house --target-dir '/sqoop/base-house' --fields-terminated-by ',' -m 1;

sqoop import --connect "jdbc:mysql://" --username root --password 123456  --target-dir '/sqoop/base-house-query' --fields-terminated-by ',' -m 1 --query 'select * from base_house where id=3 and $CONDITIONS';

Compared with the full-table import, the query import removes the --table option
and adds --query with the SQL statement.

Hands-on: import selected columns of the MySQL table base_house into HDFS.


sqoop import --connect "jdbc:mysql://" --username root --password 123456 --table base_house --columns id,project_no,project_name --target-dir '/sqoop/base-house-column' --fields-terminated-by ',' -m 1;


2023-03-01 17:17:00,916 WARN fs.FileUtil: Command 'ln -s /tmp/hadoop-root/mapred/local/job_local620808761_0001_aa086058-7939-434b-a562-64ce6ca45f63/libjars /usr/local/sqoop/bin/libjars/*' failed 1 with: ln: 无法创建符号链接"/usr/local/sqoop/bin/libjars/*": 没有那个文件或目录

# Show the details of the target directory
[root@Hadoop3-master bin]# hadoop fs -ls -R /sqoop/base-house-column
# Show the contents of the output file
[root@Hadoop3-master bin]# hadoop fs -cat /sqoop/base-house-column/part-m-00000
[root@Hadoop3-master bin]# hadoop fs -cat /sqoop/base-house-column/part-m-00000


sqoop import --connect "jdbc:mysql://" --username root --password 123456 --table base_house --target-dir '/sqoop/base-house' --fields-terminated-by ',' -m 1;

sqoop import --connect "jdbc:mysql://" --username root --password 123456 --table base_house --columns id,project_no,project_name --target-dir '/sqoop/base-house-column' --fields-terminated-by ',' -m 1;

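Compared with the full-table import, the column import adds --columns to select the fields to import.

Hands-on: import rows of the MySQL table base_house filtered by a where condition into HDFS.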


sqoop import --connect "jdbc:mysql://" --username root --password 123456 --table base_house --where "id =3" --target-dir '/sqoop/base-house-where' --fields-terminated-by ',' -m 1;



sqoop import --connect "jdbc:mysql://" --username root --password 123456 --table base_house --target-dir '/sqoop/base-house' --fields-terminated-by ',' -m 1;

sqoop import --connect "jdbc:mysql://" --username root --password 123456 --table base_house --where "id =3" --target-dir '/sqoop/base-house-where' --fields-terminated-by ',' -m 1;

Compared with the full-table import, the where import adds --where with the filter condition.

Hands-on: incrementally import the MySQL table base_house into HDFS in append mode.

The following command imports from the MySQL server into HDFS using the append incremental mode:

sqoop import --connect "jdbc:mysql://" --username root --password 123456  --target-dir '/sqoop/base-house-append' --fields-terminated-by ',' --query 'select * from base_house where $CONDITIONS' --split-by id -m 2 --incremental append --check-column id --last-value 0;


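--split-by together with -m sets the number of parallel map tasks (Sqoop imports run as map-only jobs, so no reduce tasks are involved).

--check-column id combined with --last-value 0 has an effect similar to the condition where id > 0.

--incremental append # mode

--check-column id --last-value 0 # filter condition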
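
To make the role of --last-value concrete: after an import, the next run should start from the largest id that has already been imported. A minimal sketch that derives that value from a local sample in the same comma-separated layout as the import output (the sample file path is hypothetical, and the rows are the ones inserted earlier in this guide):

```shell
# Hypothetical local sample with the same layout as part-m-00000.
SAMPLE=/tmp/base-house-sample.csv
printf '%s\n' \
  '1,20230301,龙岗区安居房,深圳市龙岗区布吉街道1120号' \
  '2,20230302,罗湖区安居房,深圳市罗湖区黄贝岭街道1100号' \
  '3,20230301,南山区安居房,深圳市南山区高新科技园1001号' > "$SAMPLE"

# The first field is the check column (id); take its numeric maximum.
LAST_VALUE=$(cut -d, -f1 "$SAMPLE" | sort -n | tail -n 1)

# Arguments for the next incremental run.
echo "--incremental append --check-column id --last-value $LAST_VALUE"
```

When the import is stored as a saved job (sqoop job --create), Sqoop retains this value between runs, so it does not have to be computed by hand.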

Hands-on: incrementally import the MySQL table base_house into HDFS in lastmodified mode.

The following command imports from the MySQL server into HDFS using the lastmodified incremental mode:

sqoop import --connect "jdbc:mysql://" --username root --password 123456  --target-dir '/sqoop/base-house-append' --fields-terminated-by ',' --query 'select * from base_house where $CONDITIONS' --split-by id -m 2 --incremental lastmodified --check-column id --last-value 2;


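--incremental selects the incremental mode (lastmodified or append).

--check-column id combined with --last-value 2 has an effect similar to the condition where id > 2. Note that, according to the Sqoop documentation, lastmodified mode is intended for a check column holding a date or timestamp that is updated whenever a row changes; for a plain id column with increasing values, append is the mode the documentation recommends.

--incremental lastmodified # mode

--check-column id --last-value 2 # filter condition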


Hands-on: import the MySQL table base_house into HBase.

Step 1: create a namespace named house in HBase

hbase(main):001:0> list_namespace
3 row(s)
Took 0.9580 seconds
hbase(main):002:0> create_namespace "house"
Took 0.2816 seconds

Step 2: create the base_house table in the house namespace

hbase(main):006:0* create 'house:base_house', 'projectinfo'
Created table house:base_house
Took 1.4626 seconds


 sqoop import --connect "jdbc:mysql://" --username root --password 123456 --table base_house --hbase-table house:base_house --column-family projectinfo --hbase-create-table --hbase-row-key id


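--column-family projectinfo: the HBase column family that receives the imported columns

--hbase-row-key id: the source column used as the HBase row key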



scan 'house:base_house'


Hands-on: import the MySQL table base_house into Hive.

Step 1: create the house database

hive> show databases;
Time taken: 1.565 seconds, Fetched: 2 row(s)
hive> create database house
    > ;
Time taken: 0.416 seconds
hive> show databases;
Time taken: 0.059 seconds, Fetched: 3 row(s)

Step 2: create the base_house table

hive> create table base_house(id string, project_no string, project_name string, project_address string) row format delimited fields terminated by '\t';
Time taken: 1.379 seconds
hive> show tables;


sqoop import --connect "jdbc:mysql://" --username root --password 123456 --table base_house --hive-import --hive-database house --create-hive-table --hive-table base_house --hive-overwrite -m 3


2023-03-02 16:07:50,260 ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
        at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
        at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
        at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
        at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
        at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
        ... 12 more



# Copy the jar that provides the missing HiveConf class into Sqoop's lib directory
cp /usr/local/hive/lib/hive-common-3.1.2.jar /usr/local/sqoop/lib/
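
An alternative that avoids copying jars is to put Hive's libraries on the Hadoop classpath before running the import. This is only a sketch, under the assumption that Hive is installed in /usr/local/hive as the cp command above suggests; whether it resolves every missing class depends on the Hive and Sqoop versions in use.

```shell
# Assumed Hive location, taken from the path in the cp command above.
export HIVE_HOME=/usr/local/hive
# Append Hive's jars; the quoted * is not expanded by the shell and is left
# for the JVM to interpret as a classpath wildcard.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$HIVE_HOME/lib/*"
echo "$HADOOP_CLASSPATH"
```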


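--hive-import: import the data into Hive

--hive-database: the target Hive database

--create-hive-table: create the target table in Hive; according to the Sqoop documentation, the job fails if the table already exists. --hive-table: the name of the target Hive table

--hive-overwrite: overwrite the existing data in the Hive table

Step 4: check the data in the Hive table house.base_house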

hive> show databases;
Time taken: 0.925 seconds, Fetched: 3 row(s)
hive> use house;
Time taken: 0.091 seconds
hive> show tables;
Time taken: 0.082 seconds, Fetched: 1 row(s)
hive> select * from base_house;

Exporting Data with Sqoop

cd /usr/local/tmp # switch to the temporary data directory
vi emp_data # create and edit the emp_data file to be exported


[root@Hadoop3-master tmp]# cat /usr/local/tmp/emp_data
1201, gopal,     manager, 50000, TP
1202, manisha,   preader, 50000, TP
1203, kalil,     php dev, 30000, AC
1204, prasanth,  php dev, 30000, AC
1205, kranthi,   admin,   20000, TP
1206, satish p,  grp des, 20000, GR

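Note: the data above causes a NumberFormatException during export, presumably because the salary field has a leading space and cannot be parsed as an integer. The corrected content is: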

[root@Hadoop3-master tmp]# cat /usr/local/tmp/emp_data
1201, gopal,     manager,500, TP
1202, manisha,   preader,50, TP
1203, kalil,     php dev,300, AC
1204, prasanth,  php dev,300, AC
1205, kranthi,   admin,1, TP
1206, satish p,  grp des,2, GR


 hdfs dfs -put /usr/local/tmp/emp_data  /tmp # upload the file to the HDFS /tmp directory
[root@Hadoop3-master tmp]# hdfs dfs -ls /tmp/emp_data # list the uploaded file
-rw-r--r--   3 root supergroup        216 2023-03-02 17:05 /tmp/emp_data


CREATE TABLE employee ( 
   id INT NOT NULL PRIMARY KEY,
   name VARCHAR(20), 
   deg VARCHAR(20),
   salary INT,
   dept VARCHAR(10));


sqoop export --connect "jdbc:mysql://" --username root --password 123456 --table employee  --input-fields-terminated-by ',' --export-dir  /tmp/emp_data


2023-03-02 17:26:54,083 WARN fs.FileUtil: Command 'ln -s /tmp/hadoop-root/mapred/local/job_local1972622522_0001_25c51b6b-5b27-4a9e-b597-a3f7c9f0aa83/libjars /usr/local/tmp/libjars/*' failed 1 with: ln: 无法创建符号链接"/usr/local/tmp/libjars/*": 没有那个文件或目录

Query in MySQL:

Exporting from Hive/HBase to MySQL 8

Step 1: create the hiveTomysql table in MySQL 8 with the following statement:

create table hiveTomysql(
    sid int primary key,
    sname varchar(5) not null,
    gender varchar(1) default '男',
    age int not null
);

Step 2: create the hiveTomysql table in Hive and insert data

[root@Hadoop3-master tmp]# cd /usr/local/hive/bin
[root@Hadoop3-master bin]# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-reload4j-1.7.35.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 4552807f-c6fc-4ebb-86a6-1cd8c3b013ec

Select the default database and create the hiveTomysql table:

hive> show databases;
Time taken: 1.026 seconds, Fetched: 3 row(s)
hive> use default;
Time taken: 0.097 seconds
hive> show tables;
Time taken: 0.085 seconds, Fetched: 4 row(s)
hive> CREATE TABLE IF NOT EXISTS hiveTomysql(sid INT,sname string,gender string, age int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Time taken: 1.371 seconds
hive> show tables;
Time taken: 0.126 seconds, Fetched: 5 row(s)

Step 3: load local data into the default.hiveTomysql table

In /usr/local/tmp/, create the hiveTomysql data file with the following content:

[root@Hadoop3-master bin]# cat /usr/local/tmp/hiveTomysql
1, sun, 女, 15
2, man, 男, 30

In the Hive shell, load the data with the following statement:

hive> load data local inpath '/usr/local/tmp/hiveTomysql' overwrite into table default.hiveTomysql;
Loading data to table default.hivetomysql
Time taken: 2.962 seconds
hive> show databases;
Time taken: 0.196 seconds, Fetched: 3 row(s)
hive> use default;
Time taken: 0.086 seconds
hive> show tables;
Time taken: 0.073 seconds, Fetched: 5 row(s)
hive> select *  from hiveTomysql;
1        sun     女     NULL
2        man     男     NULL
Time taken: 2.947 seconds, Fetched: 2 row(s)
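
The NULL values in the age column above are most likely caused by the space after each comma in the data file: Hive cannot parse " 15" as an int. A minimal sketch, assuming the same file layout, that strips the whitespace around the commas before loading (the scratch file names are hypothetical):

```shell
# Hypothetical scratch copies of the data file shown above.
RAW=/tmp/hiveTomysql.raw
CLEAN=/tmp/hiveTomysql.clean
printf '%s\n' '1, sun, 女, 15' '2, man, 男, 30' > "$RAW"

# Remove any whitespace on either side of each comma.
sed 's/[[:space:]]*,[[:space:]]*/,/g' "$RAW" > "$CLEAN"
cat "$CLEAN"
```

Loading the cleaned file instead of the raw one should also keep the leading space out of the sname and gender values, which matters for the narrow varchar columns of the MySQL target table.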

Look up the HDFS storage path of the Hive table default.hiveTomysql:

hive> desc formatted default.hiveTomysql
    > ;
# col_name              data_type               comment
sid                     int
sname                   string
gender                  string
age                     int

# Detailed Table Information
Database:               default
OwnerType:              USER
Owner:                  root
CreateTime:             Thu Mar 02 17:42:49 CST 2023
LastAccessTime:         UNKNOWN
Retention:              0
Location:               hdfs://Hadoop3-master:9000/user/hive/warehouse/hivetomysql
Table Type:             MANAGED_TABLE
Table Parameters:
        bucketing_version       2
        numFiles                1
        numRows                 0
        rawDataSize             0
        totalSize               32
        transient_lastDdlTime   1677750562

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        field.delim             ,
        serialization.format    ,
Time taken: 1.534 seconds, Fetched: 34 row(s)

From the output above, the HDFS storage path of hiveTomysql is /user/hive/warehouse/hivetomysql.
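
The path can also be extracted from the Location line instead of being read off by eye. A small sketch that works on a copy of that line (the variable names are only for illustration); it strips the Location: label and the hdfs://host:port prefix, leaving the value to pass to --export-dir:

```shell
# The Location line as printed by "desc formatted" above.
LOCATION_LINE='Location:               hdfs://Hadoop3-master:9000/user/hive/warehouse/hivetomysql'

# Drop the label and the hdfs://host:port prefix, keeping only the path.
EXPORT_DIR=$(echo "$LOCATION_LINE" | sed -E 's#^Location:[[:space:]]+hdfs://[^/]+##')
echo "$EXPORT_DIR"
```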


 sqoop export --connect "jdbc:mysql://" --username root --password 123456 --table hiveTomysql  --input-fields-terminated-by ',' --export-dir /user/hive/warehouse/hivetomysql


2023-03-02 18:39:04,938 WARN fs.FileUtil: Command 'ln -s /tmp/hadoop-root/mapred/local/job_local1924785693_0001_a259a15d-5e77-4451-a2d6-8c30b49acfa6/libjars /usr/local/sqoop/bin/libjars/*' failed 1 with: ln: 无法创建符号链接"/usr/local/sqoop/bin/libjars/*": 没有那个文件或目录

Result in MySQL

Understanding Sqoop in Depth

Sqoop Code Generation




$ sqoop codegen (generic-args) (codegen-args)
$ sqoop-codegen (generic-args) (codegen-args)



$ sqoop-codegen \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp


14/12/23 02:34:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/12/23 02:34:41 INFO tool.CodeGenTool: Beginning code generation
14/12/23 02:34:42 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/12/23 02:34:47 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/emp.jar

Verify: list the files in the output directory

$ cd /tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/
$ ls







