Creating a Hive ORC table with Sqoop and importing data into it
sqoop import \
  --connect jdbc:mysql://localhost:3306/spider \
  --username root --password 1234qwer \
  --table org_ic_track --driver com.mysql.jdbc.Driver \
  --create-hcatalog-table \
  --hcatalog-database spider_tmp \
  --hcatalog-table org_ic_track \
  --hcatalog-partition-keys batch \
  --hcatalog-partition-values 20190404 \
  --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
  -m 1
Inspecting the resulting table structure
CREATE TABLE `org_ic_track`(
  `id` int,
  `info_id` int,
  `company` varchar(250),
  `company_url` varchar(250),
  `invest_date` varchar(150),
  `invested_company` varchar(500),
  `invested_ratio` varchar(100),
  `update_time` string)
PARTITIONED BY (
  `batch` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://hadoop1:8020/home/hive/warehouse/spider_tmp.db/org_ic_track'
TBLPROPERTIES (
  'orc.compress'='SNAPPY',
  'transient_lastDdlTime'='1554342988')
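The DDL above has the shape of SHOW CREATE TABLE output. As a minimal sketch of how it could be retrieved, the script below builds that statement and only runs it when explicitly asked to. The helper name ddl_statement and the RUN_HIVE switch are made up for this illustration; it also assumes the hive CLI is on the PATH when RUN_HIVE=1 is set.

```shell
#!/bin/sh
# Hypothetical helper: build the HiveQL statement that prints a table's DDL.
# $1 = database name, $2 = table name.
ddl_statement() {
  printf 'SHOW CREATE TABLE %s.%s;' "$1" "$2"
}

stmt=$(ddl_statement spider_tmp org_ic_track)
echo "$stmt"

# RUN_HIVE is an assumed opt-in switch; by default the statement is only printed.
if [ "${RUN_HIVE:-0}" = "1" ]; then
  hive -e "$stmt"
fi
```

Checking that the output contains the ORC serde and 'orc.compress'='SNAPPY' is a quick way to confirm that the storage stanza was applied.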
Importing data with Sqoop into an existing Hive ORC table
sqoop import \
  --connect jdbc:mysql://localhost:3306/spider \
  --username root --password 1234qwer \
  --table org_ic_track --driver com.mysql.jdbc.Driver \
  --hcatalog-database spider_tmp \
  --hcatalog-table org_ic_track \
  --hcatalog-partition-keys batch \
  --hcatalog-partition-values 20190405 \
  -m 1
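Because only the partition value changes from one run to the next, the command lends itself to a small wrapper. The following is a sketch, not a tested production script: the function name import_batch and the DRY_RUN switch are assumptions made for this example, and the connection settings are copied from the command above. With DRY_RUN left at its default of 1 the command is printed instead of executed.

```shell
#!/bin/sh
# Hypothetical wrapper: import one batch into the existing ORC table.
# $1 = batch partition value, e.g. 20190405.
import_batch() {
  batch=$1
  # Rebuild the positional parameters as the full sqoop command line.
  set -- sqoop import \
    --connect jdbc:mysql://localhost:3306/spider \
    --username root --password 1234qwer \
    --table org_ic_track --driver com.mysql.jdbc.Driver \
    --hcatalog-database spider_tmp \
    --hcatalog-table org_ic_track \
    --hcatalog-partition-keys batch \
    --hcatalog-partition-values "$batch" \
    -m 1
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"    # print only
  else
    "$@"         # actually run sqoop
  fi
}

# Fall back to today's date (YYYYMMDD) when no batch is passed to the script.
import_batch "${1:-$(date +%Y%m%d)}"
```

Running the script with DRY_RUN=0 and a batch argument such as 20190405 would execute the same import as shown above.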
Importing data with Sqoop (using a query) into an existing Hive ORC table
sqoop import \
  --connect jdbc:mysql://localhost:3306/spider \
  --username root --password 1234qwer \
  --query "select * from org_ic_track where update_time between '2019-04-01 21:16:04' and '2019-04-01 21:16:05' and \$CONDITIONS" \
  --driver com.mysql.jdbc.Driver \
  --hcatalog-database spider_tmp \
  --hcatalog-table org_ic_track \
  --hcatalog-partition-keys batch \
  --hcatalog-partition-values 20190406 \
  -m 1
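The query must reach Sqoop with the literal token $CONDITIONS in it, which is why the dollar sign is escaped inside the double quotes above. A minimal sketch of building such a query for an arbitrary time window follows; the helper name build_query is invented for this example, and the table and column names are taken from the command above.

```shell
#!/bin/sh
# Hypothetical helper: build the free-form query for a time window.
# $1 = window start, $2 = window end (both as 'YYYY-MM-DD HH:MM:SS').
build_query() {
  # \$ keeps the shell from expanding CONDITIONS; Sqoop replaces the token itself.
  printf "select * from org_ic_track where update_time between '%s' and '%s' and \$CONDITIONS" "$1" "$2"
}

query=$(build_query '2019-04-01 21:16:04' '2019-04-01 21:16:05')
echo "$query"
```

The result can then be passed as --query "$query". With -m 1, as used here, no --split-by column is needed; a parallel import of a free-form query would require one.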
Parameter descriptions
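connect                    JDBC connection string
username                   user name for JDBC authentication
password                   password for JDBC authentication
table                      name of the source table to import
driver                     JDBC driver class to use
create-hcatalog-table      create the target table as part of the import; if omitted, no table is created. Note that the job fails if the table to be created already exists
hcatalog-database          target database
hcatalog-table             target table name
hcatalog-storage-stanza    storage format; the value is appended to the generated create table statement. Default: stored as rcfile
hcatalog-partition-keys    partition columns; separate multiple columns with commas (an enhanced version of hive-partition-key)
hcatalog-partition-values  partition values; separate multiple values with commas (an enhanced version of hive-partition-value)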
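To illustrate the comma-separated form of hcatalog-partition-keys and hcatalog-partition-values, here is a small sketch that joins several keys and values into the expected strings. The helper join_commas and the second partition column region (with value cn) are purely hypothetical; the table in this article is partitioned by batch only.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments with commas.
join_commas() {
  old_ifs=$IFS
  IFS=,
  printf '%s' "$*"   # "$*" joins the arguments with the first character of IFS
  IFS=$old_ifs
}

# "region" / "cn" are made-up examples of a second static partition.
keys=$(join_commas batch region)
values=$(join_commas 20190404 cn)

echo "--hcatalog-partition-keys $keys --hcatalog-partition-values $values"
```

Keys and values are matched by position, so both lists need the same number of entries in the same order.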