Importing data into a Hive external table

1. Create the external table with a Hive command:
create EXTERNAL table applogsnew
(
applogid string,
msgtype string,
clienttype string,
userid bigint
)
PARTITIONED BY (create_time string)
row format delimited
fields terminated by '\t'
stored as textfile
location '/data/sda/apache-hive-1.2.1-bin/tmp/warehouse/applogsnew';
2. Use Hadoop commands to create the partition directory and upload the data file into it:
hadoop fs -mkdir /data/sda/apache-hive-1.2.1-bin/tmp/warehouse/applogsnew/create_time=20160531
hadoop fs -put /home/appadmin/web/000000_0 /data/sda/apache-hive-1.2.1-bin/tmp/warehouse/applogsnew/create_time=20160531/

3. Register the directory as a partition of the table with a Hive command:
alter table applogsnew add partition (create_time='20160531') location '/data/sda/apache-hive-1.2.1-bin/tmp/warehouse/applogsnew/create_time=20160531';
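These three steps rely on Hive's partition layout: each partition of applogsnew is a directory named create_time=<value> under the table's location, the partition value comes from that directory name rather than from the file, and each line of the file is split on the tab delimiter into the declared columns. The Python sketch below models that layout under simplifying assumptions. partition_load_commands and parse_line are hypothetical helpers, not part of Hive, the sample line is invented, and the NULL handling only approximates Hive's text format (missing fields and unparseable bigint values are read as NULL).

```python
import posixpath

# Columns declared in CREATE EXTERNAL TABLE applogsnew (partition column excluded).
COLUMNS = [("applogid", str), ("msgtype", str), ("clienttype", str), ("userid", int)]


def partition_load_commands(table, table_location, local_file, create_time):
    """Hypothetical helper: build the commands of steps 2 and 3 for one partition."""
    part_dir = posixpath.join(table_location, f"create_time={create_time}")
    return [
        f"hadoop fs -mkdir {part_dir}",
        f"hadoop fs -put {local_file} {part_dir}/",
        f"alter table {table} add partition (create_time='{create_time}') "
        f"location '{part_dir}';",
    ]


def parse_line(line, create_time):
    """Simplified model of reading one tab-delimited line of a partition file."""
    fields = line.rstrip("\n").split("\t")
    row = {}
    for i, (name, cast) in enumerate(COLUMNS):
        raw = fields[i] if i < len(fields) else None
        try:
            row[name] = None if raw is None else cast(raw)
        except ValueError:
            row[name] = None  # e.g. a non-numeric userid is read as NULL
    # The partition value is not stored in the file; it comes from the directory name.
    row["create_time"] = create_time
    return row


if __name__ == "__main__":
    location = "/data/sda/apache-hive-1.2.1-bin/tmp/warehouse/applogsnew"
    for cmd in partition_load_commands("applogsnew", location,
                                       "/home/appadmin/web/000000_0", "20160531"):
        print(cmd)
    # Hypothetical sample line from 000000_0
    print(parse_line("a1\tlogin\tandroid\t42\n", "20160531"))
```

Called with the table name, location, local file and date used above, partition_load_commands reproduces the mkdir, put and alter table commands of steps 2 and 3, which makes it easy to repeat the procedure for another date.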

How to quickly copy a partitioned table in Hive (including its data)
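If a table has many partitions, loading data into each of them by hand is tedious; Hive solves this with dynamic partitions.
Instead of naming each partition explicitly, Hive infers the names of the partitions to create from the values returned by the query. Because the partitions used above were all specified explicitly (static partitions), these are called dynamic partitions.
How is it done?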

First, copy the table structure:

create table applogs like applogsnew;

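Then insert the data. The dynamic partition column, create_time, must be the last column in the SELECT list: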

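INSERT overwrite TABLE applogs PARTITION(create_time)
SELECT applogid, msgtype, clienttype, userid, create_time FROM applogsnew;

The statement fails: by default Hive runs dynamic partitioning in strict mode, which requires at least one static partition column, and here the only partition column, create_time, is dynamic. Enable dynamic partitioning, switch to nonstrict mode, and raise the number of dynamic partitions each mapper or reducer may create:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1000;

Run the statement again:

INSERT overwrite TABLE applogs PARTITION(create_time)
SELECT applogid, msgtype, clienttype, userid, create_time FROM applogsnew;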
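To see why the first attempt fails and what the second one does, here is a simplified Python model of a dynamic-partition INSERT. It is a sketch, not Hive's implementation, and both function names are hypothetical. check_dynamic_insert mimics the two settings above, assuming the Hive 1.2.1 defaults hive.exec.dynamic.partition=true and hive.exec.dynamic.partition.mode=strict. route_rows groups the SELECT output by its last column, which reflects how Hive matches dynamic partition columns (by position, not by name); rows whose partition value is NULL or empty go to Hive's default partition, __HIVE_DEFAULT_PARTITION__.

```python
from collections import defaultdict

DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"  # Hive's default name for NULL/empty values


def check_dynamic_insert(partition_spec, settings):
    """partition_spec maps each partition column to a static value, or None if dynamic."""
    dynamic = [col for col, value in partition_spec.items() if value is None]
    if not dynamic:
        return "ok: static partition insert"
    if settings.get("hive.exec.dynamic.partition", "true") != "true":
        return "error: dynamic partitioning is disabled"
    mode = settings.get("hive.exec.dynamic.partition.mode", "strict")
    if mode == "strict" and len(dynamic) == len(partition_spec):
        return "error: strict mode needs at least one static partition column"
    return "ok: dynamic partition insert"


def route_rows(rows, partition_col="create_time"):
    """Group SELECT output rows into partition directories by their LAST column."""
    partitions = defaultdict(list)
    for row in rows:
        *data_cols, value = row
        name = DEFAULT_PARTITION if value in (None, "") else value
        partitions[f"{partition_col}={name}"].append(tuple(data_cols))
    return dict(partitions)


if __name__ == "__main__":
    spec = {"create_time": None}  # PARTITION(create_time): fully dynamic
    print(check_dynamic_insert(spec, {}))  # first attempt, default settings
    print(check_dynamic_insert(spec, {"hive.exec.dynamic.partition": "true",
                                      "hive.exec.dynamic.partition.mode": "nonstrict"}))
    # Hypothetical rows in SELECT order: applogid, msgtype, clienttype, userid, create_time
    rows = [("a1", "login", "android", 1, "20160531"),
            ("a2", "logout", "ios", 2, "20160601"),
            ("a3", "login", "ios", 3, None)]
    for part, part_rows in route_rows(rows).items():
        print(part, part_rows)
```

Because the partition value is taken by position, swapping create_time with another column in the SELECT would silently write rows into wrongly named partitions rather than raise an error, which is why the column order matters.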

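Notes:
Avoid creating too many partitions. With a very large number of partitions queries also become slow, much as a Windows folder that holds too many files becomes awkward to work with.
To see how many dynamic partitions a single statement may create, run set hive.exec.max.dynamic.partitions; which prints the current value of that limit.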
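Two limits apply to a dynamic-partition INSERT: hive.exec.max.dynamic.partitions caps the total number of partitions one statement may create, and hive.exec.max.dynamic.partitions.pernode caps how many each mapper or reducer may create, which is why the pernode value was raised above. The hypothetical Python check below estimates, before running an INSERT, whether a given spread of partition values across tasks would exceed them. The defaults of 1000 and 100 are assumptions taken from the Hive configuration documentation and may differ on your cluster, so read the real values with set.

```python
# Assumed defaults; check your cluster with `set <property>;`.
DEFAULT_LIMITS = {
    "hive.exec.max.dynamic.partitions": 1000,         # whole statement
    "hive.exec.max.dynamic.partitions.pernode": 100,  # each mapper/reducer
}


def check_partition_limits(values_per_task, limits=DEFAULT_LIMITS):
    """values_per_task: one iterable of partition values per mapper/reducer task.

    Returns a list of human-readable problems; an empty list means both limits hold.
    """
    per_node_limit = limits["hive.exec.max.dynamic.partitions.pernode"]
    total_limit = limits["hive.exec.max.dynamic.partitions"]
    problems = []
    all_values = set()
    for task_id, values in enumerate(values_per_task):
        distinct = set(values)
        all_values |= distinct
        if len(distinct) > per_node_limit:
            problems.append(
                f"task {task_id}: {len(distinct)} partitions > pernode limit {per_node_limit}")
    if len(all_values) > total_limit:
        problems.append(f"total: {len(all_values)} partitions > limit {total_limit}")
    return problems


if __name__ == "__main__":
    # Hypothetical workload: 150 daily partitions all written by a single task.
    days = [f"day{d}" for d in range(150)]
    print(check_partition_limits([days]))
    raised = {**DEFAULT_LIMITS, "hive.exec.max.dynamic.partitions.pernode": 1000}
    print(check_partition_limits([days], raised))
```

With the assumed defaults the single task exceeds the per-node limit; raising pernode to 1000, as in the settings above, clears it, while the statement-wide limit still applies once more than 1000 distinct partitions are written in total.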
