Hive: Creating External Partitioned Tables
(1) Suppose we have a partitioned table defined as follows:
hive> show create table partition_parquet;
OK
CREATE TABLE `partition_parquet`(
  `member_id` string,
  `name` string,
  `add_item` string)
PARTITIONED BY (
  `stat_date` string,
  `province` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'field.delim'='\t',
  'serialization.format'='\t')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet'
TBLPROPERTIES (
  'last_modified_by'='a6',
  'last_modified_time'='1525229204',
  'transient_lastDdlTime'='1525229204')
Time taken: 0.173 seconds, Fetched: 22 row(s)
A sample of its data:

hive> SELECT * FROM partition_parquet where stat_date='20110527' and province ='liaoning';
OK
1 liujiannan NULL 20110527 liaoning
2 wangchaoqun NULL 20110527 liaoning
3 xuhongxing NULL 20110527 liaoning
4 zhudaoyong NULL 20110527 liaoning
5 zhouchengyu NULL 20110527 liaoning
The partition is stored in the following directory:

bogon:bin a6$ hadoop dfs -ls -R hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
18/06/23 19:34:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
drwxr-xr-x - a6 supergroup 0 2017-11-07 10:38 hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning
-rwxr-xr-x 1 a6 supergroup 437 2017-11-07 10:38 hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning/000000_0
(2) A poor example: creating an external partitioned table and loading data

CREATE external TABLE `partition_external_parquet`(
  `member_id` string,
  `name` string,
  `add_item` string)
PARTITIONED BY (
  `stat_date` string,
  `province` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'field.delim'='\t',
  'serialization.format'='\t')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning';

At this point, however, the table returns no data:
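hive> SELECT * FROM partition_external_parquet;
OK
Time taken: 1.695 seconds

Cause: no partition has been added. A partitioned table only reads the partitions registered in the metastore, so files that merely sit under the table's LOCATION are not picked up.
Next, we add the partition.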
hive> alter table partition_external_parquet add PARTITION(stat_date='20110527',province='liaoning') location 'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning';
OK
Time taken: 0.836 seconds

Query the data again:

hive> SELECT * FROM partition_external_parquet;
OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
1 liujiannan NULL 20110527 liaoning
2 wangchaoqun NULL 20110527 liaoning
3 xuhongxing NULL 20110527 liaoning
4 zhudaoyong NULL 20110527 liaoning
5 zhouchengyu NULL 20110527 liaoning
Time taken: 1.474 seconds, Fetched: 5 row(s)
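The partition spec in the ALTER TABLE statement mirrors the trailing `key=value` components of the directory path, so the statement can be derived from the path instead of being typed by hand. The following is a minimal sketch under that assumption; `build_add_partition` is a hypothetical helper name, not part of Hive, and it only prints HiveQL text without contacting Hive or HDFS.

```shell
#!/bin/sh
# Hypothetical helper (not a Hive command): print an ALTER TABLE ... ADD PARTITION
# statement for one partition directory.
# Usage: build_add_partition TABLE PARTITION_DIR NUM_PARTITION_KEYS
# Assumes the last NUM_PARTITION_KEYS path components have the form key=value.
build_add_partition() (
  table=$1
  path=${2%/}          # drop a trailing slash, if any
  nkeys=$3
  rest=$path
  spec=""
  i=0
  while [ "$i" -lt "$nkeys" ]; do
    segment=${rest##*/}    # last path component, e.g. province=liaoning
    rest=${rest%/*}        # remaining path without that component
    key=${segment%%=*}
    value=${segment#*=}
    # Prepend, so keys keep their directory order (stat_date before province).
    spec="${key}='${value}'${spec:+,$spec}"
    i=$((i + 1))
  done
  printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s) LOCATION '%s';\n" \
    "$table" "$spec" "$path"
)

# Example call using the table and directory from this article.
build_add_partition partition_external_parquet \
  'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning' 2
```

The printed statement can be pasted into the Hive CLI. `IF NOT EXISTS` is added so that running it a second time does not fail on an already registered partition.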
(3) A good example: creating an external partitioned table and loading data

hive> create external table if not exists partition_external_parquet2 like partition_parquet;
OK
Time taken: 0.106 seconds
hive> show create table partition_external_parquet2;
OK
CREATE EXTERNAL TABLE `partition_external_parquet2`(
  `member_id` string,
  `name` string,
  `add_item` string)
PARTITIONED BY (
  `stat_date` string,
  `province` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'field.delim'='\t',
  'serialization.format'='\t')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_external_parquet2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1529753081')
Time taken: 0.057 seconds, Fetched: 20 row(s)
Load the data by adding a fully static partition, that is, one in which every partition column is given an explicit value:

hive> alter table partition_external_parquet2 add PARTITION(stat_date='20110527',province='liaoning') location 'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning';
OK
Time taken: 0.078 seconds
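When the source table has many partition directories, writing one ALTER TABLE statement per directory by hand becomes tedious. As a rough sketch, the statements could be generated from a recursive HDFS listing in the format shown in section (1). `list_to_add_partitions` is a hypothetical helper name; the sketch assumes each directory line ends with the path, that paths contain no spaces, and that leaf partition directories end in exactly the given number of `key=value` components.

```shell
#!/bin/sh
# Hypothetical helper (not a Hive or HDFS command): read `hdfs dfs -ls -R`
# output on stdin and print one ADD PARTITION statement per leaf partition
# directory. It only prints HiveQL text; nothing is executed.
# Usage: list_to_add_partitions TABLE NUM_PARTITION_KEYS < listing.txt
list_to_add_partitions() {
  awk -v table="$1" -v nkeys="$2" '
    # Directory entries start with "d"; the path is the last field.
    /^d/ {
      path = $NF
      n = split(path, parts, "/")
      if (n < nkeys) next
      spec = ""
      ok = 1
      # Inspect the last nkeys components; each must look like key=value.
      for (i = n - nkeys + 1; i <= n; i++) {
        eq = index(parts[i], "=")
        if (eq == 0) { ok = 0; break }
        key = substr(parts[i], 1, eq - 1)
        val = substr(parts[i], eq + 1)
        # \047 is the octal escape for a single quote.
        spec = spec (spec == "" ? "" : ",") key "=\047" val "\047"
      }
      if (ok)
        printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s) LOCATION \047%s\047;\n", table, spec, path
    }
  '
}

# Possible usage (requires a running HDFS, so it is left commented out):
# hdfs dfs -ls -R hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet \
#   | list_to_add_partitions partition_external_parquet2 2 > add_partitions.hql
```

Intermediate directories such as `stat_date=20110527` are skipped because their last two components are not both `key=value` pairs. The generated file should be reviewed before it is passed to Hive.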
Query the table to confirm that the data is visible:

hive> select * from partition_external_parquet2;
OK
1 liujiannan NULL 20110527 liaoning
2 wangchaoqun NULL 20110527 liaoning
3 xuhongxing NULL 20110527 liaoning
4 zhudaoyong NULL 20110527 liaoning
5 zhouchengyu NULL 20110527 liaoning
Time taken: 0.133 seconds, Fetched: 5 row(s)

Reference: https://blog.csdn.net/a2011480169/article/details/51991421