Overview
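Hive introduced indexing in version 0.7 and removed the feature in 3.0 (see HIVE-18448). Hive 3.0 did, however, add materialized views, a technique that serves a similar purpose. An index speeds up queries that filter on particular columns of a table. Without one, a query with a predicate such as 'WHERE tab1.col1 = 10' loads and processes every record in the table or partition. If col1 is indexed, only part of the files needs to be loaded and processed. Columnar storage formats (Parquet, ORC) rely on the same idea: read selectively instead of scanning everything.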
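As a rough illustration of the query side, the sketch below shows how a pre-3.0 session might let the optimizer use an index for the predicate above. It assumes a table tab1 whose index on col1 has already been created and rebuilt (see the sections that follow). The two properties are standard Hive settings, but the values are only suitable for small test tables.

```sql
-- Assumption: tab1 exists and an index on tab1(col1) has been created and rebuilt.
-- Let the optimizer use indexes automatically (off by default).
SET hive.optimize.index.filter=true;
-- Compact indexes are only used above a minimum input size;
-- 0 removes that threshold so small test tables qualify too.
SET hive.optimize.index.filter.compact.minsize=0;

-- With the index in place, Hive can restrict the scan to the file
-- blocks that the index lists for col1 = 10.
SELECT * FROM tab1 WHERE tab1.col1 = 10;
```

Whether the index is actually picked depends on the Hive version and on the shape of the query, so compare the EXPLAIN output with and without these settings.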
Creating an index
CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[
[ ROW FORMAT ...] STORED AS ...
| STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
[COMMENT "index comment"]
Example
> CREATE INDEX test_index ON TABLE test_hive (name) AS 'COMPACT' WITH DEFERRED REBUILD;
-- Create an index stored in the RCFile format
> CREATE INDEX test_index2 ON TABLE test_hive (name) AS 'COMPACT' WITH DEFERRED REBUILD STORED AS RCFILE;
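The optional clauses in the syntax can be combined in the same way. The following sketch, which reuses the test_hive table from the examples, creates a bitmap index and a compact index with an explicit index table and comment. The names test_bitmap_index, test_index3 and test_hive_name_idx are made up for illustration.

```sql
-- 'BITMAP' is the other built-in handler besides 'COMPACT'; it is usually
-- suggested for columns with few distinct values.
CREATE INDEX test_bitmap_index ON TABLE test_hive (name)
AS 'BITMAP'
WITH DEFERRED REBUILD;

-- Store the index data in a table with a chosen name (hypothetical:
-- test_hive_name_idx) instead of the generated default__..._ name.
CREATE INDEX test_index3 ON TABLE test_hive (name)
AS 'COMPACT'
WITH DEFERRED REBUILD
IN TABLE test_hive_name_idx
COMMENT 'compact index on name';
```

Because both statements use WITH DEFERRED REBUILD, the indexes stay empty until ALTER INDEX ... REBUILD is run.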
Viewing indexes
SHOW [FORMATTED] (INDEX|INDEXES) ON table_with_index [(FROM|IN) db_name];
Example
> SHOW INDEX ON test_hive;
+-----------------------+-----------------------+-----------------------+-----------------------------------+-----------------------+----------+--+
| idx_name | tab_name | col_names | idx_tab_name | idx_type | comment |
+-----------------------+-----------------------+-----------------------+-----------------------------------+-----------------------+----------+--+
| test_index | test_hive | name | default__test_hive_test_index__ | compact | |
| test_index2 | test_hive | name | default__test_hive_test_index2__ | compact | |
+-----------------------+-----------------------+-----------------------+-----------------------------------+-----------------------+----------+--+
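The other forms allowed by the syntax can be tried the same way. This is a small sketch that assumes test_hive lives in the default database, as the idx_tab_name values in the output suggest.

```sql
-- FORMATTED adds column headers to the listing; IN names the database explicitly.
SHOW FORMATTED INDEXES ON test_hive IN default;

-- The index data itself is an ordinary table, so its layout can be inspected.
DESCRIBE default__test_hive_test_index__;
```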
Altering an index
ALTER INDEX index_name ON table_name [PARTITION partition_spec] REBUILD;
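ALTER INDEX … REBUILD builds an index that was created with the WITH DEFERRED REBUILD clause, or rebuilds an index that was built earlier. If PARTITION is specified, only that partition is rebuilt.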
Example
> ALTER INDEX test_index ON test_hive REBUILD;
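The PARTITION form is not covered by the example above, since test_hive is not shown as partitioned. A minimal sketch on a hypothetical partitioned table could look like this; the table test_hive_part, its dt partition column, the index name and the date value are all assumptions made for illustration.

```sql
-- Hypothetical partitioned table, used only for this illustration.
CREATE TABLE test_hive_part (id INT, name STRING)
PARTITIONED BY (dt STRING);

-- Deferred index: nothing is built until REBUILD is issued.
CREATE INDEX test_part_index ON TABLE test_hive_part (name)
AS 'COMPACT' WITH DEFERRED REBUILD;

-- Rebuild the index for a single partition only.
ALTER INDEX test_part_index ON test_hive_part
PARTITION (dt='2018-01-01') REBUILD;
```

Rebuilding one partition at a time keeps the cost proportional to the data that changed, which matters when new partitions are loaded daily.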
Dropping an index
DROP INDEX [IF EXISTS] index_name ON table_name;
Example
> DROP INDEX test_index2 ON test_hive;
> SHOW INDEX ON test_hive;
+-----------------------+-----------------------+-----------------------+----------------------------------+-----------------------+----------+--+
| idx_name | tab_name | col_names | idx_tab_name | idx_type | comment |
+-----------------------+-----------------------+-----------------------+----------------------------------+-----------------------+----------+--+
| test_index | test_hive | name | default__test_hive_test_index__ | compact | |
+-----------------------+-----------------------+-----------------------+----------------------------------+-----------------------+----------+--+
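For scripts that may run more than once, the IF EXISTS form from the syntax above is the safer choice. A sketch, reusing the index that was just dropped:

```sql
-- test_index2 no longer exists at this point; IF EXISTS turns the
-- statement into a no-op instead of a potential error.
DROP INDEX IF EXISTS test_index2 ON test_hive;
```

According to the Hive DDL manual, a plain DROP INDEX on a missing index only reports an error when hive.exec.drop.ignorenonexistent is set to false, so the behavior without IF EXISTS depends on the configuration.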
References
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/AlterIndex
https://cwiki.apache.org/confluence/display/Hive/IndexDev
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Indexing