Big Data: Steps for Initializing a Date Dimension Table in Hive

A date dimension table is a core part of a data warehouse, because many statistics are aggregated by time.

Below is the date dimension table I use (the screenshot from the original post is not included).
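As a stand-in for the table definition, here is a hypothetical layout that would accept the eight columns written by the insert statement later in this post. The column names `placeholder_col` and `year_month`, and all of the types, are my assumptions rather than the original DDL.

```sql
-- Hypothetical layout: names and types are assumptions, not the original DDL
create table if not exists dwd_dim_date (
  d                  string comment 'calendar date, yyyy-MM-dd',
  `year`             int    comment 'calendar year',
  `month`            int    comment 'month of year, 1-12',
  `day`              int    comment 'day of month, 1-31',
  `quarter`          int    comment 'quarter of year, 1-4',
  placeholder_col    string comment 'loaded as an empty string',
  daynumber_of_week  string comment 'day number within the week',
  year_month         string comment 'year and month concatenated'
);
```

The backticks around `year`, `month`, `day` and `quarter` are only a precaution, since these names double as function names and keywords in some Hive versions.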

Most examples online initialize this data with MySQL or Oracle stored procedures. Here I do it in Hive SQL instead and record the steps.

1. First, set two variables: the start date and the end date of the range to initialize:

0: jdbc:hive2://node1.ansunangel.com:2181,nod> set hivevar:start_day=2020-07-01;
No rows affected (0.004 seconds)
0: jdbc:hive2://node1.ansunangel.com:2181,nod> set hivevar:end_day=2020-08-01;
No rows affected (0.004 seconds)
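To confirm that both variables are set before relying on them, one option is to select them back. This is only a small sketch; the column aliases are illustrative, and the explicit `hivevar:` prefix is equivalent to the shorter `${start_day}` form used in the rest of this post.

```sql
-- Echo the substituted values; Hive replaces the placeholders before compiling
select '${hivevar:start_day}' as start_day,
       '${hivevar:end_day}'   as end_day;
```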

2. Use Hive's datediff function to compute the number of days between the two dates, which is 31 here.

select datediff("${end_day}", "${start_day}");
INFO  : Compiling command(queryId=hive_20200805170701_f6e524fb-71c3-42f3-bba2-6307fb5a9313): select datediff("2020-08-01", "2020-07-01")
INFO  : Executing command(queryId=hive_20200805170701_f6e524fb-71c3-42f3-bba2-6307fb5a9313): select datediff("2020-08-01", "2020-07-01")
INFO  : Completed executing command(queryId=hive_20200805170701_f6e524fb-71c3-42f3-bba2-6307fb5a9313); Time taken: 0.004 seconds
INFO  : OK
+------+
| _c0  |
+------+
| 31   |
+------+

3. Then use the repeat function to build a string of 31 'o' characters.

0: jdbc:hive2://node1.ansunangel.com:2181,nod> select repeat('o',31);
INFO  : Compiling command(queryId=hive_20200805171047_82678305-d51c-4f98-99b1-cf3aa77c2e13): select repeat('o',31)
INFO  : Completed executing command(queryId=hive_20200805171047_82678305-d51c-4f98-99b1-cf3aa77c2e13); Time taken: 0.004 seconds
INFO  : OK
+----------------------------------+
|               _c0                |
+----------------------------------+
| ooooooooooooooooooooooooooooooo  |
+----------------------------------+

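4. Next, pass that string to the split function with 'o' as the delimiter. The 31 delimiters produce an array of 32 empty strings, one more than the day count.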

0: jdbc:hive2://node1.ansunangel.com:2181,nod> select split('ooooooooooooooooooooooooooooooo','o');
INFO  : Completed executing command(queryId=hive_20200805171255_a2c9911d-4d75-4df8-b385-9716400f58ce); Time taken: 0.004 seconds
INFO  : OK
+----------------------------------------------------+
|                        _c0                         |
+----------------------------------------------------+
| ["","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""] |
+----------------------------------------------------+
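Steps 2 to 4 can also be chained so that the literal 31 never has to be typed by hand. The sketch below assumes the two hivevar values from step 1 are set, and uses size() only to count the array elements. For the dates above it should report 32, one more than the day difference.

```sql
-- Count the array elements produced for the configured date range
select size(split(repeat('o', datediff('${end_day}', '${start_day}')), 'o')) as element_count;
```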

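5. Use the posexplode function to turn that 32-element array into 32 rows, one per element, with pos running from 0 to 31. Adding pos to the start date then covers every day from start_day through end_day, both included.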

0: jdbc:hive2://node1.ansunangel.com:2181,nod> select posexplode(split("ooooooooooooooooooooooooooooooo", "o"));
INFO  : Completed executing command(queryId=hive_20200805171616_fa236d28-eda1-4dcb-b975-a4044d21a56b); Time taken: 0.004 seconds
INFO  : OK
+------+------+
| pos  | val  |
+------+------+
| 0    |      |
| 1    |      |
| 2    |      |
| 3    |      |
| 4    |      |
| 5    |      |
| 6    |      |
| 7    |      |
| 8    |      |
| 9    |      |
| 10   |      |
| 11   |      |
| 12   |      |
| 13   |      |
| 14   |      |
| 15   |      |
| 16   |      |
| 17   |      |
| 18   |      |
| 19   |      |
| 20   |      |
| 21   |      |
| 22   |      |
| 23   |      |
| 24   |      |
| 25   |      |
| 26   |      |
| 27   |      |
| 28   |      |
| 29   |      |
| 30   |      |
| 31   |      |
+------+------+
32 rows selected (0.109 seconds)
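The remaining step is to turn each pos value into a calendar date with date_add. As a minimal sketch, with the dates written as literals instead of hivevar placeholders so it can be run on its own, the following should return 32 rows from 2020-07-01 through 2020-08-01:

```sql
-- Offset the start date by each position to get one row per day
select date_add('2020-07-01', a.pos) as d
from (
  select posexplode(split(repeat('o', datediff('2020-08-01', '2020-07-01')), 'o'))
) a;
```

Because pos starts at 0 and ends at the day difference, both the start date and the end date appear in the result.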

The complete HQL:

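-- Date range to generate, both endpoints end up in the table
set hivevar:start_day=2020-07-01;
set hivevar:end_day=2020-08-01;
-- One row per day: pos runs from 0 to datediff(end_day, start_day)
with dates as (
  select date_add("${start_day}", a.pos) as d
  from (select posexplode(split(repeat("o", datediff("${end_day}", "${start_day}")), "o"))) a
)
insert into dwd_dim_date
select
  d as d
  , year(d) as year
  , month(d) as month
  , day(d) as day
  , quarter(d) as quarter
  -- this column is loaded as an empty string
  , ''
  -- day number of the week as produced by the 'u' pattern
  , date_format(d, 'u') as daynumber_of_week
  -- year and month concatenated without zero padding, e.g. 20207 for July 2020
  , concat(year(d),month(d))
from dates
order by year,month,day
;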
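The last column of the insert concatenates year(d) and month(d), so July 2020 is stored as 20207. If a fixed-width, sortable key is preferred, one possible alternative (my suggestion, not part of the original load) is the 'yyyyMM' pattern of date_format. The aliases below are illustrative.

```sql
-- Compare the unpadded concat with a zero-padded pattern for the same date
select concat(year('2020-07-01'), month('2020-07-01')) as unpadded,
       date_format('2020-07-01', 'yyyyMM')              as padded;
```

For this date the first column should come back as 20207 and the second as 202007.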

This gives the final date dimension table.
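A quick sanity check on the loaded data might look like the sketch below. It assumes the first column of dwd_dim_date is named d, as in the select list above, and that the table was empty before the load. In that case the July range should give 32 rows, starting on 2020-07-01 and ending on 2020-08-01.

```sql
-- Row count plus first and last day actually loaded
select count(*) as row_count,
       min(d)   as first_day,
       max(d)   as last_day
from dwd_dim_date;
```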
