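Hive execution flow:
Note: MySQL's default maximum number of connections (max_connections) is 100 in old releases; since MySQL 5.1.15 the default is 151.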
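Rather than relying on the default, the limit can be checked on the MySQL server that backs the Hive metastore. A minimal sketch to run in a MySQL client; the value 500 is only an illustrative assumption, and `SET GLOBAL` needs administrative privileges and does not survive a server restart.

```sql
-- Show the configured connection limit
SHOW VARIABLES LIKE 'max_connections';

-- Show the peak number of simultaneous connections since the server started
SHOW STATUS LIKE 'Max_used_connections';

-- Raise the limit for the running server (500 is an illustrative value)
SET GLOBAL max_connections = 500;
```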
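Hive case 1: which is better suited to Spark, Hadoop, and deep learning, Python or R?
- Start the cluster:
- Switch to the datas directory, upload the data file there, and check that the transfer succeeded
- Switch to the Hive directory:
- Open a new terminal window, switch to the datas directory, and create a SQL script file
- Enter the SQL code in that file:
- Return to the first window and run the script
- Check the result, count for R:
- Check the result, count for Python: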
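The script for this case is not reproduced in these notes, so the following is only a sketch of what such a count could look like. Everything named here is an assumption: the database `db_demo`, the table `tb_posts`, its single `content` column, and the file `/opt/datas/posts.txt` (one text record per line) are hypothetical stand-ins for the real data.

```sql
-- Hypothetical database, table, and file names; replace with your own
create database if not exists db_demo;
create table if not exists db_demo.tb_posts(
  content string
);
load data local inpath '/opt/datas/posts.txt' into table db_demo.tb_posts;

-- Number of records that mention Python (case-insensitive)
select count(*) as python_cnt
from db_demo.tb_posts
where lower(content) like '%python%';

-- Number of records that mention R as a standalone word (case-insensitive)
select count(*) as r_cnt
from db_demo.tb_posts
where content rlike '(?i)\\bR\\b';
```

Matching R needs the word-boundary regex because a plain `like '%r%'` would match almost every line.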
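Because the two results above cannot be displayed together, add the following after the code above to show them side by side for comparison:
Query results: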
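One common way to get both counts into a single result set is `union all`. This is a sketch under assumptions, not necessarily the statement used in these notes: the table `db_demo.tb_posts` and its `content` column are hypothetical names.

```sql
-- Hypothetical table db_demo.tb_posts(content string)
select t.lang, t.total
from (
  select 'Python' as lang, count(*) as total
  from db_demo.tb_posts
  where lower(content) like '%python%'
  union all
  select 'R' as lang, count(*) as total
  from db_demo.tb_posts
  where content rlike '(?i)\\bR\\b'
) t
order by t.total desc;
```

Wrapping the `union all` in a subquery keeps the statement valid on older Hive versions that do not accept a top-level union.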
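Case 2: housing statistics, analyzing the relationship between house size and purchases:
Write the code as follows:
View the results:
Extended requirement: which building-age range is most popular?
The full code for this case is as follows: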
set hive.exec.mode.local.auto=true;
drop table if exists db_lianjia.tb_lj;
drop table if exists db_lianjia.tb_info;
drop database if exists db_lianjia;
create database db_lianjia;
use db_lianjia;
create table db_lianjia.tb_lj(
village_name string,
house_type string,
house_area int,
region string,
floor_str string,
direction string,
total_price string,
square_price string,
build_date string
)
row format delimited fields terminated by ','
lines terminated by '\n'
stored as textfile;
load data local inpath '/opt/datas/2nd_house_price.csv' into table db_lianjia.tb_lj;
select area_group,count(*) as total
from (
select case
when 0<house_area and house_area<=50 then '50 sqm or less'
when 50<house_area and house_area<=70 then '50-70 sqm'
when 70<house_area and house_area<=90 then '70-90 sqm'
when 90<house_area and house_area<=110 then '90-110 sqm'
when 110<house_area and house_area<=130 then '110-130 sqm'
when 130<house_area and house_area<=150 then '130-150 sqm'
else 'over 150 sqm'
end as area_group
from db_lianjia.tb_lj
) as t
group by t.area_group order by total desc;
select t.year_group , count(*) total
from (
select
case when (2019-substring(build_date,1,4)) between 0 and 5 then '0-5 years'
when (2019-substring(build_date,1,4)) between 6 and 10 then '6-10 years'
when (2019-substring(build_date,1,4)) between 11 and 15 then '11-15 years'
when (2019-substring(build_date,1,4)) between 16 and 20 then '16-20 years'
else 'over 20 years'
end as year_group
from db_lianjia.tb_lj
) as t
group by t.year_group order by total desc;
-- Save the result as an intermediate table for later business analysis
create table db_lianjia.tb_info as
select region,direction,
case
when 0<house_area and house_area<=50 then '50 sqm or less'
when 50<house_area and house_area<=70 then '50-70 sqm'
when 70<house_area and house_area<=90 then '70-90 sqm'
when 90<house_area and house_area<=110 then '90-110 sqm'
when 110<house_area and house_area<=130 then '110-130 sqm'
when 130<house_area and house_area<=150 then '130-150 sqm'
else 'over 150 sqm'
end as area_group,
case when (2019-substring(build_date,1,4)) between 0 and 5 then '0-5 years'
when (2019-substring(build_date,1,4)) between 6 and 10 then '6-10 years'
when (2019-substring(build_date,1,4)) between 11 and 15 then '11-15 years'
when (2019-substring(build_date,1,4)) between 16 and 20 then '16-20 years'
else 'over 20 years'
end as year_group
from db_lianjia.tb_lj;
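Once the intermediate table exists, follow-up analyses can group on its precomputed columns instead of repeating the `case` expressions. The two queries below are only a sketch of that kind of follow-up; they use the columns defined in `db_lianjia.tb_info`, but which breakdowns are actually needed is an assumption.

```sql
-- Listings per region and area group
select region, area_group, count(*) as total
from db_lianjia.tb_info
group by region, area_group
order by total desc
limit 10;

-- Listings per building-age group and orientation
select year_group, direction, count(*) as total
from db_lianjia.tb_info
group by year_group, direction
order by total desc
limit 10;
```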
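User-defined functions (UDF):
After updating pom.xml, create the Java file:
Upload the JAR to the server:
Start the cluster and check that it is running:
Switch to Hive:
Register the JAR:
Register the function: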
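The last two steps can be sketched in the Hive CLI as follows. The JAR path, the function name `my_upper`, and the class name `com.example.hive.udf.MyUpper` are hypothetical placeholders, and the example call assumes the UDF takes a single string argument; substitute the values from your own project.

```sql
-- Hypothetical JAR path; use the location the JAR was uploaded to
add jar /opt/datas/hive-udf-demo.jar;

-- Hypothetical function name and fully qualified UDF class name
create temporary function my_upper as 'com.example.hive.udf.MyUpper';

-- Confirm the function is registered, then call it
describe function my_upper;
select my_upper(region) from db_lianjia.tb_lj limit 5;
```

A temporary function exists only for the current session, so both statements have to be repeated after reconnecting.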