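In the previous section we installed Hive on CentOS 7. This section demonstrates how to perform a word-frequency count in Hive.
1 System, software, and prerequisites
- Hive is installed and running on CentOS 7
https://www.jianshu.com/p/755944f01fab
- All operations are performed as the root user
2 Steps
- 1 Create a file named email under /root with the following content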
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
el.callaha@Apperiohu
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
- 2 Upload this file to HDFS
cd /root/hadoop-2.5.2/bin
./hdfs dfs -put /root/email /email
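Before moving on, it may help to confirm that the upload actually landed in HDFS. The following is a minimal sketch, not part of the original steps: the helper name `check_hdfs_file` and the `HDFS_BIN` variable are hypothetical, and the default path assumes the Hadoop layout used above.

```shell
#!/usr/bin/env bash
# Assumed location of the hdfs binary (matches this tutorial's layout);
# override HDFS_BIN if your installation differs.
HDFS_BIN="${HDFS_BIN:-/root/hadoop-2.5.2/bin/hdfs}"

# Hypothetical helper: returns 0 if the given HDFS path exists.
check_hdfs_file() {
    local path="$1"
    # `hdfs dfs -test -e` exits with 0 when the path exists
    "$HDFS_BIN" dfs -test -e "$path"
}

# Only run the check when the hdfs binary is actually present
if [ -x "$HDFS_BIN" ]; then
    if check_hdfs_file /email; then
        echo "/email exists in HDFS"
        "$HDFS_BIN" dfs -cat /email | head -n 5   # preview the first lines
    else
        echo "/email not found in HDFS" >&2
    fi
fi
```

If the check reports that the file is missing, re-run the `-put` command before creating the table.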
- 3 Enter the Hive command line
cd /root/apache-hive-0.14.0-bin/bin
./hive
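- 4 Create the t_email table in the Hive command line
-- Create a table named t_email in Hive
create table if not exists t_email(email string comment 'user email') comment 'user email' row format delimited fields terminated by ' ' lines terminated by '\n' stored as textfile;
-- Load the data from HDFS into Hive (this moves /email into the table's warehouse directory)
load data inpath '/email' into table t_email;
-- Alternatively, load the data from the local filesystem instead (running both statements loads the rows twice)
load data local inpath '/root/email' into table t_email;
-- List all tables
show tables;
-- Show the table's details
desc t_email;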
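The table setup above can also be run non-interactively by saving the statements to a script and passing it to `hive -f`. This is a minimal sketch under assumptions: the `HIVE_BIN` and `HQL_FILE` variables and the script path are hypothetical, and only the local-file load is used so the rows are loaded once.

```shell
#!/usr/bin/env bash
# Assumed location of the hive binary (matches this tutorial's layout).
HIVE_BIN="${HIVE_BIN:-/root/apache-hive-0.14.0-bin/bin/hive}"
# Hypothetical path for the generated HiveQL script.
HQL_FILE="${HQL_FILE:-/tmp/t_email_setup.hql}"

# Write the statements to a file; the quoted 'EOF' keeps '\n' literal.
cat > "$HQL_FILE" <<'EOF'
-- Create the table if it does not exist yet
create table if not exists t_email(email string comment 'user email') comment 'user email' row format delimited fields terminated by ' ' lines terminated by '\n' stored as textfile;
-- Load the local file (copied into the table's warehouse directory)
load data local inpath '/root/email' into table t_email;
-- Show the resulting table definition
desc t_email;
EOF

# Only invoke Hive when the binary is actually present
if [ -x "$HIVE_BIN" ]; then
    "$HIVE_BIN" -f "$HQL_FILE"
fi
```

Keeping the statements in a file makes the setup repeatable, though each run of the `load data` statement appends the rows again.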
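- 5 Run the statistics in the Hive command line
-- View the data
select * from t_email;
-- Count the total number of rows
select count(1) from t_email;
-- Count how many times each email address appears
select email,count(1) from t_email group by email;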
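As a sanity check on the `group by` result, the same per-address counts can be computed locally with standard shell tools. This is a minimal sketch assuming one address per line; the function name `count_emails` and the sample addresses are hypothetical and are not the tutorial's data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count occurrences of each line in a file,
# mirroring `select email,count(1) from t_email group by email`.
# Output format: "<count> <email>", most frequent first.
count_emails() {
    sort "$1" | uniq -c | awk '{print $1, $2}' | sort -k1,1nr -k2,2
}

# Illustrative sample data (not the tutorial's file)
sample="$(mktemp)"
printf '%s\n' 'a@example.com' 'b@example.com' 'a@example.com' > "$sample"
count_emails "$sample"
rm -f "$sample"
```

To compare against Hive, run `count_emails /root/email` and check that each count matches the corresponding row returned by the query.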
That covers the basic data operations in Hive.
Reposted from: https://www.jianshu.com/p/ad32cfbdb7a4