Sqoop Comprehensive Case Study

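1. Make sure the Hadoop daemons are running: start them with start-all.sh, then list the running Java processes with jps.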

cd /apps/hadoop/sbin  
./start-all.sh  
jps
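jps prints one line per running JVM in the form "<pid> <main class name>". The Python sketch below checks captured jps output for missing daemons. It assumes a pseudo-distributed Hadoop 2.x node on which start-all.sh launches NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. That expected set, the helper name missing_daemons and the sample PIDs are illustrative assumptions, so adjust them to your own cluster.

```python
# Daemons expected after start-all.sh on a pseudo-distributed Hadoop 2.x node
# (assumption: adjust this set to match your own cluster layout).
EXPECTED_DAEMONS = {
    "NameNode",
    "DataNode",
    "SecondaryNameNode",
    "ResourceManager",
    "NodeManager",
}


def missing_daemons(jps_output: str, expected=EXPECTED_DAEMONS) -> set:
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line looks like "<pid> <MainClassName>"; skip anything else.
        if len(parts) >= 2 and parts[0].isdigit():
            running.add(parts[1])
    return set(expected) - running


if __name__ == "__main__":
    # Hypothetical jps output in which NodeManager failed to start.
    sample = """2817 NameNode
2955 DataNode
3140 SecondaryNameNode
3302 ResourceManager
3608 Jps"""
    print(sorted(missing_daemons(sample)))  # ['NodeManager']
```

On a live node, the input could come from subprocess.run(["jps"], capture_output=True, text=True).stdout.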

2. First, create the /data/case4 directory on the local Linux file system to hold the files needed for this exercise.

mkdir -p /data/case4

Change to the /data/case4 directory and download the text file order_items into it.
Next, create the /mycase4/ directory on HDFS and upload order_items from the local /data/case4 directory into /mycase4/ on HDFS:

hadoop fs -mkdir -p /mycase4/  
hadoop fs -put /data/case4/order_items /mycase4/

3. Use an HDFS command to list the contents of /mycase4/ recursively and check the result:

hadoop fs -ls -R /mycase4/
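Each entry line of hadoop fs -ls -R output has eight columns: permissions, replication, owner, group, size in bytes, modification date, time and path. A leading "d" in the permissions marks a directory. As a minimal sketch (list_files is a hypothetical helper name), the Python function below keeps only regular files, so you can confirm that /mycase4/order_items is present and non-empty.

```python
def list_files(ls_output: str) -> dict:
    """Map each regular-file path in `hadoop fs -ls -R` output to its size in bytes."""
    files = {}
    for line in ls_output.splitlines():
        # Columns: permissions, replication, owner, group, size, date, time, path.
        # maxsplit=7 keeps any spaces inside the path intact.
        parts = line.split(maxsplit=7)
        if len(parts) != 8:
            continue  # skip lines that are not file or directory entries
        permissions, size, path = parts[0], parts[4], parts[7]
        if permissions.startswith("-"):  # "d..." would mark a directory
            files[path] = int(size)
    return files
```

For example, after files = list_files(output), the upload looks good when "/mycase4/order_items" in files and files["/mycase4/order_items"] > 0.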

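4. Start Hive. First check whether MySQL is running, since this setup keeps the Hive metastore in MySQL. If it is not running, start MySQL, then launch Hive: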

sudo service mysql status  
sudo service mysql start 
hive

Create the myhive database:

create database myhive;

List the databases in Hive:

show databases;

Switch to the myhive database:

use myhive;

In the myhive database, create the order-details table order_items to hold the data from the /mycase4/order_items file on HDFS (write out the CREATE TABLE statement):

create table order_items  
(item_id string,  
order_id string,  
goods_id string,  
goods_number string,  
shop_price string,  
goods_price string,  
goods_amount string)  
row format delimited  
fields terminated by '\t'  
stored as textfile;
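row format delimited fields terminated by '\t' means each line of the underlying file becomes one row, and its tab-separated fields are assigned to the seven columns by position. The Python sketch below models that mapping in simplified form. It assumes Hive's default text handling, in which missing trailing fields read as NULL and surplus fields are ignored; parse_row and the sample values are hypothetical.

```python
# Column order of the order_items table defined above.
ORDER_ITEMS_COLUMNS = [
    "item_id", "order_id", "goods_id", "goods_number",
    "shop_price", "goods_price", "goods_amount",
]


def parse_row(line: str, columns=ORDER_ITEMS_COLUMNS) -> dict:
    """Split one text line on tabs and map the fields to columns by position.

    Missing trailing fields become None (shown as NULL in Hive);
    fields beyond the last column are dropped by zip().
    """
    fields = line.rstrip("\n").split("\t")
    padded = fields + [None] * (len(columns) - len(fields))
    return dict(zip(columns, padded))
```

A malformed line therefore does not make the query fail; it shows up as a row with NULL columns, which is worth checking for when the counts in later steps look wrong.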

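Load the data under /mycase4 on HDFS (the order_items file) into the Hive table order_items. LOAD DATA INPATH moves the files into the table's warehouse directory rather than copying them, so /mycase4 on HDFS is empty afterwards: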

load data inpath '/mycase4/' overwrite into table order_items;

5. Count the records in order_items (write out the statement and its result):

select count(1) from order_items;
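For a plain text table with no header line and no blank lines, count(1) should equal the number of newline-separated records in the data file. The HDFS copy has been moved into the table, but the local copy in /data/case4 is untouched and can serve as a cross-check. A minimal Python sketch, assuming LF line endings (count_records is a hypothetical helper):

```python
def count_records(data: bytes) -> int:
    """Count newline-separated records in raw file bytes (LF line endings assumed).

    A final line without a trailing newline still counts as one record.
    """
    if not data:
        return 0
    return data.count(b"\n") + (0 if data.endswith(b"\n") else 1)
```

For example, pass it open("/data/case4/order_items", "rb").read(). Reading bytes avoids guessing the file's text encoding. On the shell, wc -l gives the same figure when the last line ends with a newline.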

6. In Hive, create the table mytb with the fields (goods_id, goods_number), both of type string, using "\t" as the field delimiter:

create table mytb (goods_id string, goods_number string) row format delimited fields terminated by '\t';

7. Using Hive, count the order-item rows for each goods ID (the goods_id field) and insert the result into mytb:

insert into table mytb select goods_id, count(1) as num from order_items group by goods_id;
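count(1) counts order-item rows per goods_id. It does not add up the goods_number quantities. The Python sketch below replays the same INSERT ... SELECT ... GROUP BY against an in-memory SQLite database, used here only as a stand-in for Hive. It uses a trimmed four-column order_items table and made-up rows. mytb declares both columns as strings, so the counts are stored as text. SQLite's TEXT column affinity reproduces that behavior.

```python
import sqlite3

# In-memory SQLite stands in for Hive here; only the query shape is the point.
conn = sqlite3.connect(":memory:")
# Trimmed to four of the seven order_items columns for brevity.
conn.execute(
    "CREATE TABLE order_items (item_id TEXT, order_id TEXT, goods_id TEXT, goods_number TEXT)"
)
conn.execute("CREATE TABLE mytb (goods_id TEXT, goods_number TEXT)")

# Made-up sample rows: goods g1 appears on two order lines, g2 on one.
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?, ?)",
    [("1", "100", "g1", "2"), ("2", "100", "g2", "1"), ("3", "101", "g1", "5")],
)

# Same shape as the Hive statement: one row per distinct goods_id with its row count.
conn.execute(
    "INSERT INTO mytb SELECT goods_id, COUNT(1) AS num FROM order_items GROUP BY goods_id"
)
conn.commit()

result = conn.execute("SELECT goods_id, goods_number FROM mytb ORDER BY goods_id").fetchall()
print(result)  # [('g1', '2'), ('g2', '1')]
```

g1 gets 2 (two order lines), not 7 (the sum of its goods_number values). If total quantities were wanted, the query would need sum(cast(goods_number as int)) instead.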

8. Open another terminal window and connect to MySQL with mysql -u root -p:

mysql -u root -p

Create the DD database in MySQL:

create database DD;

Switch to DD and create a table named items in it:

use DD;  
create table items(goods_id varchar(100),goods_number varchar(100));

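9. Open one more terminal window and use Sqoop to export the data of the Hive table mytb to the items table in the MySQL database DD. The export directory is mytb's default warehouse location. The input delimiter must match the tab used when mytb was created: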

sqoop export --connect jdbc:mysql://localhost:3306/DD --username root --password strongs \
--table items --export-dir /user/hive/warehouse/myhive.db/mytb --input-fields-terminated-by '\t'
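Conceptually, sqoop export runs map tasks that read every file under --export-dir. Each task splits each line on the --input-fields-terminated-by delimiter, maps the fields to the target table's columns in order, and writes them through JDBC with batched INSERT statements. The Python sketch below models only that parse-and-insert loop, with SQLite standing in for MySQL. It is not Sqoop's implementation, and export_tsv, _flush and batch_size are hypothetical names.

```python
import sqlite3


def _flush(conn, table, rows):
    # Run one parameterized INSERT for every row in the batch.
    # The table name is interpolated directly, so it must be trusted input.
    placeholders = ", ".join("?" * len(rows[0]))
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return len(rows)


def export_tsv(conn, table, lines, batch_size=100):
    """Parse tab-delimited lines and insert them into `table` in batches."""
    batch, total = [], 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # this sketch simply skips blank lines
        batch.append(tuple(line.split("\t")))  # fields map to columns by position
        if len(batch) == batch_size:
            total += _flush(conn, table, batch)
            batch = []
    if batch:
        total += _flush(conn, table, batch)
    conn.commit()
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (goods_id VARCHAR(100), goods_number VARCHAR(100))")
    # Hypothetical contents of one file under the mytb warehouse directory.
    print(export_tsv(conn, "items", ["g1\t2\n", "g2\t1\n"]))  # 2
    print(conn.execute("SELECT * FROM items ORDER BY goods_id LIMIT 10").fetchall())
```

Because fields are matched to columns by position, the column order of items in MySQL has to line up with mytb's (goods_id, goods_number).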

Back in MySQL, use a SELECT statement to check that the data landed in items. Since there are many rows, look at only the first ten:

select * from items limit 10;

球球的学习笔记 (Qiuqiu's Study Notes)
