Copyright notice: this is an original article by the blogger. Please credit the source when reposting: https://blog.csdn.net/u010002184/article/details/89605368
hive> load data inpath 'hdfs://ns1/abc/sales_info/hello/sales_info.txt' overwrite into table sales_info partition(dt = '2019-04-26');
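After this load, the original data file no longer exists at its source path: it has been moved from the original path into the table's partition path.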
Table creation statement:
CREATE TABLE `sales_info`(
`sku_id` string COMMENT 'product id',
`sku_name` string COMMENT 'product name',
`category_id3` string COMMENT 'third-level category id',
`price` double COMMENT 'sale price',
`sales_count` bigint COMMENT 'number of units sold'
)
COMMENT 'product sales information table'
PARTITIONED BY(
`dt` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
NULL DEFINED AS ''
STORED AS TEXTFILE
LOCATION
'hdfs://ns1/abc/sales_info'
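As a rough illustration of what this table definition implies for each line of the text file, the following Python sketch splits a line on commas, treats an empty field as NULL (mirroring NULL DEFINED AS ''), and casts `price` and `sales_count` to the declared types. The names `SalesInfo` and `parse_line` are hypothetical, and the sketch assumes every line has exactly five fields. It is not how Hive's SerDe is actually implemented.

```python
from typing import NamedTuple, Optional


class SalesInfo(NamedTuple):
    """Hypothetical record type mirroring the non-partition columns of sales_info."""
    sku_id: Optional[str]
    sku_name: Optional[str]
    category_id3: Optional[str]
    price: Optional[float]
    sales_count: Optional[int]


def parse_line(line: str) -> SalesInfo:
    """Split one comma-delimited line and cast fields as the DDL declares them."""
    fields = line.rstrip("\n").split(",")
    # An empty field stands for NULL, mirroring NULL DEFINED AS ''.
    values = [f if f != "" else None for f in fields]
    # Assumption: exactly five fields per line (unpacking fails otherwise).
    sku_id, sku_name, category_id3, price, sales_count = values
    return SalesInfo(
        sku_id,
        sku_name,
        category_id3,
        float(price) if price is not None else None,
        int(sales_count) if sales_count is not None else None,
    )


row = parse_line("12377,华为Mate10,31,999,20")
print(row.price, row.sales_count)  # 999.0 20
```

The cast of `999` to `999.0` matches how the `price` column is displayed in the query result further down.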
Data contents:
[abc]$ cat sales_info.txt
12377,华为Mate10,31,999,20
45677,华为Mate30,31,2999,30
[abc]$
Create a new directory (hello) on HDFS and put the local file into the target HDFS path:
hive> dfs -mkdir hdfs://ns1/abc/sales_info/hello;
hive> dfs -put sales_info.txt hdfs://ns1/abc/sales_info/hello;
hive> dfs -ls hdfs://ns1/abc/sales_info/hello;
Found 1 items
-rw-r--r-- 3 a a 61 2019-04-27 17:34
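Load the data (the table had already been loaded once after it was created; this is the second load) and query the result (2 rows, which are the latest data; before this load there were 5 rows):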
hive> load data inpath 'hdfs://ns1/abc/sales_info/hello/sales_info.txt' overwrite into table sales_info partition(dt = '2019-04-26');
Loading data to table gdm.sales_info partition (dt=2019-04-26)
Moved: 'hdfs://ns1/abc/sales_info/dt=2019-04-26/sales_info.txt' to trash at: hdfs://ns1/abc/.Trash/Current
Partition gdm.sales_info{dt=2019-04-26} stats: [numFiles=1, numRows=0, totalSize=61, rawDataSize=0]
OK
Time taken: 0.43 seconds
hive> select * from sales_info;
OK
sku_id sku_name category_id3 price sales_count dt
12377 华为Mate10 31 999.0 20 2019-04-26
45677 华为Mate30 31 2999.0 30 2019-04-26
Time taken: 0.049 seconds, Fetched: 2 row(s)
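The "Moved ... to trash" message in the load output shows the old file sitting under a dt=2019-04-26 directory below the table LOCATION. A minimal sketch of that default layout, assuming the partition has no custom location and the values need no escaping (`partition_dir` is a hypothetical helper, not a Hive API):

```python
def partition_dir(table_location: str, **partition_spec: str) -> str:
    """Build the default partition directory: <table location>/<key>=<value>/..."""
    parts = [f"{key}={value}" for key, value in partition_spec.items()]
    return "/".join([table_location.rstrip("/"), *parts])


print(partition_dir("hdfs://ns1/abc/sales_info", dt="2019-04-26"))
# hdfs://ns1/abc/sales_info/dt=2019-04-26
```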
Check the original data file again (it no longer exists; it was moved from the original path to the new path):
hive> dfs -ls hdfs://ns1/abc/sales_info/hello;
hive>
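The behaviour observed above (the old partition file goes to the trash, and the source file disappears from hello) can be imitated on a local file system. This is only a simulation under stated assumptions: `load_inpath_overwrite` is a hypothetical function, the directories are temporary local folders rather than HDFS paths, and the trash step assumes trash is enabled, as the log output suggests.

```python
import shutil
import tempfile
from pathlib import Path


def load_inpath_overwrite(src: Path, partition_dir: Path, trash_dir: Path) -> None:
    """Local stand-in for LOAD DATA INPATH ... OVERWRITE INTO TABLE ... PARTITION."""
    partition_dir.mkdir(parents=True, exist_ok=True)
    trash_dir.mkdir(parents=True, exist_ok=True)
    # OVERWRITE: files already in the partition are moved to the trash first.
    for old in list(partition_dir.iterdir()):
        shutil.move(str(old), str(trash_dir / old.name))
    # The source file is moved, not copied, so it vanishes from its original path.
    shutil.move(str(src), str(partition_dir / src.name))


root = Path(tempfile.mkdtemp())
src = root / "hello" / "sales_info.txt"
src.parent.mkdir()
src.write_text("new data\n", encoding="utf-8")
part = root / "dt=2019-04-26"
part.mkdir()
(part / "sales_info.txt").write_text("old data\n", encoding="utf-8")

load_inpath_overwrite(src, part, root / ".Trash")
print(src.exists())  # False
print((part / "sales_info.txt").read_text(encoding="utf-8").strip())  # new data
```

If the source file needs to stay in place, the Hive documentation describes LOAD DATA LOCAL INPATH, which copies a file from the local file system instead of moving one within HDFS.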
end