Copyright notice: this is an original article by the blog author. Source: http://blog.csdn.net/silentwolfyh https://blog.csdn.net/silentwolfyh/article/details/81224151
Contents
1. Requirement
2. Problem
3. Process
————————————————————————————-
1. Requirement
A Hive table holding several TB of data across 200,000 partitions needs to be dropped.
2. Problem
drop table if exists table_name;
The statement fails with the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
3. Process
3.1. Some suggested the Hive table was locked; table locking is described at the link below (this did not solve it):
http://www.ericlin.me/2015/05/how-table-locking-works-in-hive/
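3.2. Others suggested it was a configuration problem (changing the configuration did not solve it).
3.3. Command to delete a single partition:
delete from table_name where dt='2018-07-23'; (failed with an error)
3.4. Command to drop a single partition:
ALTER TABLE table_name DROP PARTITION(dt='2018-07-23'); (succeeded)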
https://stackoverflow.com/questions/46307667/how-do-i-drop-all-partitions-at-once-in-hive
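Hive's DDL also accepts several partition specs in a single ALTER TABLE ... DROP PARTITION statement, separated by commas, so dates could be dropped in small batches instead of one statement per date. The following is only a minimal sketch: build_drop_statement and its parameters are hypothetical names introduced for illustration, and whether a given batch size finishes before the metastore times out would need to be checked on the actual cluster.

```python
def build_drop_statement(table, dates):
    """Build one ALTER TABLE statement that drops every dt partition in `dates`.

    `table` and `dates` are illustrative parameters (assumptions, not taken
    from the original post); `dates` is a list of dt values like '2018-07-23'.
    """
    if not dates:
        raise ValueError("dates must not be empty")
    # One "PARTITION (dt='...')" spec per date, joined with commas
    specs = ", ".join("PARTITION (dt='%s')" % d for d in dates)
    return "ALTER TABLE %s DROP IF EXISTS %s;" % (table, specs)


print(build_drop_statement("database.table_name", ["2018-07-23", "2018-07-24"]))
```

The resulting string can be passed to `hive -e` in the same way as the single-partition command above.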
3.5. I used Python to drop the partitions one day at a time. The code is as follows:
#!/usr/bin/python
# -*- coding: UTF-8 -*-
'''
Dropping a Hive table that has millions of partitions or several TB of data.
1. Fetch all partitions, then drop them one by one
2. Finally, drop table table_name;
My partition format:
dt=2015-02-23/pkey=20160430
dt=2015-02-23/pkey=47121231E
dt=2015-02-24/pkey=20150620
dt=2015-02-24/pkey=20160430
dt=2015-02-24/pkey=47121231E
dt=2015-02-25/pkey=20150620
dt=2015-02-25/pkey=20151231
dt=2015-02-25/pkey=20160430
'''
import logging
import subprocess
import time

TABLE = 'database.table_name'


def now():
    # Current local time as a printable timestamp
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())


def run_hive(sql):
    # Run one statement through the hive CLI, wait for it, and return the exit code
    return subprocess.call(['hive', '-e', sql])


if __name__ == '__main__':
    logging.basicConfig(filename='dropHiveTable.log', filemode="w", level=logging.DEBUG)
    start = now()
    # Set of distinct partition dates
    dateSet = set()
    output = subprocess.check_output(['hive', '-e', 'show partitions ' + TABLE])
    for partition in output.decode('utf-8').splitlines():
        # Skip anything that is not a partition line such as dt=2015-02-23/pkey=20160430
        if partition.startswith('dt='):
            dateSet.add(partition.split("=")[1].split("/")[0])
    # Sort the partition dates
    dateList = sorted(dateSet)
    logging.info("All dates:")
    logging.info(dateList)
    # Drop the partitions in Hive one date at a time
    for hiveDate in dateList:
        logStart = now()
        status = run_hive("ALTER TABLE %s DROP PARTITION(dt='%s');" % (TABLE, hiveDate))
        logEnd = now()
        logging.info("Partition [%s] drop finished (exit code %s), start [%s], end [%s]",
                     hiveDate, status, logStart, logEnd)
    # With the partitions gone, drop the table itself
    run_hive("drop table if exists %s;" % TABLE)
    # Finish and log the total run time
    end = now()
    logging.info("Program start [%s], end [%s]", start, end)
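The date-extraction step of the script can be checked on its own, without touching Hive, by feeding it sample `show partitions` output. A minimal sketch, assuming the dt=.../pkey=... partition format shown in the script; extract_dates is a hypothetical helper name, not part of the original code:

```python
def extract_dates(lines):
    """Return the sorted, de-duplicated dt values found in `show partitions` output."""
    dates = set()
    for line in lines:
        line = line.strip()
        # Only lines like dt=2015-02-23/pkey=20160430 are partition entries
        if line.startswith("dt="):
            # Take the first path component (dt=...) and keep the value after '='
            dates.add(line.split("/")[0].split("=", 1)[1])
    return sorted(dates)


sample = [
    "dt=2015-02-23/pkey=20160430\n",
    "dt=2015-02-23/pkey=47121231E\n",
    "dt=2015-02-24/pkey=20150620\n",
    "dt=2015-02-25/pkey=20151231\n",
]
print(extract_dates(sample))  # ['2015-02-23', '2015-02-24', '2015-02-25']
```

Because each date appears only once in the result, every dt partition (with all of its pkey sub-partitions) is dropped by a single ALTER TABLE statement.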