2016-08-28 14:07:25 小小程序员1986
Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission. https://blog.csdn.net/jethai/article/details/52345352
#!/bin/bash

# Print an error message to stderr, show the usage, and exit.
die () {
    echo >&2 "$@"
    echo "usage:"
    echo "  $0 check|split table_name [split_size]"
    exit 1
}

[[ "$#" -lt 2 ]] && die "at least 2 arguments required, $# provided"

COMMAND=$1
TABLE=$2
# Size threshold in bytes; defaults to 1 GiB.
SIZE="${3:-1073741824}"

# Look up the full region name in hbase:meta by its encoded name, then split it.
split() {
    region_key=`python /home/hduser/hbase/hbase-scan.py -t hbase:meta -f "RowFilter (=, 'substring:$1')"`
    echo "split '$region_key'" | hbase shell
}

if [ "$COMMAND" != "check" ] ; then
    for region in `hadoop fs -ls /hbase/data/default/$TABLE | awk {'print $8'}`
    do
        # Skip entries whose names start with a dot; they are not regions.
        [[ ${region##*/} =~ ^\. ]] && continue
        [[ `hadoop fs -du -s $region | awk {'print $1'}` -gt $SIZE ]] && split ${region##*/}
    done

    # check after split
    sleep 60
fi

# Report every region as huge or small relative to the threshold.
for region in `hadoop fs -ls /hbase/data/default/$TABLE | awk {'print $8'}`
do
    [[ ${region##*/} =~ ^\. ]] && continue
    [[ `hadoop fs -du -s $region | awk {'print $1'}` -gt $SIZE ]] &&
        echo "${region##*/} (`hadoop fs -du -s -h $region | awk {'print $1 $2'}`) is a huge region" ||
        echo "${region##*/} (`hadoop fs -du -s -h $region | awk {'print $1 $2'}`) is a small region"
done
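The size check in the script above can be tried without a cluster. The following is only a sketch of that logic in Python, not part of the original script: `region_paths` and `classify` are hypothetical helper names, the sample listing is made up, and the byte counts stand in for what `hadoop fs -du -s` would report.

```python
# Sketch of the size check; the sample listing below is made up for illustration.

DEFAULT_SPLIT_SIZE = 1073741824  # same default threshold as the script (bytes)


def region_paths(ls_output):
    """Return region directory paths from `hadoop fs -ls` style output.

    Mirrors `awk '{print $8}'` plus the skip of names starting with '.'.
    """
    paths = []
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) < 8:          # e.g. the "Found N items" header line
            continue
        path = fields[7]
        name = path.rsplit("/", 1)[-1]
        if name.startswith("."):     # dot-prefixed entries are not regions
            continue
        paths.append(path)
    return paths


def classify(sizes, threshold=DEFAULT_SPLIT_SIZE):
    """Map each region name to 'huge' or 'small' using the script's -gt test."""
    return {
        path.rsplit("/", 1)[-1]: ("huge" if size > threshold else "small")
        for path, size in sizes.items()
    }


SAMPLE_LS = """Found 3 items
drwxr-xr-x   - hbase supergroup          0 2016-08-28 14:07 /hbase/data/default/t1/.tabledesc
drwxr-xr-x   - hbase supergroup          0 2016-08-28 14:07 /hbase/data/default/t1/aaa111
drwxr-xr-x   - hbase supergroup          0 2016-08-28 14:07 /hbase/data/default/t1/bbb222
"""

paths = region_paths(SAMPLE_LS)
# Hypothetical byte counts standing in for `hadoop fs -du -s` results.
sizes = {paths[0]: 2 * DEFAULT_SPLIT_SIZE, paths[1]: 1024}
result = classify(sizes)
print(result)
```

As in the shell script, the comparison is strict, so a region exactly at the threshold counts as small.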
hbase-scan.py
import subprocess
import datetime
import argparse
import csv
import gzip
import happybase
import logging


def connect_to_hbase():
    # Host of the HBase Thrift server that happybase connects to.
    return happybase.Connection('itr-hbasetest01')


def main():
    logging.basicConfig(format='%(asctime)s %(name)s %(levelname)s: %(message)s',level=logging.INFO)

    argp = argparse.ArgumentParser(description='EventLog Reader')
    argp.add_argument('-t','--table', dest='table', default='eventlog')
    argp.add_argument('-p','--prefix', dest='prefix')
    argp.add_argument('-f','--filter', dest='filter')
    argp.add_argument('-l','--limit', dest='limit', default=10)
    args = argp.parse_args()

    hbase_conn = connect_to_hbase()
    table = hbase_conn.table(args.table)

    logging.info("scan start")
    scanner = table.scan(row_prefix=args.prefix, batch_size=1000, limit=int(args.limit), filter=args.filter)
    logging.info("scan done")

    i = 0
    for key, data in scanner:
        logging.info(key)
        # The row key goes to stdout so the calling shell script can capture it.
        print(key)
        i+=1

    logging.info('%s rows read in total', i)


if __name__ == '__main__':
    main()
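The shell script passes this scanner a `RowFilter` with a `substring:` comparator and uses whatever it prints as the region name. A substring match could in principle return more than one row from hbase:meta, so the sketch below shows one way to build the filter string and keep only a key that ends in `.<encoded_name>.`. This is an assumption-laden illustration: `build_meta_filter` and `pick_region_key` are hypothetical helpers, the sample keys are made up (real encoded names are longer), and the trailing `.<encoded_name>.` layout of hbase:meta row keys is assumed rather than taken from the original post.

```python
def build_meta_filter(encoded_name):
    """Build the filter string the shell script passes via -f."""
    return "RowFilter (=, 'substring:%s')" % encoded_name


def pick_region_key(keys, encoded_name):
    """Return the single key ending in '.<encoded_name>.', or None.

    Assumes meta row keys look like '<table>,<start key>,<id>.<encoded>.'.
    """
    suffix = "." + encoded_name + "."
    matches = [k for k in keys if k.endswith(suffix)]
    return matches[0] if len(matches) == 1 else None


# Made-up row keys standing in for a scan of hbase:meta.
sample_keys = [
    "t1,,1472364445000.aaa111.",
    "t1,row500,1472364445001.bbb222.",
]

meta_filter = build_meta_filter("aaa111")
region_key = pick_region_key(sample_keys, "aaa111")
print(meta_filter)
print(region_key)
```

Returning None when there is no unique match lets a caller skip the split instead of sending an ambiguous region name to the HBase shell.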
This article comes from the "点滴积累" blog; please keep this source link: http://tianxingzhe.blog.51cto.com/3390077/1717714