Part I: split policy
If you have used an indexing tool like Lucene, you know there are settings that control how many segments get merged into a larger one, and whether segments that are already large enough stop being merged at a fixed size. These cases are all similar to HBase's region merge capabilities (so-called online merge), described below.
By contrast, HBase also has a 'region split policy', which Lucene lacks. It decides when a region should be split: for example, when a region grows so large that performance degrades, or when re-merging a big region would take a long time. So splitting is sensible!
In 0.94.2, there are two split policies, as described below:
policy | trigger | split point | features / use case |
--- | --- | --- | --- |
ConstantSizeRegionSplitPolicy | all stores of the region are splittable, and at least one store is larger than the max file size (hbase.hregion.max.filesize) | the split point of the largest store | a constant size threshold; suitable for predictable data growth combined with pre-splitting |
IncreasingToUpperBoundRegionSplitPolicy (default) | all stores of the region are splittable (not fully enforced in this version, see [1]), and at least one store is larger than A, where A = min(hbase.hregion.max.filesize, C^2 * flush size) and C = number of regions of the same table on this region server; so all of that table's regions on this RS share the same A when checking the split size | same as above | extends the policy above and keeps its split-point logic, but computes the threshold dynamically; suited to tables whose data size is unpredictable in the early period. Note, however, that A stops growing once it reaches the max file size: with the default 128 MB flush size and 10 GB max file size, this happens when C reaches 9 (9^2 * 128 MB = 10368 MB > 10240 MB), after which this policy behaves like the constant-size policy |
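Here is a minimal Java sketch that makes the two triggers concrete. It is a simplified model, not HBase's code. StoreInfo, shouldSplit and sizeToCheck are hypothetical names, and the 128 MB flush size and 10 GB max file size are assumed defaults. Both policies share the same loop; only the threshold passed in differs.

```java
import java.util.List;

// Simplified model of the 0.94.2 split triggers; not HBase's real classes.
final class SplitTriggerSketch {

    /** Hypothetical stand-in for a store: its total size and whether it may be split now. */
    static final class StoreInfo {
        final long sizeBytes;
        final boolean splittable; // e.g. false while it still holds reference files

        StoreInfo(long sizeBytes, boolean splittable) {
            this.sizeBytes = sizeBytes;
            this.splittable = splittable;
        }
    }

    /**
     * Every store must be splittable and at least one store must exceed the threshold.
     * This checks all stores (the intended behavior); see [1] for the 0.94.2 issue.
     */
    static boolean shouldSplit(List<StoreInfo> stores, long threshold) {
        boolean foundBigStore = false;
        for (StoreInfo s : stores) {
            if (!s.splittable) {
                return false; // any unsplittable store vetoes the split
            }
            if (s.sizeBytes > threshold) {
                foundBigStore = true;
            }
        }
        return foundBigStore;
    }

    /** A = min(maxFileSize, C^2 * flushSize), C = regions of this table on this server. */
    static long sizeToCheck(int c, long maxFileSize, long flushSize) {
        if (c <= 0) {
            return maxFileSize; // assumption of this sketch: no regions counted, use the cap
        }
        return Math.min(maxFileSize, flushSize * (long) c * c);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long flushSize = 128 * mb;     // assumed default flush size
        long maxFileSize = 10240 * mb; // assumed default max file size (10 GB)
        for (int c = 1; c <= 10; c++) {
            System.out.println(c + " region(s): threshold = "
                    + sizeToCheck(c, maxFileSize, flushSize) / mb + " MB");
        }
    }
}
```

With those assumed defaults, the threshold grows 128, 512, 1152, 2048, ... MB and stays at the 10240 MB cap from nine regions onward. This is the fallback to constant-size behavior noted in the table.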
Split point computation:
- exclude the meta table (I covered this in a previous post): meta regions are never split
- retrieve the largest store file (of the largest store)
- get the middle block of that file
- extract the row key from the middle key; this row is the target split point
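The steps above can be sketched in Java as follows. This is a simplified model, not HBase's implementation. StoreFile, Store, storeSplitPoint and regionSplitPoint are hypothetical. A store file is reduced to its size, the first row of each block and its last row. The "middle block" is taken at index (n - 1) / 2. The sketch also rejects a midpoint equal to the file's first or last row, since such a split would leave one daughter empty. At region level, the split point of the largest store wins, as in the table.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Simplified model of 0.94.2-style split point selection; not HBase's real classes.
final class SplitPointSketch {

    /** Hypothetical store file: size, first row key of each block (in order), last row. */
    static final class StoreFile {
        final long sizeBytes;
        final List<String> blockFirstRows;
        final String lastRow;

        StoreFile(long sizeBytes, List<String> blockFirstRows, String lastRow) {
            this.sizeBytes = sizeBytes;
            this.blockFirstRows = blockFirstRows;
            this.lastRow = lastRow;
        }
    }

    /** Hypothetical store (column family) holding several store files. */
    static final class Store {
        final List<StoreFile> files;

        Store(List<StoreFile> files) {
            this.files = files;
        }

        long size() {
            return files.stream().mapToLong(f -> f.sizeBytes).sum();
        }
    }

    /** Per-store split point: middle block of the largest file, or empty if unusable. */
    static Optional<String> storeSplitPoint(Store store, boolean isMetaRegion) {
        if (isMetaRegion || store.files.isEmpty()) {
            return Optional.empty(); // step 1: meta regions are never split
        }
        StoreFile largest = store.files.stream() // step 2: largest store file
                .max(Comparator.comparingLong(f -> f.sizeBytes))
                .get();
        List<String> blocks = largest.blockFirstRows;
        if (blocks.isEmpty()) {
            return Optional.empty();
        }
        String midRow = blocks.get((blocks.size() - 1) / 2); // steps 3-4: middle block's row
        // A midpoint at the file's first or last row would leave one daughter empty.
        if (midRow.equals(blocks.get(0)) || midRow.equals(largest.lastRow)) {
            return Optional.empty();
        }
        return Optional.of(midRow);
    }

    /** Region-level choice: the split point of the largest store that has one. */
    static Optional<String> regionSplitPoint(List<Store> stores, boolean isMetaRegion) {
        String best = null;
        long bestSize = 0;
        for (Store s : stores) {
            Optional<String> point = storeSplitPoint(s, isMetaRegion);
            long size = s.size();
            if (point.isPresent() && size > bestSize) {
                best = point.get();
                bestSize = size;
            }
        }
        return Optional.ofNullable(best);
    }
}
```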
Part II: split principle
See [2] for a bird's-eye view, or look into HBaseAdmin#split().
Part III: merge regions
As of this version, there is only an offline merge capability, namely util.Merge (org.apache.hadoop.hbase.util.Merge). If you want online merge, see 'online merge', which is planned for 0.95 or 0.98.
Part IV: conclusions
In general, I prefer the constant-size policy, a simple and controllable solution, provided you pre-split the table when creating it.
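But since it is not the default, you must set the following property (for example in hbase-site.xml) to the fully qualified class name org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy:
hbase.regionserver.region.split.policy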
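This minimal sketch shows what setting the property amounts to. It assumes the region server reads a class name from the configuration and falls back to the default policy when the key is unset. SplitPolicyConfigSketch, resolvePolicy and siteXmlEntry are hypothetical helpers, and java.util.Properties stands in for HBase's Configuration.

```java
import java.util.Properties;

// Simplified model of how the split policy class is chosen; Properties stands in
// for HBase's Configuration. Not HBase's actual code.
final class SplitPolicyConfigSketch {
    static final String SPLIT_POLICY_KEY = "hbase.regionserver.region.split.policy";
    static final String DEFAULT_POLICY =
            "org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy";
    static final String CONSTANT_POLICY =
            "org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy";

    /** Returns the configured policy class name, or the default when unset. */
    static String resolvePolicy(Properties conf) {
        return conf.getProperty(SPLIT_POLICY_KEY, DEFAULT_POLICY).trim();
    }

    /** Renders the entry to add to hbase-site.xml for a given property. */
    static String siteXmlEntry(String name, String value) {
        return "<property>\n"
                + "  <name>" + name + "</name>\n"
                + "  <value>" + value + "</value>\n"
                + "</property>";
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println("unset -> " + resolvePolicy(conf));

        // The value is a class name loaded at runtime, so use the fully qualified name.
        conf.setProperty(SPLIT_POLICY_KEY, CONSTANT_POLICY);
        System.out.println("set   -> " + resolvePolicy(conf));
        System.out.println(siteXmlEntry(SPLIT_POLICY_KEY, CONSTANT_POLICY));
    }
}
```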
ref:
[1] IncreasingToUpperBoundRegionSplitPolicy.shouldSplit() should check all the stores before returning.
HBase: how many regions fit a table when pre-splitting or keeping it running
[2] Apache HBase Region Splitting and Merging (detailed split principle)