Study notes and summary below
Part 1:Table attributes
attr | default | usage/principle | use case | note |
Bloom filter | disabled | costs some memory to improve lookup time TBD | tables with huge range scans | takes one of 'row', 'row-col', or none |
Column families | must be a printable string, since it is used as the directory name under the region directory | |||
Maximum file size | 10G in 0.94.2 | actually maxStoreSize, i.e. the property 'hbase.hregion.max.filesize' set in hbase-site.xml | ||
Read-only | false | like firmware, kept safe from changes, i.e. a 'dead' table that never changes | ||
Memstore flush size | 128m in 0.94.2 | same effect as the property 'hbase.hregion.memstore.flush.size' in hbase-site.xml | 1. this value determines how often store files are generated 2. because of 1, it affects the HLog replay time when a region server goes down | |
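Deferred log flush | false | if true, edits are buffered and synced to the log periodically, at the interval set by 'hbase.regionserver.optionallogflushinterval'; if false, each edit is synced right away | true may cause data loss, as the buffered edits live only in memory until they are synced to the fs | |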
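
To make the Bloom filter row less abstract, here is a toy sketch of the idea in Python. It is not HBase's implementation: the class `ToyBloomFilter`, the helper `files_to_read`, the bit-array size, and the store file names are all made up for illustration. It assumes a 'row'-level filter (keys are row keys); a 'row-col' filter would hash row key plus column instead.

```python
import hashlib


class ToyBloomFilter:
    """Toy Bloom filter: a bit array plus k salted hashes (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive k bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def files_to_read(store_files, row_key):
    """Return the names of store files whose filter cannot rule out row_key."""
    return [name for name, bloom in store_files if bloom.might_contain(row_key)]


# Two hypothetical store files, each with a filter built over its row keys.
f1, f2 = ToyBloomFilter(), ToyBloomFilter()
for key in ("row-001", "row-002"):
    f1.add(key)
for key in ("row-900", "row-901"):
    f2.add(key)
store_files = [("storefile-1", f1), ("storefile-2", f2)]

candidates = files_to_read(store_files, "row-001")
print(candidates)
```

The memory cost is the bit array kept per store file; the gain is that a lookup can skip files whose filter answers "definitely absent". A filter never misses a key that was added, but it can occasionally report a key that was not (a false positive), which only costs an unnecessary file read.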
Part 2:Column Family attributes
attr | default | usage/principle | use case | note |
In-memory | false | caches blocks of a small family in memory to speed up queries | analogous to a secondary index table; for small tables | no guarantee of when or how many blocks get cached |
Bloom filter | see Part 1 | |||
Replication scope | 0 (disabled) | sync local cluster data to remote clusters TBD | for load balancing by distributing requests across clusters? | |
Maximum versions | 3 | controls how many versions (changes) of a cell are kept in storage | use 1 in general; if you also want to check the previous version, '2' is a good choice. this interacts with 'Time-to-live' | |
Compression | none | compresses this family with the specified codec: SNAPPY, LZO, GZ.. | be completely clear about your requirements, then pick the matching codec | |
Block size | 64k | a store file is split into blocks of this size, so smaller blocks give faster random reads; use bigger blocks for sequential reads TBD | ||
Block cache | true | when rows are read from hbase, this determines whether the blocks are written to the cache to speed up later access | use 'true' if clients repeatedly access the same rows; 'false' for whole-table scans or for systems with fewer reads than writes | |
Time-to-live | max.int (in seconds) | how long a cell value is kept in storage | for a 'recycled' (i.e. rolling) system, use an appropriate value to bound the data size | this interacts with 'Maximum versions', i.e. both attributes together control how many data versions are kept |
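
The interaction between 'Maximum versions' and 'Time-to-live' can be sketched as a small Python model. This is only an assumption about how the two limits combine (apply the TTL first, then keep the newest N survivors), not HBase code; the function `visible_versions` and the sample timestamps are hypothetical.

```python
def visible_versions(cells, max_versions, ttl_seconds, now):
    """cells: list of (timestamp_seconds, value) for one cell coordinate.

    Returns the versions that survive both limits, newest first.
    """
    # Time-to-live: drop every version older than now - ttl_seconds.
    alive = [(ts, value) for ts, value in cells if now - ts <= ttl_seconds]
    # Maximum versions: of what is left, keep only the newest N.
    alive.sort(key=lambda cell: cell[0], reverse=True)
    return alive[:max_versions]


cells = [(100, "a"), (200, "b"), (300, "c"), (400, "d")]

# Short TTL: only two versions are young enough, so max_versions=3 never bites.
kept_short_ttl = visible_versions(cells, max_versions=3, ttl_seconds=250, now=500)
print(kept_short_ttl)  # [(400, 'd'), (300, 'c')]

# Effectively unlimited TTL: max_versions=3 is now the limiting attribute.
kept_long_ttl = visible_versions(cells, max_versions=3, ttl_seconds=10**9, now=500)
print(kept_long_ttl)  # [(400, 'd'), (300, 'c'), (200, 'b')]
```

Whichever limit is stricter for a given cell decides what is kept, which is why the two attributes should be chosen together.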
Ref:
HBase: The Definitive Guide