Using HBase for near-real-time responses over hundreds of millions of rows

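In our project, the app behavior logs, together with the address books, call records, SMS messages, and contact information collected with user authorization, have grown over time into the hundreds of millions of rows. Creating indexes to speed up queries, an optimization that worked at the tens-of-millions scale, may no longer be effective at this point. To serve real-time query requests during the credit review stage, we introduced HBase to fix the slow responses.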
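To make the per-user lookups concrete, here is a minimal sketch of one common HBase rowkey layout, assuming queries fetch one user's records newest-first. The layout (a 2-character MD5 salt, the user id, a `#` delimiter, and a reversed timestamp), the classes `RowKeys` and `RowKeyDemo`, and the sample user ids are all hypothetical; the article does not describe its actual schema. A `TreeMap` stands in for the HBase table, since HBase keeps rows sorted by rowkey, and user ids are assumed to be alphanumeric so they never contain `#`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical rowkey layout: <salt><userId>#<reversed timestamp>.
final class RowKeys {
    private RowKeys() {}

    // Two hex chars from MD5(userId): spreads users over 256 buckets
    // while keeping every row of one user under the same prefix.
    static String salt(String userId) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(userId.getBytes(StandardCharsets.UTF_8));
            return String.format("%02x", digest[0] & 0xff);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available in every JDK", e);
        }
    }

    static String prefix(String userId) {
        return salt(userId) + userId + "#";
    }

    // Long.MAX_VALUE - ts, zero-padded to 19 digits, so newer rows sort first.
    static String rowKey(String userId, long epochMillis) {
        return prefix(userId) + String.format("%019d", Long.MAX_VALUE - epochMillis);
    }

    // Prefix scan over a sorted table: start row = prefix (inclusive),
    // stop row = prefix with its last char incremented, '#' -> '$' (exclusive).
    static NavigableMap<String, String> scanUser(NavigableMap<String, String> table, String userId) {
        String start = prefix(userId);
        String stop = salt(userId) + userId + "$";
        return table.subMap(start, true, stop, false);
    }
}

final class RowKeyDemo {
    public static void main(String[] args) {
        // TreeMap stands in for an HBase table: rows sorted by rowkey.
        NavigableMap<String, String> table = new TreeMap<>();
        table.put(RowKeys.rowKey("u1001", 1_000L), "call A");
        table.put(RowKeys.rowKey("u1001", 2_000L), "call B");
        table.put(RowKeys.rowKey("u2002", 1_500L), "sms C");
        for (Map.Entry<String, String> e : RowKeys.scanUser(table, "u1001").entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Running `RowKeyDemo` prints the two `u1001` rows, `call B` (the newer one) before `call A`, and skips `u2002`. Because one user's rows share a prefix, a query becomes a single short range scan rather than an index lookup over the whole table; HBase's own prefix scans derive the stop row the same way, by incrementing the last byte of the prefix.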

When Should I Use HBase?
HBase isn’t suitable for every problem.

First, make sure you have enough data. If you have hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few thousand/million rows, then using a traditional RDBMS might be a better choice due to the fact that all of your data might wind up on a single node (or two) and the rest of the cluster may be sitting idle.

Second, make sure you can live without all the extra features that an RDBMS provides (e.g., typed columns, secondary indexes, transactions, advanced query languages, etc.). An application built against an RDBMS cannot be "ported" to HBase by simply changing a JDBC driver, for example. Consider moving from an RDBMS to HBase as a complete redesign as opposed to a port.

Third, make sure you have enough hardware. Even HDFS doesn’t do well with anything less than 5 DataNodes (due to things such as HDFS block replication which has a default of 3), plus a NameNode.

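In short, HBase is not the answer to every problem. First, you need enough data. Second, the business must be able to run without relational-database features such as typed columns, secondary indexes, transactions, and a powerful query language. Third, you need enough hardware; in particular, HDFS does not work well with fewer than 5 DataNodes plus a NameNode.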
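The three criteria can be read as a pre-flight checklist. Below is a minimal sketch, assuming "hundreds of millions of rows" is taken as a floor of 100 million; that threshold and the class `HBaseFitCheck` are hypothetical, while the 5-DataNode minimum comes from the quoted guidance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-check encoding the three criteria quoted above.
final class HBaseFitCheck {
    private HBaseFitCheck() {}

    // Assumed reading of "hundreds of millions of rows".
    static final long MIN_ROWS = 100_000_000L;
    // From the quote: HDFS does poorly with fewer than 5 DataNodes.
    static final int MIN_DATANODES = 5;

    // Returns the reasons HBase may be a poor fit; an empty list means no red flags.
    static List<String> concerns(long rows, boolean needsRdbmsFeatures, int dataNodes) {
        List<String> reasons = new ArrayList<>();
        if (rows < MIN_ROWS) {
            reasons.add("too few rows: data may sit on one or two nodes; an RDBMS is likely simpler");
        }
        if (needsRdbmsFeatures) {
            reasons.add("needs typed columns, secondary indexes, transactions or SQL: plan a redesign, not a port");
        }
        if (dataNodes < MIN_DATANODES) {
            reasons.add("fewer than " + MIN_DATANODES + " DataNodes (plus a NameNode)");
        }
        return reasons;
    }

    public static void main(String[] args) {
        System.out.println(concerns(300_000_000L, false, 6));      // []
        System.out.println(concerns(2_000_000L, true, 3).size());  // 3
    }
}
```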

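The project adds a new big-data platform to handle high-traffic, high-concurrency, low-latency requests. On one side, the data is read from and written to HBase; on the other, it flows into Kafka, the data-processing bus, which connects the data stream to the data center.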
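A minimal sketch of that dual data path, assuming each incoming record is written to HBase for low-latency queries and also published to Kafka for the data center. `RowStore`, `EventBus`, `IngestService`, and the topic name `app-events` are hypothetical; the in-memory classes stand in for the real clients (in Java, HBase's `Table.put` and Kafka's `KafkaProducer.send`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in for an HBase table; a real implementation would wrap Table.put.
interface RowStore {
    void put(String rowKey, String value);
}

// Stand-in for a Kafka topic; a real implementation would wrap KafkaProducer.send.
interface EventBus {
    void publish(String topic, String key, String value);
}

final class InMemoryRowStore implements RowStore {
    final Map<String, String> rows = new TreeMap<>();

    @Override public void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }
}

final class InMemoryEventBus implements EventBus {
    final List<String> published = new ArrayList<>();

    @Override public void publish(String topic, String key, String value) {
        published.add(topic + "|" + key + "|" + value);
    }
}

// Hypothetical ingestion service: one record, two destinations.
final class IngestService {
    private final RowStore store;
    private final EventBus bus;
    private final String topic;

    IngestService(RowStore store, EventBus bus, String topic) {
        this.store = store;
        this.bus = bus;
        this.topic = topic;
    }

    void ingest(String rowKey, String value) {
        store.put(rowKey, value);          // query path: serves credit-review lookups
        bus.publish(topic, rowKey, value); // streaming path: feeds the data center
    }

    public static void main(String[] args) {
        InMemoryRowStore store = new InMemoryRowStore();
        InMemoryEventBus bus = new InMemoryEventBus();
        new IngestService(store, bus, "app-events").ingest("u1001#0001", "call A");
        System.out.println(store.rows);    // {u1001#0001=call A}
        System.out.println(bus.published); // [app-events|u1001#0001|call A]
    }
}
```

The sketch writes HBase first and Kafka second with no retry or rollback, so a failure between the two writes leaves them out of sync. The article does not say how the platform handles this; a common alternative is to write only to Kafka and let a consumer load HBase.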
