1. Load Data into Phoenix
- Using the MapReduce-based CSV bulk loader for larger data sets (see http://phoenix.apache.org/bulk_dataload.html):
hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table EXAMPLE --input /data/example.csv
- Using psql.py to load a .csv file (see http://phoenix.apache.org/bulk_dataload.html):
bin/psql.py -t EXAMPLE localhost data.csv
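- Mapping an existing HBase table to a Phoenix view and using the UPSERT SELECT command to populate a new table. HBase names are case-sensitive and Phoenix upper-cases unquoted identifiers, so the lowercase table name and column family must be double-quoted in the view; an unquoted column such as column1 maps to the qualifier COLUMN1.
In the HBase shell, define the table and the column family 'cf1':
create 't1', {NAME => 'cf1', VERSIONS => 5}
In Phoenix (sqlline.py), create a view over t1:
CREATE VIEW "t1" (pk VARCHAR PRIMARY KEY, "cf1".column1 VARCHAR, "cf1".column2 INTEGER);
Alternatively, create the table in Phoenix directly:
CREATE TABLE t1 (pk VARCHAR PRIMARY KEY, column1 VARCHAR, column2 INTEGER);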
- Populating the table through our UPSERT VALUES command.
upsert into test_table values (2,'World!');
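The same UPSERT VALUES statement can also be issued from application code over JDBC. The following is a minimal sketch, not a definitive implementation. It assumes the Phoenix client jar is on the classpath, a ZooKeeper quorum is reachable on localhost, and test_table has an INTEGER key followed by a VARCHAR column (the schema is not shown above). The class and method names are hypothetical. Phoenix connections have auto-commit turned off by default, so the sketch commits explicitly.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class UpsertExample {
    // Parameterized form of the UPSERT VALUES statement shown above.
    // Assumption: test_table has exactly two columns (INTEGER key, VARCHAR value).
    static final String UPSERT_SQL = "UPSERT INTO test_table VALUES (?, ?)";

    // Writes one row, then commits so the mutation is sent to HBase.
    static void upsertRow(Connection conn, int key, String value) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(UPSERT_SQL)) {
            stmt.setInt(1, key);
            stmt.setString(2, value);
            stmt.executeUpdate();
        }
        conn.commit();
    }

    public static void main(String[] args) throws SQLException {
        // Requires a running Phoenix/HBase cluster; the URL is an assumption.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            upsertRow(conn, 2, "World!");
        }
    }
}
```

Binding values through a PreparedStatement avoids quoting problems when the data itself contains single quotes.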
2. Using the SQuirreL client tool
- Remove any prior phoenix-[oldversion]-client.jar from the lib directory of SQuirreL and copy phoenix-[newversion]-client.jar into the lib directory (newversion must be compatible with the version of the Phoenix server jar used with your HBase installation)
- Start SQuirreL and add a new driver (Drivers -> New Driver)
- In Add Driver dialog box, set Name to Phoenix, and set the Example URL to jdbc:phoenix:localhost.
- Type “org.apache.phoenix.jdbc.PhoenixDriver” into the Class Name textbox and click OK to close this dialog.
- Switch to the Aliases tab and create a new alias (Aliases -> New Alias)
- In the dialog box, set Name to any name, Driver to Phoenix, and User Name and Password to any value
- Construct the URL as jdbc:phoenix:<ZooKeeper quorum server>. For example, to connect to a local HBase, use jdbc:phoenix:localhost
- Press Test (which should succeed if everything is set up correctly) and press OK to close.
- Now double click on your newly created Phoenix alias and click Connect. Now you are ready to run SQL queries against Phoenix.
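The URL format used in the steps above can be sketched as a small helper. This is an illustration only: the class name PhoenixUrl and the method build are hypothetical, and the sketch assumes the common form in which several ZooKeeper hosts are joined with commas and an optional client port follows after a colon.

```java
import java.util.Arrays;
import java.util.List;

public class PhoenixUrl {
    // Builds jdbc:phoenix:<quorum>[:<port>] from a list of ZooKeeper hosts.
    // Pass null for port to omit it and rely on the default.
    static String build(List<String> quorumHosts, Integer port) {
        if (quorumHosts == null || quorumHosts.isEmpty()) {
            throw new IllegalArgumentException("at least one ZooKeeper host is required");
        }
        StringBuilder url = new StringBuilder("jdbc:phoenix:");
        url.append(String.join(",", quorumHosts));
        if (port != null) {
            url.append(':').append(port);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Local HBase, as in the alias example above.
        System.out.println(build(Arrays.asList("localhost"), null));
        // Hypothetical three-node quorum with an explicit port.
        System.out.println(build(Arrays.asList("zk1", "zk2", "zk3"), 2181));
    }
}
```

The first call yields jdbc:phoenix:localhost, the value entered in the alias dialog; the host names zk1, zk2, and zk3 are placeholders.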
3. Performance optimization
Use salting to pre-split the data into multiple regions
CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SALT_BUCKETS=16
Pre-split the table at chosen row key values
CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SPLIT ON ('CS','EU','NA')
Use multiple column families
CREATE TABLE TEST (MYKEY VARCHAR NOT NULL PRIMARY KEY, A.COL1 VARCHAR, A.COL2 VARCHAR, B.COL3 VARCHAR)
Use compression: on-disk compression improves performance on large tables
CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) COMPRESSION='GZ'
Others:
- Create indexes. See faq.html#/How_do_I_create_Secondary_Index_on_a_table
- Optimize cluster parameters. See http://hbase.apache.org/book/performance.html
- Optimize Phoenix parameters. See tuning.html
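To make the SPLIT ON ('CS','EU','NA') example earlier in this section more concrete: three split points produce four regions, each covering the row keys from its start key (inclusive) up to the next split point (exclusive). The sketch below only illustrates that partitioning and is not Phoenix or HBase code. The class and method names are hypothetical, and it assumes plain ASCII row keys, for which Java string comparison matches the byte-order comparison HBase uses.

```java
public class SplitPoints {
    // Returns the zero-based index of the region that would hold rowKey,
    // given split points sorted in ascending order.
    // n split points yield n + 1 regions.
    static int regionIndex(String rowKey, String[] splitPoints) {
        int index = 0;
        for (String split : splitPoints) {
            if (rowKey.compareTo(split) >= 0) {
                index++; // rowKey is at or past this split point
            } else {
                break;   // split points are sorted, so no later one can match
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] splits = {"CS", "EU", "NA"};
        // Hypothetical host names used as row keys.
        for (String key : new String[] {"AU-host1", "CS-host1", "EU-host7", "NA-host3"}) {
            System.out.println(key + " -> region " + regionIndex(key, splits));
        }
    }
}
```

With these split points, keys below 'CS' fall in region 0, keys from 'CS' up to 'EU' in region 1, keys from 'EU' up to 'NA' in region 2, and everything from 'NA' onward in region 3.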
4. Should I pool Phoenix JDBC Connections?
No, it is not necessary to pool Phoenix JDBC Connections.
Phoenix’s Connection objects are different from most other JDBC Connections due to the underlying HBase connection. The Phoenix Connection object is designed to be a thin object that is inexpensive to create. If Phoenix Connections are reused, it is possible that the underlying HBase connection is not always left in a healthy state by the previous user. It is better to create new Phoenix Connections to ensure that you avoid any potential issues.
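In practice this means opening a Connection for each unit of work and closing it afterwards. The following is a minimal sketch of that pattern under stated assumptions: the FreshConnection class, the Work interface, and withConnection are hypothetical names, and the URL and the test_table query in main assume a local cluster with the Phoenix client jar on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class FreshConnection {
    // A unit of work that needs a connection and returns a result.
    interface Work<T> {
        T run(Connection conn) throws SQLException;
    }

    // Opens a new connection, runs the work, and always closes the connection.
    static <T> T withConnection(String url, Work<T> work) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url)) {
            return work.run(conn);
        }
    }

    public static void main(String[] args) throws SQLException {
        // Requires a running Phoenix/HBase cluster; URL and table are assumptions.
        long rows = withConnection("jdbc:phoenix:localhost", conn -> {
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM test_table")) {
                rs.next();
                return rs.getLong(1);
            }
        });
        System.out.println("rows: " + rows);
    }
}
```

Because try-with-resources closes the connection even when the work throws, no caller ever receives a connection left in an unknown state by a previous user.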
References:
http://phoenix.apache.org/installation.html
http://phoenix.apache.org/Phoenix-in-15-minutes-or-less.html
http://phoenix.apache.org/language/index.html
http://phoenix.apache.org/faq.html