Symptom
A bulk delete in a Trafodion database ran for more than 2 hours and then failed with the following error:
>>delete from test_delete where a>100;
*** WARNING[6008] Statistics for column (A) from table TRAFODION.SEABASE.TEST_DELETE were not available. As a result, the access path chosen might not be the best possible.
*** ERROR[8448] Unable to access Hbase interface. Call to ExpHbaseInterface::deleteRow returned error HBASE_ACCESS_ERROR(-706). Cause: java.io.IOException: java.io.IOException: delete late checkin for transaction 72339069014678765 in region TRAF_RSRVD_3:TRAFODION.SEABASE.TEST_DELETE,,1581140570266.daff73ed901f4045f4ae996db16dd5ef.,skey=null,ekey=null
org.apache.hadoop.hbase.client.transactional.TransactionalTable.delete(TransactionalTable.java:757)
org.apache.hadoop.hbase.client.transactional.RMInterface.delete(RMInterface.java:937)
org.trafodion.sql.HTableClient.deleteRow(HTableClient.java:1604)
org.trafodion.sql.HBaseClient.deleteRow(HBaseClient.java:2392).
--- 0 row(s) deleted.
Analysis
After the error, get statistics shows the following runtime statistics for the statement:
>>get statistics for qid current default;
Qid MXID11000016667212447900982995646000000000206U3333302T000_335___SQLCI_DML_LAST__
Compile Start Time 2020/02/08 14:11:01.268632
Compile End Time 2020/02/08 14:11:01.279867
Compile Elapsed Time 0:00:00.011235
Execute Start Time 2020/02/08 14:11:01.280240
Execute End Time 2020/02/08 16:11:01.894070
Execute Elapsed Time 2:00:00.613830
State DEALLOCATED
Rows Affected 184,938
SQL Error Code -8448
Stats Error Code 0
Query Type SQL_DELETE_NON_UNIQUE
Sub Query Type SQL_STMT_NA
Estimated Accessed Rows 0
Estimated Used Rows 0
Parent Qid NONE
Parent Query System NONE
Child Qid NONE
Number of SQL Processes 1
Number of Cpus 1
Transaction Id 72339069014678765
Source String delete from test_delete where a>100;
SQL Source Length 36
Rows Returned 0
First Row Returned Time -1
Last Error before AQR 0
Number of AQR retries 0
Delay before AQR 0
No. of times reclaimed 0
Cancel Time -1
Last Suspend Time -1
Query hash 0
SLA Name defaultSLA
Profile Name defaultProfile
No. of times executed 1
Min. Execute Time 7,200.613830 secs
Max. Execute Time 7,200.613830 secs
Avg. Execute Time 7,200.613888 secs
Stats Collection Type OPERATOR_STATS
LC RC Id PaId ExId Frag TDBName DOP Dispatches OperCPUTime EstRowsUsed ActRowsUsed ActDataUsed Details
5 . 6 . 0 0 EX_ROOT 1 2 8 0 0 0 72297143|0|0|11333|
3 4 5 6 0 0 EX_ONLJ 1 366 22,537 1.19382e+06 0 0
. . 4 5 0 0 EX_TRAF_VSBB_DELETE 1 181 166,885 1 183,118 0 TRAF_RSRVD_3:TRAFODION.SEABASE.IDX_TEST_DELETE|10496397|7195172250|0|
1 2 3 5 0 0 EX_ONLJ 1 368 49,313 1.19382e+06 184,938 10,726,404
. . 2 3 0 0 EX_TRAF_DELETE 1 182 71,645,451 1 184,938 10,726,404 TRAF_RSRVD_3:TRAFODION.SEABASE.TEST_DELETE|20713164|7166976921|0|
. . 1 3 0 0 EX_TRAF_SELECT 1 182 412,949 1.19382e+06 186,188 10,798,904 TRAF_RSRVD_3:TRAFODION.SEABASE.TEST_DELETE|20108412|112207|186368|
--- SQL operation complete.
The line "Execute Elapsed Time 2:00:00.613830" in the output above shows that the statement failed almost exactly 2 hours into execution.
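As a quick cross-check of that figure, the elapsed time can be recomputed from the Execute Start Time and Execute End Time values in the statistics output. The sketch below only redoes the arithmetic on those two timestamps; the variable names are illustrative.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the "get statistics" output above.
FMT = "%Y/%m/%d %H:%M:%S.%f"

# Values copied from "Execute Start Time" and "Execute End Time".
start = datetime.strptime("2020/02/08 14:11:01.280240", FMT)
end = datetime.strptime("2020/02/08 16:11:01.894070", FMT)

elapsed = end - start
# How far past the 2-hour mark the statement ran before failing.
overshoot = elapsed - timedelta(hours=2)

print(elapsed)                  # 2:00:00.613830
print(elapsed.total_seconds())  # 7200.61383
print(overshoot)                # 0:00:00.613830
```

The result matches both "Execute Elapsed Time" and the 7,200.613830 seconds reported for Min. and Max. Execute Time, so the failure came well under a second after the 2-hour mark.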
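Solution
Trafodion applies a default transaction timeout of 2 hours. If a large transaction is still running after 2 hours, it is automatically aborted and rolled back.
The timeout can be raised through configuration: add hbase.transaction.lease.timeout to the HBase advanced configuration options, with a value chosen to suit the workload.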
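HBase advanced configuration options are usually entered as hbase-site.xml style property elements. The sketch below renders such an element for the setting named above. The helper name lease_timeout_property and the placeholder value are hypothetical. The source does not state which unit the value uses, so the placeholder must be replaced with a value confirmed against the documentation for the deployed version.

```python
import xml.etree.ElementTree as ET


def lease_timeout_property(value: str) -> str:
    """Render an hbase-site.xml style <property> element (hypothetical helper)."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "hbase.transaction.lease.timeout"
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# Placeholder only: the unit of the value is not given in the source.
snippet = lease_timeout_property("REPLACE_WITH_TIMEOUT")
print(snippet)
```

A configuration change of this kind normally takes effect only after the affected HBase services are restarted, which is an assumption to confirm for the specific installation.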