故事从一个客户正在将数据库平台从Oracle迁移到OceanBase的过程中说起。在并轨生产运行过程中,应用报告了ORA-54错误号,这让Oracle的数据库管理员感到震惊,并花费了大量时间来进行分析。然而,他们最终发现这个错误是由OceanBase的驱动程序抛出。在尤其有些客户一点儿问题全中心排查躁动,这类报错让Oracle DBA分析oracle数据库,确实会让人要想爆粗,浪费了甲方的付费的乙方资源,同时还不利于故障定位。下面对比一下oracle和oceanbase的resource busy报错。
oracle
-- session 1 sid=1295

SQL> create table test2(id int primary key, name varchar2(10));

Table created.


SQL> insert into test2 values(1,'anbob');

1 row created.


SQL> insert into test2 values(2,'anbob.com');

1 row created.


SQL> select * from test2;

ID NAME

---------- ----------

1 anbob

2 anbob.com


SQL> commit;

Commit complete.


SQL> update test2 set name='www.anbob' where id=1;

1 row updated.

-
----- no commit;


-- session 2 sid=1040

SQL> alter system set ddl_lock_timeout=0;

System altered.


SQL>
alter table
test2 modify name varchar2(100);

alter table test2 modify name varchar2(100)

*

ERROR at line 1:

ORA-00054:
resource busy and acquire with NOWAIT specified or timeout expired


SQL> select * from test2
for update nowait;
select * from test2 for update nowait

*

ERROR at line 1:

ORA-00054:
resource busy and acquire with NOWAIT specified or timeout expired



SQL> select * from test2 for
update wait 5;
select * from test2 for update wait 5

-- wait 5 sec
*

ERROR at line 1:

ORA-30006:
resource busy; acquire with WAIT timeout expired


SQL> select * from test2
for update;
select * from test2 for update

-- waiting until ctrl+c

*

ERROR at line 1:

ORA-01013: user requested cancel of current operation


SQL> @lock 'sid in(1295,1040)'


SID TY LMODE MODE_HELD REQUEST MODE_REQUESTED LOCK_ID1 LOCK_ID2 CTIME BLOCK

------- -- ---------- --------------- ---------- --------------- ---------- ---------- ---------- ----------

1040 TX 0 None
6 Exclusive
13369353 3874 33 0

1040 TM 3 Row-X (SX) 0 None 181384 0 33 2

1040 AE 4 Share 0 None 134 1 811 2

1295 TX 6 Exclusive 0 None 13369353 3874 817 1

1295 AE 4 Share 0 None 134 1 878 2

1295 TM 3 Row-X (SX) 0 None 181384 0 817 2

6 rows selected.


SQL> select * from dba_waiters;


WAITING_SESSION WAITING_CON_ID HOLDING_SESSION HOLDING_CON_ID LOCK_TYPE MODE_HELD MODE_REQUESTED LOCK_ID1 LOCK_ID2

--------------- -------------- --------------- -------------- -------------------------- ---------------------------------------- ---------------------------------------- ---------- ----------

1040 1 1295 1 Transaction Exclusive Exclusive 13369353 3874

Note:

DDL受ddl_lock_timeout参数影响单位s,超时报ora-54。但是select for update不带wait限制, 可能是for ever吧. 如果for update nowait是报ora-54, 如果是for update wait N seconds超时后是报ORA-30006, 解决可以从后台找到block session 事务没有提交的会话kill .
Oceanbase for oracle
[admin@OceanBase1 ~]$ obclient -h172.xxxxxx -P2883 -uanbob@orauser#obdemo:10001 -p

Enter password:

Welcome to the OceanBase. Commands end with ; or \g.

Your OceanBase connection id is 25041

Server version: OceanBase 3.2.4.1 (Built Jan 8 2023 22:52:43)

Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.


obclient [ANBOB]> create table test2(id int primary key, name varchar2(10));

Query OK, 0 rows affected (0.160 sec)


obclient [ANBOB]> insert into test2 values(1,'anbob');

Query OK, 1 row affected (0.019 sec)


obclient [ANBOB]> insert into test2 values(2,'anbob.com');

Query OK, 1 row affected (0.002 sec)

obclient [ANBOB]> commit;

obclient [ANBOB]> select * from test2;

+----+-----------+

| ID | NAME |

+----+-----------+

| 1 | anbob |

| 2 | anbob.com |

+----+-----------+

2 rows in set (0.014 sec)

# session 1

obclient [ANBOB]> update test2 set name='www.anbob' where id=1;

Query OK, 1 row affected (0.005 sec)

Rows matched: 1 Changed: 1 Warnings: 0

-- no commit;
# session 2


obclient [ANBOB]> select * from test2;

+----+-----------+

| ID | NAME |

+----+-----------+

| 1 | anbob |

| 2 | anbob.com |

+----+-----------+

2 rows in set (0.002 sec)


obclient [ANBOB]>
alter table
test2 modify name varchar2(100);

Query OK, 0 rows affected (0.067 sec)


obclient [ANBOB]> desc test2;

+-------+---------------+------+-----+---------+-------+

| FIELD | TYPE | NULL | KEY | DEFAULT | EXTRA |

+-------+---------------+------+-----+---------+-------+

| ID | NUMBER(38) | NO | PRI | NULL | NULL |

| NAME |
VARCHAR2(100)
| YES | NULL | NULL | NULL |

+-------+---------------+------+-----+---------+-------+

2 rows in set (0.007 sec)


# session 1

obclient [ANBOB]> desc test2;

+-------+---------------+------+-----+---------+-------+

| FIELD | TYPE | NULL | KEY | DEFAULT | EXTRA |

+-------+---------------+------+-----+---------+-------+

| ID | NUMBER(38) | NO | PRI | NULL | NULL |

| NAME |
VARCHAR2(100)
| YES | NULL | NULL | NULL |

+-------+---------------+------+-----+---------+-------+

2 rows in set (0.005 sec)


obclient [ANBOB]> select * from test2;

+----+-----------+

| ID | NAME |

+----+-----------+

| 1 | www.anbob |

| 2 | anbob.com |

+----+-----------+

2 rows in set (0.004 sec)


Note:

在Oceanbase(以下简称OB) 事务并不会堵塞DDL.这点与oracle不同。
测试OB中的select for update
# session 1

obclient [ANBOB]> SHOW VARIABLES LIKE 'ob_query_timeout';

+------------------+----------------+

| VARIABLE_NAME | VALUE |

+------------------+----------------+

| ob_query_timeout | 10000000000000 |

+------------------+----------------+

1 row in set (0.006 sec)


obclient [ANBOB]> ALTER SYSTEM SET ob_query_timeout = 72;

Query OK, 0 rows affected (0.003 sec)


obclient [ANBOB]> select * from test2 where id=1 for update nowait;

ORA-00600: internal error code, arguments: -4012, Timeout


obclient [ANBOB]> ALTER SESSION SET ob_query_timeout = 7200;

Query OK, 0 rows affected (0.002 sec)


obclient [ANBOB]> SHOW VARIABLES LIKE 'ob_query_timeout';

+------------------+-------+

| VARIABLE_NAME | VALUE |

+------------------+-------+

| ob_query_timeout | 7200 |

+------------------+-------+

1 row in set (0.005 sec)


obclient [ANBOB]> select * from test2 where id=1 for update nowait;

ORA-00600: internal error code, arguments: -6212, Statement is timeout

Note:

原来ob_query_timeout参数配置的太小,自己的SQL都没执行完,单位是微秒,(这单位是否合理?自行判断),建议OB在数据库参数上也限制一下最小值。修改此参数意为让for update尽快报错.不过上面看到报出来oracle DBA一个很有挑战的错误代码ORA-600 Interal error。不过参数还不一样4012是timeout, 6212 是statement is timeout. 不纠结,知道了单位我把配置改为10秒.
obclient [ANBOB]> alter session set ob_query_timeout =10000000;

Query OK, 0 rows affected (0.002 sec)


#session 1

obclient [ANBOB]> select * from test2 where id=1 for update ;

+----+-----------+

| ID | NAME |

+----+-----------+

| 1 | www.anbob |

+----+-----------+

1 row in set (0.002 sec)


# session 2

obclient [ANBOB]> select * from test2 where id=1 for update nowait;

ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

--立即返回报错与oracle相同


obclient [ANBOB]> select * from test2 where id=1 for update ; --受ob_query_timeout 影响

-- wait 10s

ORA-30006: resource busy; acquire with WAIT timeout expired


obclient [ANBOB]> alter session set ob_query_timeout =20000000;

Query OK, 0 rows affected (0.002 sec)


obclient [ANBOB]> select * from test2 where id=1 for update ;

-- wait 20s

ORA-30006: resource busy; acquire with WAIT timeout expired


Note:

select for update与oracle的预期一样。报错的文本内容和oracle都一模一样,这让兼容性是好是坏?
ORACLE 错误编号
oracle@19c1:/home/oracle/tpt-oracle-master $ oerr ora 30006

30006, 00000, "resource busy; acquire with WAIT timeout expired"

// *Cause: The requested resource is busy.

// *Action: Retry the operation later.

oracle@19c1:/home/oracle/tpt-oracle-master $ oerr ora 54

00054, 00000, "resource busy and acquire with NOWAIT specified or timeout expired"

// *Cause: Interested resource is busy.

// *Action: Retry if necessary or increase timeout.

oracle@19c1:/home/oracle/tpt-oracle-master $

OCEANBASE ORA错误

https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000000218668
在OB中如何解锁?
obclient [oceanbase]> select * from gv$table where table_id=1100611139453790 \G

*************************** 1. row ***************************

tenant_id: 1001

tenant_name: orauser

table_id: 1100611139453790

table_name: TEST2

database_id: 1100611139404831

database_name: ANBOB

tablegroup_id: -1

tablegroup_name: NULL

table_type: 3

zone_list: zone1;zone2;zone3

primary_zone: zone1;zone2;zone3

collation_type: 46

locality: FULL{1}@zone1, FULL{1}@zone2, FULL{1}@zone3

schema_version: 1703594038143136

read_only: 0

comment:

index_status: 1

index_type: 0

part_level: 0

part_func_type: 0

part_func_expr:

part_num: 1

sub_part_func_type: 0

sub_part_func_expr:

sub_part_num: 1

dop: 1

auto_part: 0

auto_part_size: -1

1 row in set (0.019 sec)


obclient [oceanbase]> SELECT * FROM __all_virtual_trans_lock_stat ORDER BY ctx_create_time limit 8\G

*************************** 1. row ***************************

tenant_id: 1001

trans_id: {hash:14812608598155958701, inc:88587874, addr:"172.20.xxxx:2882", t:1703597571331695}
svr_ip: 172.20.xxx

svr_port: 2882

partition: {tid:1100611139453790, partition_id:0, part_cnt:0}

table_id: 1100611139453790
rowkey: table_id=1100611139453790 rowkey_object=[{"DECIMAL":"1"}]

session_id: 3221600981
proxy_id: "172.20.yyyyy:2883"

ctx_create_time: 2023-12-26 21:32:51.331695
expired_time: 2024-01-07 11:19:31.331902

row_lock_addr: 140012405040624

1 row in set (0.059 sec)


obclient [oceanbase]>
kill query 3221600981;
Query OK, 0 rows affected (0.001 sec)


obclient [oceanbase]> SELECT * FROM __all_virtual_trans_lock_stat ORDER BY ctx_create_time limit 8\G

*************************** 1. row ***************************

tenant_id: 1001

trans_id: {hash:14812608598155958701, inc:88587874, addr:"172.20.xxxxx:2882", t:1703597571331695}

svr_ip: 172.20.xxxx

svr_port: 2882

partition: {tid:1100611139453790, partition_id:0, part_cnt:0}

table_id: 1100611139453790

rowkey: table_id=1100611139453790 rowkey_object=[{"DECIMAL":"1"}]

session_id: 3221600981

proxy_id: "172.20.yyyy:2883"

ctx_create_time: 2023-12-26 21:32:51.331695
expired_time: 2024-01-07 11:19:31.331902

row_lock_addr: 140012405040624

1 row in set (0.062 sec)


obclient [oceanbase]>
kill connection 3221600981;
Query OK, 0 rows affected (0.002 sec)


obclient [oceanbase]> SELECT * FROM __all_virtual_trans_lock_stat ORDER BY ctx_create_time limit 8\G

Empty set (0.072 sec)




# session 1

obclient [ANBOB]> select * from test2 where id=1 for update ;

ERROR-02013: Lost connection to MySQL server during query

obclient [ANBOB]>

Note:

从__all_virtual_trans_lock_stat可以查询到lock资源(但在V3版本可能存在查询不到的现象),而且kill query也未生效,最后使用kill connection终止连接释放。
Summary:
Oceanbase存在一些错误编号与oracle相同,错误message也不尽相同,且触发机制可能存在差异, 如果在当前的生产环境中存在oracle和oceanbase两类数据库的应用,应用中的错误日志根据错误编号不容易区分报错数据源是oracle还是oceanbase. 而且oracle的每个版本相同的错误code可能存在错误message的调整,如资源限制类。恐怕OCEANBASE很难保持与ORACLE的同步, 而且OB的错误处理我相信要比ORACLE多,恐怕有一天会出现由于两种数据库系统在错误编号上可能存在相似之处,但错误消息却有所不同,这会增加诊断和故障排除的复杂性。
继续阅读
阅读原文