因系统时间调整导致的 ORA-01555 快照过旧问题

适用范围

Oracle 11g 及之后版本,undo自动调优关闭情况。

问题概述

在测试时批量作业(从一个4千万的表查找满足条件的数据插入到新表)发生中断,后台报错:ORA-01555 快照过旧。从alert日志可以找到发生报错的语句单独执行在2分钟左右可以完成。undo表空间大小配置了120GB,undo自动调优是关闭的,检查undo使用状态,发现120G的undo基本都是UNEXPIRED 状态,UNEXPIRED的undo数据没有随着时间变化状态转为EXPIRED,按照MOS文档 IF: Causes for High Undo Tablespace Space Usage (Doc ID 1951402.1)做了相关设置后,UNEXPIRED 状态的undo数据依旧保持在近120G大小,没有发现降低。

问题原因

发生ORA-01555的一般原因是undo表空间设置太小,事务量比较大、查询语句执行时间长执行过程中所需要的undo被覆盖了导致这个报错。但这次发生ORA-01555报错不是上述原因导致的:1)undo表空间有120GB;2)undo保留时间为1800秒,大于查询执行时间(2分钟);3)当前事务量只是最多插入4千万行数据,并不算大。
检查alert日志发现系统记录的时间有2023-08-03、2023-05-02、2022-11-02等各段时间可以得出端倪,该系统在测试过程中多次修改过系统时间,怀疑是修改系统时间导致,数据库在将来的一个时间(2023-08-03)测试发生了大量的事务,使用了大量的undo,测试完成后将系统时间往回调后(2022-11-02),因为undo里记录的事务时间大于当前时间,所以当前时间减去事务记录时间不可能大于undo_retention时间,所以undo会一直是UNEXPIRED状态,因为绝大部分的undo是UNEXPIRED状态,新的事务没有足够可用的undo 导致最近的undo被快速覆盖,查询找不到所需要的undo 构建查询块而报ORA-01555错误。

测试验证:

--会话一:
--关闭undo自动调优,undo_retention设置为5分钟
alter system set "_undo_autotune"=false;
alter system set undo_retention=300;
alter system set "_smu_debug_mode"= 33554432;
SQL>  alter database datafile 4 autoextend on maxsize 580M;

Database altered.
SQL> show parameter undo;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_undo_autotune                       boolean     FALSE
temp_undo_enabled                    boolean     FALSE
undo_management                      string      AUTO
undo_retention                       integer     300
undo_tablespace                      string      UNDOTBS1
SQL> show parameter _smu

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_smu_debug_mode                      big integer 32M

--会话二:进行插入操作,消耗undo

会话一:查看undo状态,UNEXPIRED状态为211M
SQL> r
  1*  select sysdate,status,sum(bytes)/1024/1024 size_mb, count(*),tablespace_name from dba_undo_extents group by status,tablespace_name

SYSDATE             STATUS       SIZE_MB   COUNT(*) TABLESPACE_NAME
------------------- --------- ---------- ---------- ------------------------------
2022-06-26 21:49:47 EXPIRED      77.5625         86 UNDOTBS1
2022-06-26 21:49:47 UNEXPIRED    211.875        225 UNDOTBS1
--关闭数据库,修改主机时间,将时间回调
--操作系统
[root@og1 ~]# date -s 06/18/14
Wed Jun 18 00:00:00 CST 2014
--sqlplus 操作 
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 1526722872 bytes
Fixed Size                  8896824 bytes
Variable Size             570425344 bytes
Database Buffers          939524096 bytes
Redo Buffers                7876608 bytes
Database mounted.
Database opened.
--再次查看undo状态
SQL> select sysdate,status,sum(bytes)/1024/1024 size_mb, count(*),tablespace_name from dba_undo_extents group by status,tablespace_name;

SYSDATE             STATUS       SIZE_MB   COUNT(*) TABLESPACE_NAME
------------------- --------- ---------- ---------- ------------------------------
2014-06-18 00:00:30 ACTIVE             1          1 UNDOTBS1
2014-06-18 00:00:30 EXPIRED            1          1 UNDOTBS1
2014-06-18 00:00:30 UNEXPIRED   288.4375        310 UNDOTBS1
。。。


SQL> /

SYSDATE             STATUS       SIZE_MB   COUNT(*) TABLESPACE_NAME
------------------- --------- ---------- ---------- ------------------------------
2014-06-18 00:21:52 EXPIRED            4          4 UNDOTBS1
2014-06-18 00:21:52 UNEXPIRED   287.4375        309 UNDOTBS1
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 1526722872 bytes
Fixed Size                  8896824 bytes
Variable Size             570425344 bytes
Database Buffers          939524096 bytes
Redo Buffers                7876608 bytes
Database mounted.
Database opened.
SQL> select sysdate,status,sum(bytes)/1024/1024 size_mb, count(*),tablespace_name from dba_undo_extents group by status,tablespace_name;

SYSDATE             STATUS       SIZE_MB   COUNT(*) TABLESPACE_NAME
------------------- --------- ---------- ---------- ------------------------------
2014-06-18 00:22:58 ACTIVE             1          1 UNDOTBS1
2014-06-18 00:22:58 EXPIRED            4          4 UNDOTBS1
2014-06-18 00:22:58 UNEXPIRED   286.4375        308 UNDOTBS1
--20多分钟后(已经远大于undo保留时间5分钟)UNEXPIRED 的undo还是有287.4375M,
SQL> /

SYSDATE             STATUS       SIZE_MB   COUNT(*) TABLESPACE_NAME
------------------- --------- ---------- ---------- ------------------------------
2014-06-18 00:25:41 EXPIRED            4          4 UNDOTBS1
2014-06-18 00:25:41 UNEXPIRED   287.4375        309 UNDOTBS1
--再次修改系统时间将时间设置为大于原来事务创建时间
[root@og1 ~]# date 062916022022.00
Wed Jun 29 16:02:00 CST 2022
--sqlplus 操作 
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup   
ORACLE instance started.

Total System Global Area 1526722872 bytes
Fixed Size                  8896824 bytes
Variable Size             570425344 bytes
Database Buffers          939524096 bytes
Redo Buffers                7876608 bytes
Database mounted.
Database opened.
--调整时间后,undo大部分立即都变成EXPIRED 状态了    
SQL> select sysdate,status,sum(bytes)/1024/1024 size_mb, count(*),tablespace_name from dba_undo_extents group by status,tablespace_name;

SYSDATE             STATUS       SIZE_MB   COUNT(*) TABLESPACE_NAME
------------------- --------- ---------- ---------- ------------------------------
2022-06-29 16:02:29 EXPIRED     281.4375        303 UNDOTBS1
2022-06-29 16:02:29 ACTIVE             1          1 UNDOTBS1
2022-06-29 16:02:29 UNEXPIRED          9          9 UNDOTBS1

SQL> select sysdate,status,sum(bytes)/1024/1024 size_mb, count(*),tablespace_name from dba_undo_extents group by status,tablespace_name;

SYSDATE             STATUS       SIZE_MB   COUNT(*) TABLESPACE_NAME
------------------- --------- ---------- ---------- ------------------------------
2022-06-29 16:02:34 EXPIRED     281.4375        303 UNDOTBS1
2022-06-29 16:02:34 ACTIVE             1          1 UNDOTBS1
2022-06-29 16:02:34 UNEXPIRED          9          9 UNDOTBS1

解决方案

由于测试过程中短期不会将时间再调回到将来比事务发生时更早的时间,所以采用新建undo表空间替换原来undo的办法解决。替换后时间没有发生调整,undo回收正常,任务执行正常。

参考文档

MOS文档 IF: Causes for High Undo Tablespace Space Usage (Doc ID 1951402.1)


数据运维技术 » 因系统时间调整导致的 ORA-01555 快照过旧问题