参考文档:
Bug 8656212 - Alert log with message "Warning: recovery process XXXX cannot use async I/O" (Doc ID 8656212.8)
https://docs.oracle.com/en/database/oracle/oracle-database/19/refrn/V-MANAGED_STANDBY.html
RDBMS 11.2.0.4 的主备库。
发生网络或者存储方面的故障后,DG出现一些问题。
问题1
主库RAC,备库RAC,主备库不同步,主库上不再显示LNS服务进程相关的信息。也就是主库不再发送日志到备库。
解决方法:
重启备库RAC,重启后,主库的alert log中提示'LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2'
随后,继续传输到备库,备库media recover 后正常。这个比较奇怪,什么原因,不是很清楚。
Beginning log switch checkpoint up to RBA [0x21db2.2.10], SCN: 6255905417984
Thread 1 advanced to log sequence 138674 (LGWR switch)Current log# 1 seq# 138674 mem# 0: +DATA_ABC/prod/onlinelog/group_1.296.123456Current log# 1 seq# 138674 mem# 1: +RECO_DEF/prod/onlinelog/group_1.273.123456
Mon Nov 21 09:48:08 2022
******************************************************************
LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
******************************************************************
LNS: Standby redo logfile selected for thread 1 sequence 138674 for destination LOG_ARCHIVE_DEST_2
Mon Nov 21 09:48:20 2022
Archived Log entry 500862 added for thread 1 sequence 138673 ID 0x14184331 dest 1:
Mon Nov 21 09:50:08 2022
FAL[server, ARC0]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance PROD1 - Archival Error. Archiver continuing.
Mon Nov 21 09:51:04 2022
问题2
主库RAC,备库RAC ,主备库不同步,主库上有部分日志没有发送过去,逐个手工copy到备库注册后,可以media recover。但是发现主库上切换了日志后,几乎备库接收不到。
重启备库的MRP进程后,提示'ORA-27090: 无法为异步磁盘 I/O 保留内核资源 '等告警,如下:
Mon Nov 21 10:24:32 2022
Errors in file /u01/ABCDG/oracle/diag/rdbms/ABCDG/ab/trace/ab_pr0m_26593.trc:
ORA-27090: 无法为异步磁盘 I/O 保留内核资源
Additional information: 3
Additional information: 128
Additional information: 166549176
Warning: recovery process PR0M cannot use async I/O
Mon Nov 21 10:24:32 2022
Errors in file /u01/ABCDG/oracle/diag/rdbms/ABCDG/ab/trace/ab_pr0n_26595.trc:
ORA-27090: 无法为异步磁盘 I/O 保留内核资源
Additional information: 3
Additional information: 128
Additional information: 166549176
Warning: recovery process PR0N cannot use async I/O
Mon Nov 21 10:24:32 2022
继续查看对应的trc文件,如下:(可以看到KCBR: Number of read descriptors = 32)
******************* End of process map dump ************
----- Process Resource Limits -----
***************** Dumping Resource Limits(s/h) *****************
core file size 0 KB/UNLIMITED
data seg size UNLIMITED/UNLIMITED
file size UNLIMITED/UNLIMITED
pending signals 2010 KB/2010 KB
max locked memory 238 GB/238 GB
max memory size UNLIMITED/UNLIMITED
open files 64 KB/64 KB
POSIX message queues 800 KB/800 KB
stack size 32 MB/UNLIMITED
cpu time UNLIMITED/UNLIMITED
max user processes 16 KB/16 KB
virtual memory UNLIMITED/UNLIMITED
file locks UNLIMITED/UNLIMITED
***************** End of Resource Limits Dump ******************
----- Java Stack -----
----- VKTM Time Drifts Circular Buffer -----
Warning: recovery process PR0M cannot use async I/O
KCBR: Number of read descriptors = 32*** 2022-11-21 14:10:50.276
KCBR: Media recovery blocks read (SYNC) = 180651
KCBR: Influx buffers flushed = 10145 times
KCBR: Redo cache copies/changes = 1100946/1100943
查看归档日志情况,发现MRP0的状态是WAIT_FOR_LOG
SYS@ab>select process,status,thread#,sequence# from v$managed_standby order by 3,12 ;PROCESS STATUS THREAD# SEQUENCE#
------------------ ------------------------ ---------- ----------
RFS IDLE 0 0
RFS IDLE 0 0
RFS IDLE 0 0
RFS IDLE 0 0
RFS IDLE 0 0
RFS IDLE 0 0
ARCH CLOSING 1 429441
ARCH CLOSING 1 418157
ARCH CLOSING 1 429440
ARCH CLOSING 1 429437
RFS IDLE 1 429442PROCESS STATUS THREAD# SEQUENCE#
------------------ ------------------------ ---------- ----------
MRP0 WAIT_FOR_LOG 2 38156612 rows selected.
根据官方文档中对v$managed_standby的解释:
WAIT_FOR_LOG - Process is waiting for the archived redo log to be completed (进程等待归档redo完成)
所以,就发生了归档日志一直传不过来,或者归档日志一直不应用。
解决方法:
根据MOS文档Bug 8656212 - Alert log with message "Warning: recovery process XXXX cannot use async I/O" (Doc ID 8656212.8)
设置alter system set "_media_recovery_read_batch" = 64 sid ='*' scope = spfile;
以下为MOS中的描述,从MOS中的描述,可以看到在某些情况下同步和异步IO会转换
Description
Alert log can show the next message during recovery as the recovery processes
may not use ASYNC IO:
"Warning: recovery process XXXX cannot use async I/O"
Note that to confirm that async I/O is being used and in the case of
recovery in a standby database, look for the following trace line in
the MRP0 trace file after managed recovery has been started and stopped
(the trace is printed upon completion of a recovery session):
Example:
Read rate (ASYNC): 11070353Kb in 123.79s => 87.33 Mb/sec
If sync I/O then SYNC will be displayed instead of ASYNC.
Workaround
Set _media_recovery_read_batch to 64 or 100
设置完毕后,重启数据库,打开MRP进程,归档日志的传输和应用正常,查询v$managed_standby结果为APPLYING_LOG
问题3
主库RAC,备库RAC,主库的节点1可以传输归档日志到备库,节点2开始报以下错误。节点2无法传送日志到备库。同时在主库上ping备库,是没有问题的。
Error 12514 received logging on to the standby
PING[ARC1]: Heartbeat failed to connect to standby 'mno'. Error is 12514.
Mon Nov 21 10:20:53 2022
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.returning error ORA-16191
------------------------------------------------------------
PING[ARC1]: Heartbeat failed to connect to standby 'mno'. Error is 16191.
Mon Nov 21 10:21:54 2022
Error 1017 received logging on to the standby
------------------------------------------------------------
重启MRP进程后,也报异步IO方面的错误。
Errors in file /u01/AAADG/db/diag/rdbms/mno/mno1/trace/mno1_pr0t_5157.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Additional information: 3
Additional information: 128
Additional information: 166549176
Warning: recovery process PR0T cannot use async I/O
Mon Nov 21 11:27:07 2022
Errors in file /u01/AAADG/db/diag/rdbms/mno/mno1/trace/mno1_pr0u_5159.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Additional information: 3
Additional information: 128
Additional information: 166549176
Warning: recovery process PR0U cannot use async I/O
通过set "_media_recovery_read_batch" = 64 后解决异步IO的问题,随后启动数据库和MRP进程。alert 中提示如下:
通过以下alert log中查看,貌似密码文件问题(比较奇怪的是,数据库运行了很久了,重启后出现这个问题)
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.returning error ORA-16191
------------------------------------------------------------
Suppressing further error logging of LOG_ARCHIVE_DEST_2.
Mon Nov 21 14:31:49 2022
Thread 2 advanced to log sequence 1420 (LGWR switch)Current log# 4 seq# 1420 mem# 0: +DATA/prod/redo04.log
Mon Nov 21 14:31:49 2022
Archived Log entry 4772 added for thread 2 sequence 1419 ID 0x1da8a7e5 dest 1:
Mon Nov 21 14:31:50 2022
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.returning error ORA-16191
------------------------------------------------------------
Suppressing further error logging of LOG_ARCHIVE_DEST_2.
FAL[server, ARC3]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance prod2 - Archival Error. Archiver continuing.
^C
从arc3进程中看,貌似节点2登录备库拒绝
[oracle@AAAdb2 trace]$ ls -lrt *arc3*Initial buffer sizes: read 1024K, overflow 832K, change 805K
OCISessionBegin failed -1*** 2022-11-21 00:21:56.100
.. Detailed OCI error val is 1017 and errmsg is 'ORA-01017: invalid username/password; logon denied
'
OCISessionBegin failed. Error -1
.. Detailed OCI error val is 1017 and errmsg is 'ORA-01017: invalid username/password; logon denied
'*** 2022-11-21 00:21:57.136
OCISessionBegin failed. Error -1
.. Detailed OCI error val is 1017 and errmsg is 'ORA-01017: invalid username/password; logon denied
'
OCISessionBegin failed. Error -1
.. Detailed OCI error val is 1017 and errmsg is 'ORA-01017: invalid username/password; logon denied
'
*** 2022-11-21 00:21:57.158 4329 krsh.c
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password fi
查看节点1和节点2上的密码文件,发现节点2上只有一个密码文件,随后,将节点1上的密码文件copy到节点1上。
[oracle@AAAdb2 dbs]$ ls -lrt orapw*
-rw-r----- 1 oracle oinstall 1536 Mar 2 2022 orapwprod2 --- 随后修改的这个文件
[oracle@AAAdb2 dbs]$ [oracle@AAAdb1 dbs]$ ls -lrt orapw*
-rw-r----- 1 oracle oinstall 1536 Mar 2 2022 orapwprod1
-rw-r----- 1 oracle oinstall 1536 Mar 3 2022 orapwprod2
[oracle@AAAdb1 dbs]$
密码文件拷贝后,alert log中立刻显示可以发送日志了
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.returning error ORA-16191
------------------------------------------------------------
Mon Nov 21 15:05:01 2022
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.returning error ORA-16191
------------------------------------------------------------
Mon Nov 21 15:06:02 2022
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.returning error ORA-16191
------------------------------------------------------------
Mon Nov 21 15:07:05 2022
Thread 2 advanced to log sequence 1421 (LGWR switch)Current log# 3 seq# 1421 mem# 0: +DATA/prod/redo03.log
Mon Nov 21 15:07:05 2022
LNS: Standby redo logfile selected for thread 2 sequence 1421 for destination LOG_ARCHIVE_DEST_2
Mon Nov 21 15:07:05 2022
Archived Log entry 4774 added for thread 2 sequence 1420 ID 0x1da8a7e5 dest 1:
Mon Nov 21 15:07:05 2022
ARC3: Standby redo logfile selected for thread 2 sequence 1420 for destination LOG_ARCHIVE_DEST_2
总结:
前2个问题比较奇怪,通过业务了解,确实是底层的存储出了一些问题。
最后一个密码问题,也比较奇怪,密码不一致,运行了很久才暴露。
END
上一篇:描写“恶臭”的句子有哪些?
下一篇:九条命电影英文观后感