
Troubleshooting log file switch (checkpoint incomplete) event

From the AWR report we can see which issue is more frequent (a quick cross-check with v$system_event is sketched below):
a) DBWR has not finished checkpointing (log file switch (checkpoint incomplete))
b) ARCH has not finished copying the redo log file to the archive destination (Archival required, i.e. log file switch (archiving needed))
c) in the affected sessions the wait event shows up as log file switch (checkpoint incomplete).
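A minimal sketch of that cross-check against the cumulative instance-wide wait statistics (event names may vary slightly between versions):

select event, total_waits, round(time_waited_micro/1000000) time_waited_s
  from v$system_event
 where event in ('log file switch (checkpoint incomplete)',
                 'log file switch (archiving needed)',
                 'log file switch completion')
 order by time_waited_micro desc;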

The difference between the log file switch completion and log file switch (checkpoint incomplete) wait events is that with the former, users wait for the log writer background process (LGWR) to complete its work (the log switch), while with the latter, users wait for the database writer background process (DBWR) to complete its work (the checkpoint).

Also be aware of the v$log_history view, which shows the frequency of log switches. Redo log switching should occur approximately every 20-30 minutes for better database performance, and you should start with a minimum of 100 MB redo log files for any production database.
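A common sketch of the switch rate per hour over the last day, from v$log_history:

select to_char(first_time,'yyyy-mm-dd hh24') switch_hour, count(*) switches
  from v$log_history
 where first_time > sysdate - 1
 group by to_char(first_time,'yyyy-mm-dd hh24')
 order by 1;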

The log file switch – checkpoint incomplete event tells you that Oracle wants to reuse a redo log file, but the current checkpoint position is still in that log file. This results in redo generation being stopped until the log switch is done. Obviously not good!
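While this wait is active, the log group that Oracle wants to reuse still shows STATUS = ACTIVE in v$log, meaning the checkpoint has not yet advanced past it. A quick check of redo log sizes and status:

select group#, thread#, sequence#, bytes/1024/1024 mb, members, status
  from v$log
 order by thread#, group#;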

DIAG:

#### check db alert log

Tue Dec 17 13:18:56 2019
Archived Log entry 1882576 added for thread 1 sequence 1400670 ID 0x7bbf4e1f dest 1:
Tue Dec 17 13:20:19 2019
Beginning log switch checkpoint up to RBA [0x155f60.2.10], SCN: 16614105812146
<<<<<<<<< RBA [0x155f60.2.10] SQL> @hex 155f60

                                DEC                  HEX
----------------------------------- --------------------
                     1400672.000000               155F60

SQL> 
<<<<<<<<<
Thread 1 advanced to log sequence 1400672 (LGWR switch)
  Current log# 6 seq# 1400672 mem# 0: /dev/rjf_newredo06
Tue Dec 17 13:20:41 2019
Archived Log entry 1882577 added for thread 1 sequence 1400671 ID 0x7bbf4e1f dest 1:
Tue Dec 17 13:22:06 2019
Thread 1 cannot allocate new log, sequence 1400673    <<<<<<<<<<<<
Checkpoint not complete
  Current log# 6 seq# 1400672 mem# 0: /dev/rjf_newredo06
Tue Dec 17 13:22:13 2019
Completed checkpoint up to RBA [0x155f52.2.10], SCN: 16614092203284
<<<<< SQL> @hex 155f52

                                DEC                  HEX
----------------------------------- --------------------
                     1400658.000000               155F52
SQL> select 1400672-1400658 from dual;

1400672-1400658
---------------
             14
SQL> 
<<<<<
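(@hex above is a local helper script; the same hex-to-decimal conversion can be done with a plain TO_NUMBER call. The point of the arithmetic is that the log switch is beginning at sequence 1400672 while the last completed checkpoint is only at sequence 1400658, so the checkpoint position is 14 log files behind.)

-- what the @hex script does, as a plain TO_NUMBER conversion
select to_number('155F60','XXXXXXXX') dec from dual;   -- 1400672
select to_number('155F52','XXXXXXXX') dec from dual;   -- 1400658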


==== CKPT trace

*** 2019-12-13 14:00:51.783
  1: 5476ms (rw) file: kcrf.c line: 10058 count: 20 total: 18533ms time: 1309739
  2: 3919ms (rw) file: krse.c line: 1706 count: 129 total: 130333ms time: 1810661
  3: 2853ms (rw) file: kcv.c line: 11894 count: 154 total: 114440ms time: 2363673
  4: 1991ms (rw) file: krc.c line: 1404 count: 3 total: 4261ms time: 1931600
  5: 1529ms (rw) file: krb.c line: 3380 count: 4 total: 3910ms time: 522836
  6: 1526ms (rw) file: krbb.c line: 14189 count: 5 total: 4119ms time: 1160529
  7: 1201ms (ro) file: kcrr.c line: 3598 count: 478 total: 299042ms time: 1222178
  8: 1197ms (rw) file: kct.c line: 7799 count: 4 total: 3172ms time: 1935398
  9: 1181ms (ro) file: krb.c line: 3449 count: 8 total: 6156ms time: 1492674
 10: 1128ms (rw) file: kct.c line: 1010 count: 1 total: 1128ms time: 1661388
Control file enqueue hold time tracking dump at time: 2958185

SQL> select to_char(sample_time,'yyyy-mm-dd hh24:mi:ss') time,session_id,program,event
  2  from dba_hist_active_sess_history where snap_id=94570 and instance_number=1 and program like '%CKPT%'
  3  order by sample_time;

SQL> select to_char(sample_time,'yyyy-mm-dd hh24:mi:ss')time,session_id,program,event,round(delta_write_io_bytes/1024/1024) from dba_hist_active_sess_history  where snap_id=94570 and instance_number=1 and program like '%DBW%'
  2  order by sample_time;

TIME                SESSION_ID PROGRAM                                                          EVENT                ROUND(DELTA_WRITE_IO_BYTES/1024/1024)
------------------- ---------- ---------------------------------------------------------------- -------------------- -------------------------------------
2019-12-17 13:05:48       1299 oracle@kdjf1 (DBW0)                                              db file parallel wri                                  2962
2019-12-17 13:05:48       2184 oracle@kdjf1 (DBWF)                                              db file parallel wri                                  3257
2019-12-17 13:05:48       2125 oracle@kdjf1 (DBWE)                                              db file parallel wri                                  4429
2019-12-17 13:05:48       2007 oracle@kdjf1 (DBWC)                                              db file parallel wri                                  2813
...

SQL> @p db_writer_coalesce

NAME                                     VALUE
---------------------------------------- ----------------------------------------
_db_writer_coalesce_write_limit          131072
_db_writer_coalesce_area_size            4194304
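(@p above is a local script for listing hidden parameters; a minimal equivalent, queried as SYS against the x$ views:)

select x.ksppinm name, y.ksppstvl value
  from x$ksppi x, x$ksppcv y
 where x.indx = y.indx
   and x.ksppinm like '%db_writer_coalesce%';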

SQL> with t as (
       select snap_id,
              small_write_megabytes - lag(small_write_megabytes) over (order by snap_id) small_write_megabytes,
              small_write_reqs      - lag(small_write_reqs)      over (order by snap_id) small_write_reqs,
              large_write_megabytes - lag(large_write_megabytes) over (order by snap_id) large_write_megabytes,
              large_write_reqs      - lag(large_write_reqs)      over (order by snap_id) large_write_reqs
         from dba_hist_iostat_function
        where instance_number = 1 and function_name = 'DBWR')
     select to_char(a.begin_interval_time,'yyyy-mm-dd hh24:mi') etime,
            b.small_write_megabytes, b.small_write_reqs,
            b.large_write_megabytes, b.large_write_reqs
       from dba_hist_snapshot a, t b
      where a.instance_number = 1 and a.snap_id = b.snap_id
      order by a.snap_id;
 
 
ETIME            SMALL_WRITE_MEGABYTES SMALL_WRITE_REQS LARGE_WRITE_MEGABYTES LARGE_WRITE_REQS
---------------- --------------------- ---------------- --------------------- ----------------
2019-12-16 15:00                145946          3707949                 79875           638881
2019-12-16 15:30                140521          3623324                 79446           635468
...

Oracle 10g and later support automatic checkpoint tuning, which takes advantage of periods of low I/O usage to advance checkpoints and therefore improve availability. Automatic checkpoint tuning is in effect if the FAST_START_MTTR_TARGET database initialization parameter is set to a nonzero value.
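A minimal sketch of enabling it (the 300-second target below is only an illustrative value; choose one that matches your recovery SLA):

alter system set fast_start_mttr_target=300 scope=both;
show parameter fast_start_mttr_target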

AWR statistics
You can compare the statistics of the problem period with a normal period using an AWR Diff (AWRDD) report, paying particular attention to:
DBWR checkpoint buffers written (a sketch of pulling its per-snapshot delta follows below)
the Estimated MTTR (s) and Recovery Estimated IOs in the Instance Recovery Stats section.
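As an alternative to the diff report, the per-snapshot delta of DBWR checkpoint buffers written can be pulled straight from DBA_HIST_SYSSTAT; a minimal sketch (instance number 1 assumed, adjust as needed):

select to_char(sn.begin_interval_time,'yyyy-mm-dd hh24:mi') etime,
       s.value - lag(s.value) over (order by s.snap_id) ckpt_buffers_written
  from dba_hist_sysstat s, dba_hist_snapshot sn
 where s.snap_id = sn.snap_id
   and s.dbid = sn.dbid
   and s.instance_number = sn.instance_number
   and s.instance_number = 1
   and s.stat_name = 'DBWR checkpoint buffers written'
 order by s.snap_id;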

Checkpoint Activity
MTTR Writes means the buffer was written out for checkpoint reasons, with the checkpoint driven by the fast_start_mttr_target setting.

Autotune Ckpt Writes are trickle writes that DBWR does when it is relatively idle.

Checkpoint Activity

  • Total Physical Writes: 8,730,776
MTTR Writes  Log Size Writes  Log Ckpt Writes  Other Settings Writes  Autotune Ckpt Writes  Thread Ckpt Writes
          0            2,842                0                      0             7,336,346                   0

 

View

x$estimated_mttr, v$instance_recovery   — estimated MTTR related info and statistics
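A minimal sketch of pulling the key figures directly from v$instance_recovery:

select target_mttr, estimated_mttr, writes_mttr, writes_autotune,
       ckpt_block_writes, recovery_estimated_ios
  from v$instance_recovery;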

Debug

If you need MTTR statistics, set event 10333; once the event is set, you can see the MTTR statistics in the alert log.

alter system set events '10333 trace name context forever, level 1';

Sometimes you may need to clear the MTTR statistics; as a workaround, set event 10330, which clears the MTTR statistics, and then re-set fast_start_mttr_target in memory:

alter system set events '10330 trace name context forever, level 1';
alter system set fast_start_mttr_target= xx  scope=memory;

Errorstack of DBWR

oradebug setospid [spid of dbwr]
oradebug dump errorstack 3;

Recommendation:
To resolve these incomplete checkpoints you need to give the checkpoint process more time to cycle through the logs by doing one or more of the following:
* increase the size of the redo logs
* add more redo log groups (see the sketch after this list)
* reduce fast_start_mttr_target to a lower value so that DBWR advances the checkpoint more aggressively
* increase I/O performance
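A sketch of the first two options; the group numbers, member paths and the 2G size here are assumptions for illustration only:

alter database add logfile thread 1 group 7 ('/u01/oradata/ORCL/redo07a.log') size 2g;
alter database add logfile thread 1 group 8 ('/u01/oradata/ORCL/redo08a.log') size 2g;
-- once an old, smaller group has become INACTIVE it can be dropped and recreated larger
alter database drop logfile group 1;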
Additionally, consider turning off logging for temporary tables with transient data by using the NOLOGGING attribute on those tables.
