Troubleshooting: Logfile sync Wait breakdown & causes

Log file sync event can appear in any busiest system or the environment that configured badly or might be due to some bugs. But log file sync is an accumulated time taken of other aspects which is not alone a foreground process that is waiting for commit post/poll acknowledgement from LGWR.

The log file sync wait event is the time the foreground processes spends waiting for the redo to be flushed. According to the documentation , The “log file sync” wait event may be broken down into the following components:

1. Wakeup LGWR if idle
2. LGWR gathers the redo to be written and issues the I/O
3. Wait time for the log write I/O to complete
4. LGWR I/O post processing ,  LGWR posting the foreground/user session that the write has completed
5. Foreground/user session wakeup

When you troubleshoot , Where to look at this metrics and how you determine which is causing high waits.

The log file sync wait can be broken into:

1 = In a busiest system or anytime this case is not possible, It can be assumed here that the LGWR is always active, it does not have to be awaken by the system dispatcher.

2= AWR Report , two statistics , namely, Redo Write time and User commits.

Redo Write time ,  this is the total elapsed time of the write from the redo log buffer to the current redo log file (in centiseconds).

User commits, number of commits per second, per transaction

3= Awr Report, Log file parallel write per second

Note:- Steps 2 and 3 are accumulated in the “redo write time” statistic. (i.e. as found under STATISICS section of Statspack) Step 3 is the “log file parallel write” wait event.

4= AWR report, calculate commit frequency. microstat –PL and prstat on lgwr process shows how much time it is spending on CPU delaying

5= Though LGWR post , The foreground process is finally waken up by the system dispatcher. This completes the ‘log file sync’ wait.

So you may need to understand , is log file sync can be high due to
1) Excessive commits
2) Disk I/O slowness
3) OS scheduling or CPU issues
To quickly understand this. Look at few stats from AWR report as below.
Case 1:-

Event                        Avg wait Time (ms)
——————————————
log file sync                       10

Event                          Avg wait Time (ms)
——————————————
log file parallel write              6.3

Statistic                             Total     per Second
——————————- ———– ————–
redo blocks written                 230,881        2,998.5
redo write time                       6,226           80.9
user commits                         38,135          495.3
user rollbacks                            1            0.0
user calls                          433,717        5,632.7

 

Note: Redo write time in centiseconds

Log file sync wait – per user commits (redo write time) – log file parallel write.

6226/38135 = 0.16ms per user “redo write time”  out 10 log file sync

Therefore, 0.16ms per user per redo write wait out of is not ideally a bad number of waiting
but 500 per commits per second is on higher side, this represents that lgwr is experiencing excessive commits. Consider application reduce the commits using batch transactions.

Case 2:-

Event                        Avg wait Time (ms)
——————————————
log file sync                       10

Event                          Avg wait Time (ms)
——————————————
log file parallel write              6.3

Statistic                             Total     per Second
——————————- ———– ————–
redo blocks written                 230,881        2,998.5
redo write time                 38,135          495.3
user commits                         6,226           80.9
user rollbacks                            1            0.0
user calls                          433,717        5,632.7

38135/6225 = 6.1ms per “redo write time”

Therefore, 6.1ms per second wait of I/O write (log file parallel write) consuming and causing log file sync event, you may need to  check your extended device statistics using
sar, iostat etc. Consider moving redo to faster disks etc.
Case 3:-

Event                        Avg wait Time (ms)
——————————————
log file sync                       101

Event                          Avg wait Time (ms)
——————————————
log file parallel write              37

Statistic          Total     per Second     per Trans
——————————————————
redo write time  275,944           77.6           0.2
user commits   1,163,048          327.1           1.0

776/327 = 2.3ms per user commits of “redo write time”

Note: Redo write time in centiseconds

Log file sync wait – per user commits (redo write time) – log file parallel write.

101-2.3-37=61.7ms

Therefore, 61.7 ms out of 101 ms was going to post processing (can be lgwr kick off by cpu load or due to other contentions or latch waitings in the database)

Consider scheduling of the lgwr using nice, load of the database, other contentions causing cpu loads etc.

The above written in order to clear my understanding and represent the log file sync event in conjunction with AWR or sys stats to pin point the issue. This case may not be accurate in terms of stats but provides a overview of troubleshooting or break down the log file sync wait.

References:-

http://www.pythian.com/news/1098/tuning-log-file-sync-event-waits/

NOTE:1376916.1 – Troubleshooting: log file sync’ Waits

http://christianbilien.wordpress.com/2008/02/12/the-%E2%80%9Clog-file-sync%E2%80%9D-wait-event-is-not-always-spent-waiting-for-an-io/

http://www.confio.com/db-resources/oracle-wait-events/log-file-sync/

-Thanks

Suresh
Rs. 150 .Com at GoDaddy.com!

1 comment to Troubleshooting: Logfile sync Wait breakdown & causes

Leave a Reply