首页 » ORACLE, 系统相关 » Troubleshooting High Private/Interconnect Network Latency Case

Troubleshooting High Private/Interconnect Network Latency Case

系统资源的最大利用是系统优化的目标,但是过度”优化”系统的容错性就相对较差,如果出现丝毫的卡顿就会“雪崩”,前段时间分析的一个案例,号称是亚洲最好的一套HP小机配置,有Oracle原厂操刀调试的一套Oracle 12c R1环境,不过确实效率很高,突然有一天因RAC 节点之间的private network 高latency导致频繁的 gc 相关等待,从而导致性能问题。

问题前半小时node2  AWR

Per Second Per Transaction Per Exec Per Call
DB Time(s): 380.7 0.2 0.00 0.00
DB CPU(s): 68.2 0.0 0.00 0.00
Background CPU(s): 8.6 0.0 0.00 0.00
Redo size (bytes): 39,391,148.6 16,534.7
Logical read (blocks): 2,776,413.2 1,165.4
Block changes: 178,201.7 74.8
Physical read (blocks): 29,338.7 12.3
Physical write (blocks): 6,488.4 2.7
Read IO requests: 18,763.1 7.9
Write IO requests: 2,981.2 1.3
Read IO (MB): 229.2 0.1
Write IO (MB): 50.7 0.0
IM scan rows: 0.0 0.0
Session Logical Read IM:
Global Cache blocks received: 2,144.3 0.9
Global Cache blocks served: 3,382.7 1.4
User calls: 112,248.9 47.1
Parses (SQL): 4,135.5 1.7
Hard parses (SQL): 17.6 0.0
SQL Work Area (MB): 36.1 0.0
Logons: 15.4 0.0
Executes (SQL): 93,517.4 39.3
Rollbacks: 2.0 0.0
Transactions: 2,382.3

Instance Efficiency Percentages (Target 100%)

Buffer Nowait %: 99.92 Redo NoWait %: 100.00
Buffer Hit %: 98.98 In-memory Sort %: 100.00
Library Hit %: 100.26 Soft Parse %: 99.57
Execute to Parse %: 95.58 Latch Hit %: 99.77
Parse CPU to Parse Elapsd %: 54.18 % Non-Parse CPU: 99.66
Flash Cache Hit %: 0.00

Top 10 Foreground Events by Total Wait Time

Event Waits Total Wait Time (sec) Wait Avg(ms) % DB time Wait Class
DB CPU 60.1K 17.9
db file sequential read 10,859,769 13.7K 1.27 4.1 User I/O
log file sync 2,211,105 13.4K 6.06 4.0 Commit
latch: cache buffers chains 379,825 3994.9 10.52 1.2 Concurrency
gc cr grant 2-way 4,700,313 3993.6 0.85 1.2 Cluster
enq: TX – row lock contention 732 1991.3 2720.37 .6 Application
latch free 378,211 1389.8 3.67 .4 Other
db file parallel read 534,612 1239.2 2.32 .4 User I/O
gc buffer busy acquire 308,207 1225.6 3.98 .4 Cluster
gc current grant 2-way 959,084 851.4 0.89 .3 Cluster

15分钟后node2  AWR

Load Profile

Per Second Per Transaction Per Exec Per Call
DB Time(s): 2,366.8 1.8 0.05 0.04
DB CPU(s): 48.4 0.0 0.00 0.00
Background CPU(s): 7.3 0.0 0.00 0.00
Redo size (bytes): 17,262,623.5 12,978.6
Logical read (blocks): 1,821,514.9 1,369.5
Block changes: 80,043.7 60.2
Physical read (blocks): 23,230.6 17.5
Physical write (blocks): 4,214.2 3.2
Read IO requests: 15,238.3 11.5
Write IO requests: 2,345.5 1.8
Read IO (MB): 181.5 0.1
Write IO (MB): 32.9 0.0
IM scan rows: 0.0 0.0
Session Logical Read IM:
Global Cache blocks received: 6,799.8 5.1
Global Cache blocks served: 7,779.9 5.9
User calls: 62,306.9 46.8
Parses (SQL): 2,077.9 1.6
Hard parses (SQL): 22.2 0.0
SQL Work Area (MB): 19.1 0.0
Logons: 12.6 0.0
Executes (SQL): 43,196.6 32.5
Rollbacks: 3.1 0.0
Transactions: 1,330.1

Instance Efficiency Percentages (Target 100%)

Buffer Nowait %: 99.66 Redo NoWait %: 100.00
Buffer Hit %: 98.72 In-memory Sort %: 100.00
Library Hit %: 100.14 Soft Parse %: 98.93
Execute to Parse %: 95.19 Latch Hit %: 105.08
Parse CPU to Parse Elapsd %: 70.98 % Non-Parse CPU: 99.62
Flash Cache Hit %: 0.00

Top 10 Foreground Events by Total Wait Time

Event Waits Total Wait Time (sec) Wait Avg(ms) % DB time Wait Class
latch: row cache objects 5,655,292 477K 84.35 21.7 Concurrency
enq: TX – index contention 305,813 376.4K 1230.67 17.1 Concurrency
gc cr grant 2-way 4,342,796 211K 48.59 9.6 Cluster
buffer busy waits 723,245 202.2K 279.60 9.2 Concurrency
gc buffer busy release 364,833 198.8K 544.96 9.0 Cluster
gc buffer busy acquire 2,666,894 137K 51.36 6.2 Cluster
gc cr multi block request 625,425 55K 87.91 2.5 Cluster
DB CPU 45K 2.0
gc current grant 2-way 812,849 40.1K 49.34 1.8 Cluster
gc current grant busy 459,003 28.1K 61.16 1.3 Cluster

Host CPU

CPUs Cores Sockets Load Average Begin Load Average End %User %System %WIO %Idle
256 128 16 0.40 0.33 28.0 5.0 0.6 67.1

Instance CPU

%Total CPU %Busy CPU %DB time waiting for CPU (Resource Manager)
23.4 71.0 0.0

看问题时刻另一节点node1的AWR TOP EVENT

Top 10 Foreground Events by Total Wait Time

Event Waits Total Wait Time (sec) Wait Avg(ms) % DB time Wait Class
gc buffer busy acquire 1,940,939 106.1K 54.67 16.7 Cluster
gc cr block 2-way 4,050,009 92.3K 22.78 14.5 Cluster
gc cr grant 2-way 3,445,569 70.5K 20.46 11.1 Cluster
enq: TX – index contention 48,310 59K 1222.08 9.3 Concurrency
DB CPU 38.4K 6.0
gc current block 2-way 1,636,339 35K 21.40 5.5 Cluster
gc buffer busy release 58,210 32.9K 565.78 5.2 Cluster
gc cr multi block request 822,890 32.6K 39.62 5.1 Cluster
gc cr grant congested 215,405 27.8K 129.05 4.4 Cluster
latch: row cache objects 6,116,171 23.5K 3.84 3.7 Concurrency

Host CPU

CPUs Cores Sockets Load Average Begin Load Average End %User %System %WIO %Idle
256 128 16 0.26 0.25 21.3 4.2 0.1 74.6

Instance CPU

%Total CPU %Busy CPU %DB time waiting for CPU (Resource Manager)
18.8 73.8 0.0

当时互相ping私网已经显示出不正常的network latency,正常应该是小于1ms. 两个节点的CPU看使用率并不高,当然需要看osw更精确的时间点。

分析方法

1, 检查两节点系统资源利用率,如是否因为CPU使用率过导致lms进程慢

2,AWRDD对比

3,检查网络延迟和瓶颈

$ netstat -s
$ ss -s
$ sar -n DEV 1 5
$ sar -n TCP 1 5
# ip link show – check network interface status (up /down)
# ethtool eth0 – ethtool is used for querying settings of an ethernet device like network speed, packet drop info etc

other tools iftop/nload  speedometer  nethogs tcptrack -i eth0

4,  ASH和OSW分钟或秒级分析查找问题最初的表现

5,  应用跨实例部署

等等

看RAC 部分AWR

NODE1

RAC Statistics

Begin End
Number of Instances: 2 2

Global Cache Load Profile

Per Second Per Transaction
Global Cache blocks received: 8,087.67 11.81
Global Cache blocks served: 6,770.28 9.89
GCS/GES messages received: 53,558.17 78.23
GCS/GES messages sent: 62,283.68 90.98
DBWR Fusion writes: 272.70 0.40
Estd Interconnect traffic (KB) 141,488.97

Global Cache and Enqueue Services – Workload Characteristics

Avg global enqueue get time (ms): 11.5
Avg global cache cr block receive time (ms): 23.8
Avg global cache current block receive time (ms): 25.7
Avg global cache cr block build time (ms): 0.0
Avg global cache cr block send time (ms): 0.0
Global cache log flushes for cr blocks served %: 20.4
Avg global cache cr block flush time (ms): 2.3
Avg global cache current block pin time (ms): 1.2
Avg global cache current block send time (ms): 0.0
Global cache log flushes for current blocks served %: 2.2
Avg global cache current block flush time (ms): 6.8

Global Cache and Enqueue Services – Messaging Statistics

Avg message sent queue time (ms): 0.2
Avg message sent queue time on ksxp (ms): 36.0
Avg message received queue time (ms): 0.0
Avg GCS message process time (ms): 0.0
Avg GES message process time (ms): 0.0
% of direct sent messages: 35.89
% of indirect sent messages: 63.86
% of flow controlled messages: 0.25

Interconnect Ping Latency Stats

Target Instance 500B Ping Count Avg Latency 500B msg Stddev 500B msg 8K Ping Count Avg Latency 8K msg Stddev 8K msg
1 71 0.20 0.06 71 0.20 0.06
2 71 4.60 7.01 71 4.71 7.03

Node2

RAC Statistics

Begin End
Number of Instances: 2 2

Global Cache Load Profile

Per Second Per Transaction
Global Cache blocks received: 6,799.76 5.11
Global Cache blocks served: 7,779.91 5.85
GCS/GES messages received: 57,138.25 42.96
GCS/GES messages sent: 51,467.70 38.70
DBWR Fusion writes: 563.76 0.42
Estd Interconnect traffic (KB) 137,849.40

Global Cache and Enqueue Services – Workload Characteristics

Avg global enqueue get time (ms): 53.1
Avg global cache cr block receive time (ms): 4.3
Avg global cache current block receive time (ms): 7.2
Avg global cache cr block build time (ms): 0.0
Avg global cache cr block send time (ms): 0.0
Global cache log flushes for cr blocks served %: 11.7
Avg global cache cr block flush time (ms): 5.0
Avg global cache current block pin time (ms): 3.8
Avg global cache current block send time (ms): 0.0
Global cache log flushes for current blocks served %: 2.1
Avg global cache current block flush time (ms): 14.7

Global Cache and Enqueue Services – Messaging Statistics

Avg message sent queue time (ms): 8.3
Avg message sent queue time on ksxp (ms): 9.8
Avg message received queue time (ms): 0.2
Avg GCS message process time (ms): 0.1
Avg GES message process time (ms): 0.0
% of direct sent messages: 44.90
% of indirect sent messages: 54.87
% of flow controlled messages: 0.24

这个问题和几年前Troubleshooting high “gc current grant 2-way” and “gc cr grant 2-way” 案例(二) 有些相似之处,Estd Interconnect traffic (KB) 我们可以看到几乎达到了千兆的上限141,488KB/s。

ksxp 层决定 IPC 协议如何执行的可以查询x$ksxpclient,dba_hist_ic_client_stats 视图显示了相同的信息,并且该视图确实收集了所有实例的统计信息。

select
name,
trunc(bytes_sent/1024/1024) as mb_sent,
trunc(bytes_rcv/1024/1024) as mb_recv,
trunc((bytes_sent+bytes_rcv)/1024/1024) as mb_total,
to_char(round(100*(((bytes_sent+bytes_rcv)/1024/1024)/
sum((bytes_sent+bytes_rcv)/1024/1024) over ()),2),'990.00') as total_perc
from
sys.x$ksxpclient
order by
total_perc desc;

Poorly sized UDP receive (rx) buffer sizes / UDP buffer socket overflows

‘netstat -s’ or ‘netstat -su’ and look for either “udpInOverflowsudpInOverflows”, “packet receive errors”, “fragments dropped” or “outgoing packet drop” depending on the platform.
If there is a significant increases in “outgoing packets dropped” in the TCP section of netstat -s output on the nodes remote to where the workload is running, increase the wmem_default and wmem_max both to 4MB (Linux) could resolve the issue.

Poor interconnect performance and high cpu utilization. `netstat -s` reports packet reassembly failures

Large UDP datagrams may be fragmented and sent in multiple frames based on Maximum Transmission Unit (MTU) size.
`netstat -s` reports a large number of Internet Protocol (IP) “reassembles failed” and “fragments dropped after timeout” in the “IP Statistics” section of the output on the receiving node.

`netstat -s` IP stat counters:
3104582 fragments dropped after timeout
34550600 reassemblies required
8961342 packets reassembled ok
3104582 packet reassembles failed.

Action: Increase fragment reassembly buffers, allocating more space for reassembly. Increase the time to reassemble packet fragment., increase udp receive buffers to accommodate network processing latencies that aggravate reassembly failures and identify CPU utilization that negatively impacts network stack processing. Note, increase the following settings will also increase the memory usage.

On LINUX:
To modify reassembly buffer space, change the following thresholds:

/proc/sys/net/ipv4/ipfrag_low_thresh (default = 196608)
/proc/sys/net/ipv4/ipfrag_high_thresh (default = 262144)

To modify packet fragment reassembly times, modify:
/proc/sys/net/ipv4/ipfrag_time (default = 30)

ADDM report

Finding 2: Interconnect Latency
Impact is 421.58 active sessions, 61.59% of total activity.
———————————————————–
Higher than expected latency of the cluster interconnect was responsible for significant database time on this instance.
The instance was consuming 414762 kilo bits per second of interconnect bandwidth.
65% of this interconnect bandwidth was used for global cache messaging, 0% for parallel query messaging and 33% for database lock management.
The average latency for 8K interconnect messages was 4708 microseconds. The instance is using the private interconnect device “lan901:801” with IP address 169.254.141.243 and source “33”.

Finding 5: Global Cache Congestion
Impact is 80.49 active sessions, 11.76% of total activity.
———————————————————-
Global Cache Service Processes (LMSn) in other instances were not processing requests fast enough.

Recommendation 1: Database Configuration Estimated benefit is 80.49 active sessions, 11.76% of total activity.
———————————————————————
Action
Increase throughput of the Global Cache Service (LMSn) processes.
Increase the number of Global Cache Service processes by increasing the value of the parameter “gcs_server_processes”. Alternatively, if the
host is CPU bound consider increasing the OS priority of the Global Cache Service processes.
Rationale
The value of parameter “gcs_server_processes” was “20” during theanalysis period.

简单记录一下问题现象, 这个case后来看是因为应用因为连接数满自动failover到了其它实例,跨实例的定时调度大量insert,加剧GC,当然期中还引起了latch  undo争用和看到index root split, 本篇仅记录一次interconnect network网络带宽耗尽后的gc现象。

打赏

目前这篇文章还没有评论(Rss)

我要评论