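The OceanBase ODC developer platform provides visual development, Web SQL, SQL windows, and similar features, and can be thought of as a development client. A customer recently hit an intermittent error while executing SQL on this platform; a retry shortly afterwards succeeded without any intervention. This post records how that case was diagnosed. The error log was: ErrorCode = 0, SQLState = 08000, Details = unexpected end of stream, read 0 bytes from 4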
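Because traffic from ODC to OB passes through OBProxy, a fault in either obproxy or observer could be the cause. In an OB database the application connection path is: application (here ODC) -> OBProxy -> OBServer.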

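For diagnosing dropped application connections in OB, the official OB documentation provides a fairly clear diagnostic tree: "应用断连问题排查" (Troubleshooting application disconnections).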

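1. Check the application error log
Time the SQL execution failed: 2026-01-09 10:06:02.098
Connection ID in use: conn=1409877
Error code: ErrorCode = 0, SQLState = 08000, Details = unexpected end of stream, read 0 bytes from 4
Note: this matches the "unexpected end of stream, read XXX bytes from xxx" branch of the diagnostic tree, which indicates that the connection on the application side was closed.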
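The matching step above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_error` is hypothetical, and reading the second number as the total number of bytes the client expected is an assumption about the driver's message, not something the log states.

```python
import re

# Pattern from the diagnostic tree:
# "unexpected end of stream, read XXX bytes from xxx"
EOS_PATTERN = re.compile(r"unexpected end of stream, read (\d+) bytes from (\d+)")
SQLSTATE_PATTERN = re.compile(r"SQLState = (\w+)")


def classify_error(message: str) -> dict:
    """Extract the fields that decide which branch of the tree applies."""
    eos = EOS_PATTERN.search(message)
    state = SQLSTATE_PATTERN.search(message)
    return {
        "connection_dropped": eos is not None,
        "bytes_read": int(eos.group(1)) if eos else None,
        # Assumption: the second number is the length the client expected.
        "bytes_total": int(eos.group(2)) if eos else None,
        "sqlstate": state.group(1) if state else None,
    }


msg = ("ErrorCode = 0, SQLState = 08000, "
       "Details = unexpected end of stream, read 0 bytes from 4")
result = classify_error(msg)
print(result)
```

A message that does not contain the pattern returns `connection_dropped: False`, which would point to a different branch of the diagnostic tree.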
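2. Identify the access path
The load-balancer VIP used by the application points to three OBProxy nodes. Log in to each OBProxy node over ssh and run ps to check when the obproxy process started, to determine whether OBProxy restarted during the incident.
One of the obproxy processes did indeed have a start time matching the time of the problem.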
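A rough way to automate the "does the start time match" check is shown below. It assumes the start time was collected with something like `ps -o lstart= -p <pid>`, which prints a timestamp in the form `Fri Jan  9 10:06:01 2026`; the sample value, the function name `started_near`, and the 60-second window are all hypothetical choices for illustration, not values from the original investigation.

```python
from datetime import datetime


def started_near(lstart: str, error_time: str, window_s: float = 60.0) -> bool:
    """Return True if the process start time is within window_s of the error."""
    # Format printed by `ps -o lstart=` (assumed, C locale).
    started = datetime.strptime(lstart, "%a %b %d %H:%M:%S %Y")
    # Format of the application error timestamp.
    failed = datetime.strptime(error_time, "%Y-%m-%d %H:%M:%S.%f")
    return abs((failed - started).total_seconds()) <= window_s


# Hypothetical ps output for the restarted obproxy process.
sample_lstart = "Fri Jan  9 10:06:01 2026"
print(started_near(sample_lstart, "2026-01-09 10:06:02.098"))
```

Running the same check against all three OBProxy nodes would single out the one whose process start time falls inside the window.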
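3. Analyze the obproxy log
The obproxy logs are in the log directory under its installation directory. Search them using the conn_id from the application error, conn=1409877, as the keyword.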
$ cd /home/admin/obproxy/log
$ grep "1409877" obproxy.log.20260109102553
[2026-01-09 10:06:00.899684] INFO [PROXY.SM] ob_mysql_sm.cpp:10223 [2274087][Y0-00007F8E94114FA0] [lt=15] [dc=0] Slow Query: ((client_ip={172.16.10.42:55780}, server_ip={172.16.10.42:2881}, obproxy_client_port={172.16.10.40:61666}, server_trace_id=, route_type=ROUTE_TYPE_MAX, user_name=xxx, tenant_name=yyyy, cluster_name=xxxx, logic_database_name=, logic_tenant_name=, ob_proxy_protocol=2, cs_id=1409877, proxy_sessid=13882480593272505850, ss_id=34587, server_sessid=3222107522
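If the SQL text involved in the failure is known, it can also be used as a keyword to search obproxy.log and obproxy_error.log.
Reading the log:
Y0-00007F8E94114FA0: obproxy's internal trace_id
client_ip={172.16.10.42:55780}: the original client address passed through Proxy Protocol v2
server_ip={172.16.10.42:2881}: the observer IP that obproxy forwarded the session to
obproxy_client_port={172.16.10.40:61666}: the peer address of the client TCP connection made directly to this OBProxy
cs_id=1409877: the connection ID recorded by obproxy, identical to conn=1409877 in the application error
server_sessid=3222107522: the session ID on the observer for the session with cs_id=1409877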
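When many such lines need correlating, the fields can be pulled out programmatically. The following is a minimal sketch, not an official parser: `extract_fields` is a hypothetical helper, and the sample string is a shortened excerpt of the Slow Query line shown earlier.

```python
import re


def extract_fields(line: str, keys: tuple) -> dict:
    """Pull key=value pairs out of one OBProxy log line.

    Values wrapped in braces, such as {ip:port}, are returned without them.
    """
    found = {}
    for key in keys:
        pattern = r"\b" + re.escape(key) + r"=\{?([^,{}()\s]*)\}?"
        match = re.search(pattern, line)
        if match:
            found[key] = match.group(1)
    return found


# Shortened excerpt of the Slow Query line from obproxy.log.
line = ("Slow Query: ((client_ip={172.16.10.42:55780}, "
        "server_ip={172.16.10.42:2881}, "
        "obproxy_client_port={172.16.10.40:61666}, server_trace_id=, "
        "cs_id=1409877, proxy_sessid=13882480593272505850, "
        "ss_id=34587, server_sessid=3222107522")

fields = extract_fields(line, ("cs_id", "server_ip", "server_sessid"))
print(fields)
```

The `server_ip` and `server_sessid` values obtained this way are the keys needed later to search the observer log.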
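How do you find out which obproxy a current connection was created on?
a. Confirm the client IP, e.g. x.x.x.x
b. Query for the obproxy address
select client_ip -- named host in some versions; this is the obproxy address
from GV$OB_PROCESSLIST
where user_client_ip='x.x.x.x'; -- address the program connects from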
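4. Look for errors in the obproxy log
Starting from the connection information found in step 3, look through the surrounding log entries for errors, that is, the entries between 2026-01-09 10:06:00.899684 and 2026-01-09 10:06:02.098. They show that obproxy restarted at 2026-01-09 10:06:01.982076. Before the restart there are many entries carrying g_proxy_fatal_errcode=-4080.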
[2026-01-09 10:06:01.924366] INFO [PROXY.NET] ob_unix_net.cpp:309 [2274135][Y0-00007F8EB404FD20] [lt=23] [dc=0] wait for net_global_connections_currently_open_stat down to zero(global_connections=161, thread_local_client_connections=1, g_proxy_fatal_errcode=-4080)
[2026-01-09 10:06:01.924885] WARN [PROXY] schedule_report_prometheus_info (ob_thread_prometheus.cpp:82) [2274159][Y0-00007F8EB3E55890] [lt=34] [dc=0] proxy need exit now
...
...
[2026-01-09 10:06:01.982738] INFO [PROXY] ob_proxy_main.cpp:478 [2444579][Y0-0000000000000000] [lt=8] [dc=0] ObProxy-OceanBase 4.2.1.0-20231220145047.el7-20231220145047-3d317219350d8396afd3532e7df6131225e7dd44
[2026-01-09 10:06:01.982753] INFO [PROXY] ob_proxy_main.cpp:483 [2444579][Y0-0000000000000000] [lt=11] [dc=0] has no inherited sockets, start new obproxy(info={is_inherited:false, upgrade_version:-1, need_conn_accept:true, user_rejected:0, ipv4_fd:-1, ipv6_fd:-1, received_sig:-1, sub_pid:-1, graceful_exit_end_time:0, graceful_exit_start_time:0, active_client_vc_count:-1, local_addr:, rc_status:"", hu_cmd:"", state:"HU_STATE_WAIT_HU_CMD", hu_status:"", is_parent:true, sub_status:"", last_parent_status:"", last_sub_status:"", upgrade_version_buf:"", argc:5, argv[0]="/home/admin/obproxy/bin/obproxy", argv[1]="-p", argv[2]="2883", argv[3]="-n", argv[4]="huajin_obproxy", inherited_argv[0]="/home/admin/obproxy/bin/obproxy", inherited_argv[1]="(null)", inherited_argv[2]="(null)", inherited_argv[3]="(null)"})
[2026-01-09 10:06:01.987127] INFO [PROXY] ob_proxy_config_utils.cpp:736 [2444579][Y0-0000000000000000] [lt=178] [dc=0] succ to read file(dir="/home/admin/obproxy/etc", file_name="./obproxy_config.bin", read_len=6960, len=16384)
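Spotting the restart in a large log can be scripted by scanning for the three markers visible in the excerpt above: a negative g_proxy_fatal_errcode, "proxy need exit now", and "start new obproxy". The sketch below is an illustration only; the marker list is taken from this one incident and is an assumption rather than a complete list of obproxy shutdown messages, and the sample lines are shortened with "...".

```python
import re

TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")

# Markers observed in this incident's obproxy.log (not an exhaustive list).
MARKERS = {
    "fatal_errcode": re.compile(r"g_proxy_fatal_errcode=(-\d+)"),
    "exit": re.compile(r"proxy need exit now"),
    "start": re.compile(r"start new obproxy"),
}


def restart_events(lines):
    """Return (timestamp, marker_name, detail) for every marker hit."""
    events = []
    for line in lines:
        ts = TIMESTAMP.match(line)
        if not ts:
            continue
        for name, pattern in MARKERS.items():
            hit = pattern.search(line)
            if hit:
                detail = hit.group(1) if hit.groups() else ""
                events.append((ts.group(1), name, detail))
    return events


sample = [
    "[2026-01-09 10:06:01.924366] INFO [PROXY.NET] ob_unix_net.cpp:309 ... "
    "(global_connections=161, g_proxy_fatal_errcode=-4080)",
    "[2026-01-09 10:06:01.924885] WARN [PROXY] ... proxy need exit now",
    "[2026-01-09 10:06:01.982753] INFO [PROXY] ob_proxy_main.cpp:483 ... "
    "has no inherited sockets, start new obproxy(info={...})",
]

events = restart_events(sample)
for event in events:
    print(event)
```

The ordering of the events (fatal error code, exit, then start) is what places the restart between the last successful activity and the application error.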
5. Check the observer log
Using the server IP and server session ID from step 3, check whether the observer had a problem at the time of the incident.
$ cd /home/admin/oceanbase/log
$ grep "3222107522" observer.log.20260109101019820
[2026-01-09 10:06:00.909105] WDIAG [SQL.SESSION] get_cursor (ob_sql_session_info.cpp:1386) [2768888][T1090_L0_G0][T1090][YB420A0B3C8E-00064138DC3E72D3-0-0] [lt=47][errcode=-4016] get cursor info failed(cursor_id=1, get_sessid()=3222107522)
[2026-01-09 10:06:00.938486] WDIAG [SQL.SESSION] get_cursor (ob_sql_session_info.cpp:1386) [2768901][T1090_L0_G0][T1090][YB420A0B3C8E-00064138BF1F7284-0-0] [lt=21][errcode=-4016] get cursor info failed(cursor_id=2, get_sessid()=3222107522)
[2026-01-09 10:06:02.073814] WDIAG [RPC.OBMYSQL] handle_sock_event (ob_sql_nio.cpp:861) [144746][sql_nio22][T0][Y0-0000000000000000-0-0] [lt=25][errcode=0] socket closed, it maybe disconnected by the client or by observer actively(mask=8197, *s={this:0x7f94e8bfa030, session_id:3222107522, trace_id:YB420A0B3C8E-00064138BC7FDFE9-0-0, sql_handling_stage:11, sql_initiative_shutdown:false, fd:2737, err:5, last_decode_time:1767924361207026, last_write_time:1767924360938544, pending_write_task:{buf:null, sz:0}, need_epoll_trigger_write:false, consume_size:4266, pending_flag:1, may_handling_flag:true, handler_close_flag:false})
[2026-01-09 10:06:02.073866] INFO [RPC.OBMYSQL] on_disconnect (obsm_conn_callback.cpp:268) [144746][sql_nio22][T0][Y0-0000000000000000-0-0] [lt=37] kill and revert session(conn.sessid_=3222107522, proxy_sessid=13882480593272505850, server_id=3, ret=0)
Only -4016 errors and the session disconnect appear. The observer did not restart and logged no fatal errors.
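The "socket closed" entry also carries last_decode_time and last_write_time as large integers. Assuming they are microseconds since the Unix epoch and that the log timestamps are in UTC+8 (both are assumptions, though the converted values line up with the surrounding log timestamps), they can be converted to confirm that the session's last activity came before the obproxy restart:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the log timestamps are written in UTC+8.
LOG_TZ = timezone(timedelta(hours=8))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_us(us: int) -> datetime:
    """Convert microseconds since the Unix epoch to a datetime in LOG_TZ."""
    # Integer microseconds keep full precision (no float rounding).
    return (EPOCH + timedelta(microseconds=us)).astimezone(LOG_TZ)


last_decode = from_epoch_us(1767924361207026)
last_write = from_epoch_us(1767924360938544)

# New obproxy process banner in obproxy.log: 2026-01-09 10:06:01.982738
proxy_start = datetime(2026, 1, 9, 10, 6, 1, 982738, tzinfo=LOG_TZ)

print(last_decode.isoformat())
print(last_write.isoformat())
print(last_decode < proxy_start)
```

Under those assumptions the last write falls at 10:06:00.938544, right after the second get_cursor warning, and the last decode precedes the new obproxy process start, which is consistent with the observer side being closed by the proxy restart rather than failing on its own.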
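6. Cause of the obproxy restart: -4080
Searching on the log keyword "g_proxy_fatal_errcode=-4080" leads to the error described under the ODP memory-limit alert "memory is out of limit". The cause is that the memory limit was reached. The fix is to increase the value of the proxy_mem_limited parameter.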
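This can be done either through the GUI or on the command line. The command-line method is as follows, connecting to the OBProxy listening port (2883 in this deployment, per the startup arguments in the log above):
obclient -h127.1 -P2883 -u root@proxysys -Doceanbase -A -p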
obclient> show proxyconfig like 'proxy_mem_limited';
obclient> alter proxyconfig set proxy_mem_limited = '3G';
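Summary:
For a client connection-dropped error, following OB's official diagnostic tree leads to the relevant errors in the logs. Because the connection goes through obproxy, it is split into a client session and a server session, and a restart of either obproxy or observer along the path can drop the connection. In this case the analysis found that only obproxy restarted and that the restart was caused by error -4080. The suspicion is that memory reached its limit at the time, so raising the limit can be considered going forward.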