Thread hanging on socket write: why, and how to prevent it?

Question

I'm running a server for my game.
I'm using a fairly simple design: each connection has a dedicated thread that reads its incoming data.
Writing is completely independent of reading and is done by a pool of threads that constantly poll a queue of write tasks.

Everything works fine, even on weekend evenings, when there are easily more than 500 players connected at the same time.
This means more than 500 threads, which doesn't seem to cause any major problem most of the time. There is no excessive CPU load or network bandwidth usage.

Except in one major case: a single player with a bad connection can block the whole system.
It regularly happens that a thread stays blocked in socketOutputStream.write for a long time, possibly several hours, unless I interrupt it or forcibly close the corresponding socket.

Hence, two questions:

Question 1: In which cases can a write operation on a socket block?

It's obvious that a read operation blocks as long as there is no new data to read, but I don't really understand why a write can block, unless I'm trying to send more data than my whole network connection can handle.
In that latter case, it's clear that I would have to wait and block for a while... but that's far from being the case. I have a 200 Mbps bandwidth limit and I barely reach 1 or 2 Mbps at peak hours.
The data exchanged is minimal: no sound, images or video are transferred while playing the game; only simple text commands.

Question 2: In the Socket class, there's a method setSoTimeout (see the documentation at https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/net/Socket.html), which sets a maximum timeout for read operations.
If the socket doesn't read anything within the timeout, a SocketTimeoutException is thrown, giving me the opportunity to check whether the connection is still active and responding. I'm already using this mechanism to kick out non-responding players.
However, it only applies to read operations.

Is there something similar for write operations? I can't find anything.
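
For reference, the read-side mechanism described above looks roughly like the following sketch. The class and method names are invented for this illustration, and it runs against a loopback connection whose client never sends anything:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not taken from the game server in the question.
class ReadTimeoutDemo {

    /** Returns true if read() gave up after timeoutMillis because the peer sent nothing. */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket silentClient = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // Limit how long a single read() may block on this socket.
            accepted.setSoTimeout(timeoutMillis);
            InputStream in = accepted.getInputStream();
            try {
                in.read(); // the client never writes, so this waits for the timeout
                return false;
            } catch (SocketTimeoutException e) {
                // The socket itself is still open here; the caller decides what to do next.
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

Nothing in this sketch affects writes: setSoTimeout only bounds the time spent blocked in read().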

Some additional details:

  • The server is running on a Linux machine, with AdoptOpenJDK 11.0.6.
  • Players can connect with or without SSL (a ServerSocket and an SSLServerSocket listen on different ports). Most of them (60%) use SSL. I'm using the default SSL configuration with a Let's Encrypt certificate. SSL connections don't appear to be slower than non-SSL ones.
  • It seems that I didn't have this problem with Java 8. I've had it since I upgraded from 8 to 11.

Thank you for your answers.

Answer 1

Score: 3

Writing to a socket may block if you keep sending data but the other side refuses to read it.

The operating system maintains a buffer for outgoing data, in case data gets lost and has to be sent again. Data is removed from the buffer when the other party confirms receiving it (an "acknowledgement"). If the other party stops acknowledging data but you keep sending, the buffer eventually fills up, and further writes block so that no data is lost.
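
The buffer behaviour described above can be observed with a small experiment. The sketch below is only an illustration: the class and method names are made up for this example, and it uses a non-blocking SocketChannel on the loopback interface (rather than the blocking streams from the question) so that a full buffer shows up as write() returning 0 instead of a hung thread. The number of bytes accepted depends on the operating system's buffer sizes.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class: shows the OS buffers filling up when the peer never reads.
class SendBufferDemo {

    /** Writes to a peer that never reads; returns how many bytes the OS accepted before refusing more. */
    static long bytesAcceptedBeforeFull() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel writer = SocketChannel.open(server.getLocalAddress());
                 SocketChannel silentPeer = server.accept()) {
                // Non-blocking mode: a write that cannot be buffered returns 0 instead of blocking.
                writer.configureBlocking(false);
                ByteBuffer chunk = ByteBuffer.allocate(64 * 1024);
                long total = 0;
                while (true) {
                    chunk.clear(); // resend the same 64 KiB of zeros each round
                    int written = writer.write(chunk);
                    if (written == 0) {
                        return total; // buffers are full; a blocking write would hang here
                    }
                    total += written;
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("bytes buffered before stalling: " + bytesAcceptedBeforeFull());
    }
}
```

With a blocking OutputStream, the point where this sketch returns is the point where write would start to block.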

There's no timeout setting for writing to sockets that I'm aware of. Maybe you could add an "are you still there" message to your application protocol, and close the connection if you don't get a reply. Another option is moving from blocking socket IO to non-blocking, but that's a major undertaking.
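
One way to approximate a write timeout with blocking sockets is a watchdog that closes the socket if a write takes too long. The Socket.close documentation states that a thread blocked in I/O on that socket then gets a SocketException, which matches what the question reports about forcibly closing the socket. A minimal sketch, assuming plain (non-SSL) sockets; WriteWatchdogDemo, writeWithTimeout and the timeout value are hypothetical names and numbers chosen for illustration, and behaviour with SSLSocket is not covered here:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the JDK or of the server in the question.
class WriteWatchdogDemo {

    // One daemon thread that closes sockets whose write overran its deadline.
    private static final ScheduledExecutorService WATCHDOG =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "write-watchdog");
                t.setDaemon(true);
                return t;
            });

    /** Writes data; if the write is still running after timeoutMillis, the socket is closed. */
    static void writeWithTimeout(Socket socket, byte[] data, long timeoutMillis) throws IOException {
        ScheduledFuture<?> deadline = WATCHDOG.schedule(() -> {
            try {
                socket.close(); // makes the blocked write throw a SocketException
            } catch (IOException ignored) {
                // nothing useful to do if closing fails
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            OutputStream out = socket.getOutputStream();
            out.write(data);
            out.flush();
        } finally {
            deadline.cancel(false); // write finished (or failed) in time
        }
    }

    /** Loopback experiment: the peer never reads, so a write eventually stalls and is aborted. */
    static boolean stuckWriteIsAborted(long timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket silentPeer = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket toPlayer = server.accept()) {
            byte[] chunk = new byte[1 << 20]; // 1 MiB per write, far more than a text command
            try {
                for (int i = 0; i < 256; i++) {
                    writeWithTimeout(toPlayer, chunk, timeoutMillis);
                }
                return false; // the buffers never filled up
            } catch (IOException e) {
                return true; // the watchdog closed the socket under the blocked write
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("stuck write aborted: " + stuckWriteIsAborted(500));
    }
}
```

The trade-off in this design is that a slow but healthy player is disconnected along with a dead one, so the timeout has to be chosen generously; once the socket is closed it cannot be reused.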

huangapple
  • Posted on April 8, 2020 at 04:54:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/61089264.html