Threads enter the BLOCKED state during a host name lookup with java.net.InetAddress.getLocalHost

Question

Below are the details of the application and environment where the problem occurs.

  • Java web application deployed on Tomcat 9.0.35 with JRE version 1.8.0_231-b11
  • The application runs in a Docker container deployed on the OpenShift Kubernetes distribution platform.

Many threads in the application sometimes enter the BLOCKED state for a few minutes. Thread dump analysis showed that the java.net.InetAddress.getLocalHost call is taking too long, and many threads are stuck there. The host name is fetched for every log statement the application writes.

The issue is intermittent, but when it occurs the application/Tomcat goes into a paused state, which causes threads to pile up. After some time (a few seconds), all the blocked threads are unblocked simultaneously. Because of request concurrency, the application then runs out of the DB connections it maintains in its pool, leading to errors, slowness, and reduced service availability. As a fix, I now read the host name only once into a static variable and reuse it throughout logging. I would like to know the detailed root cause of this issue.

  • Why does this issue occur intermittently?
  • Is there a problem with DNS lookup in this Kubernetes environment?
  • We are using the IPv4 protocol/addresses.
  • Are there better approaches or fixes for handling this issue?

Sample from the thread dump:

 "https-jsse-nio-8443-exec-13" #95 daemon prio=5 os_prio=0 tid=0x00007fccadbba800 nid=0xaf5 waiting for monitor entry 0x00007fcb912d1000
       java.lang.Thread.State: BLOCKED (on object monitor)
    	at java.net.InetAddress.getLocalHost(InetAddress.java:1486)
    	- waiting to lock <0x00000005e71878a0> (a java.lang.Object)

Answer 1

Score: 4

In JDK 8, InetAddress.getLocalHost() works as follows:

  1. Obtain the host name as a string via the native gethostname call.
  2. If less than 5 seconds have passed since the last host name resolution, return the cached IP address.
  3. Otherwise, resolve the host name:
    • using the JDK built-in lookup cache, whose default TTL is 30 seconds;
    • using the system call, which performs an actual DNS lookup (depending on the configuration, the address may be further cached by the OS and DNS servers).
  4. Cache the resolved local host IP address for 5 seconds.

Steps 2-4 are performed under the global cacheLock. If a lookup stalls inside this critical section, every other thread calling InetAddress.getLocalHost() blocks waiting for the lock, which is exactly what your thread dump shows ("waiting to lock ... a java.lang.Object").

Usually, local host name resolution does not result in a network call, as long as the host address is hard-coded in /etc/hosts. In your case, however, real network requests seem to be involved whenever the TTL expires. A delay occurs when the first DNS request times out (UDP is not a reliable protocol, after all), which would also explain why the stalls are intermittent.
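The locking behaviour described above can be sketched with a small model. This is an illustration only, not the JDK source: LockedLookupModel, its resolver and its clock are hypothetical names, and the resolver stands in for the real name service call. It shows why one slow resolution inside the synchronized block stalls every other caller, and why calls within the 5-second window return without resolving.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Simplified model of a "resolve under one global lock, cache for 5 s" scheme.
 * Illustration only: this is NOT the JDK source, and all names are hypothetical.
 */
final class LockedLookupModel {
    private static final long MAX_CACHE_MILLIS = 5_000; // the 5-second window

    private final Object cacheLock = new Object();
    private final Supplier<String> resolver; // stands in for the real lookup
    private final LongSupplier clock;        // injectable so the TTL can be tested
    private String cachedAddress;
    private long cacheTime;
    private int resolveCount;

    LockedLookupModel(Supplier<String> resolver, LongSupplier clock) {
        this.resolver = resolver;
        this.clock = clock;
    }

    String lookup() {
        // Every caller serializes here. While one thread is inside
        // resolver.get(), all other callers are BLOCKED on cacheLock.
        synchronized (cacheLock) {
            long now = clock.getAsLong();
            if (cachedAddress != null && now - cacheTime < MAX_CACHE_MILLIS) {
                return cachedAddress;       // step 2: fresh enough, no lookup
            }
            cachedAddress = resolver.get(); // step 3: potentially slow
            resolveCount++;
            cacheTime = now;                // step 4: cache for 5 s
            return cachedAddress;
        }
    }

    /** Number of times the resolver was actually invoked. */
    int resolveCount() {
        synchronized (cacheLock) {
            return resolveCount;
        }
    }
}
```

If the resolver in this model takes several seconds (for example, a DNS query that times out and is retried), every thread that calls lookup() in the meantime waits on cacheLock and is released at the same moment, which matches the "all blocked threads unblock simultaneously" symptom in the question.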

The solution is to configure /etc/hosts to contain the name and address of the local host, e.g.

192.168.1.23   myhost.mydomain

where myhost.mydomain is the same string that the hostname command returns.
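One way to check whether the /etc/hosts entry helps is to time the call from inside the container. The following is a minimal sketch; LocalHostProbe and timeLookupMillis are hypothetical names introduced here, and the only JDK call being measured is InetAddress.getLocalHost().

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper: times one InetAddress.getLocalHost() call. */
final class LocalHostProbe {
    /** Returns the elapsed time in milliseconds, whether or not resolution succeeds. */
    static long timeLookupMillis() {
        long start = System.nanoTime();
        try {
            InetAddress local = InetAddress.getLocalHost();
            System.out.println("Resolved: " + local);
        } catch (UnknownHostException e) {
            System.out.println("Resolution failed: " + e.getMessage());
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Repeated calls inside the 5-second cache window are expected to be fast;
        // the first call is the one that exercises the resolver.
        for (int i = 0; i < 3; i++) {
            System.out.println("getLocalHost took " + timeLookupMillis() + " ms");
        }
    }
}
```

Because the problem is intermittent, a single fast run does not prove much; running the probe periodically (with pauses longer than 5 seconds between calls) gives a better picture.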

Finally, if the host name is not expected to change while the application is running, caching it once at the application level for the lifetime of the process looks like a good fix.
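A minimal sketch of such application-level caching, assuming the host name really is stable for the lifetime of the JVM. CachedHostName and the "unknown-host" fallback are hypothetical choices made for this example, not something taken from the question.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical holder that resolves the local host name once per JVM. */
final class CachedHostName {
    private CachedHostName() {}

    // Initialization-on-demand holder: the JVM runs this initializer at most
    // once, on first access, and publishes the value safely to all threads.
    private static final class Holder {
        static final String NAME = resolve();
    }

    private static String resolve() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            // Assumption: a placeholder is acceptable in log lines.
            return "unknown-host";
        }
    }

    /** Returns the cached name; the resolver is not touched after the first call. */
    static String get() {
        return Holder.NAME;
    }
}
```

Calling CachedHostName.get() once during application startup moves the single (possibly slow) lookup out of the request path, so logging code never waits on the resolver.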

Answer 2

Score: 0

To fix the issue, I now load the host name only once, during application startup, and cache it. I have rolled out this fix to production, and we no longer see the thread blocking issue.

Answer 3

Score: 0

Maybe the server is attempting the lookup over IPv6. If IPv6 is not in use, you can configure the JVM to use only IPv4 by adding -Djava.net.preferIPv4Stack=true to the JVM options. If you need IPv6 instead, note that the standard networking property is -Djava.net.preferIPv6Addresses=true (java.net.preferIPv6Stack is not a documented property). This makes the JVM use the intended protocol.
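To see which setting is active and which address families the local host name actually resolves to, something like the following sketch can be run inside the container. StackCheck and family are hypothetical names; the JDK calls used are System.getProperty, InetAddress.getLocalHost and InetAddress.getAllByName.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical diagnostic: prints the IPv4 preference and resolved address families. */
final class StackCheck {
    /** Describes the address family of an address. */
    static String family(InetAddress address) {
        return (address instanceof Inet4Address) ? "IPv4" : "IPv6";
    }

    public static void main(String[] args) throws UnknownHostException {
        // Prints "null" when the flag was not passed on the command line.
        System.out.println("java.net.preferIPv4Stack = "
                + System.getProperty("java.net.preferIPv4Stack"));

        String hostName = InetAddress.getLocalHost().getHostName();
        for (InetAddress a : InetAddress.getAllByName(hostName)) {
            System.out.println(family(a) + "  " + a.getHostAddress());
        }
    }
}
```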

huangapple
  • Posted on September 30, 2020, at 19:24:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64136575.html