综合使用jdk工具查看线程情况

最近在测试环境做压测时,当压测运行一段时间后,发现应用无法处理任何相应。最后经过排查发现是因为logback的RollingFileAppender在并发情况下,会造成线程死锁等待(jstack中查看到大量waitting状态的线程)

解决方案,更改logback的appender

<appender name="RollingFile-Appender" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${LOG_PATH}/rollingfile.log</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
            <fileNamePattern>${LOG_ARCHIVE}/%d{yyyy-MM-dd}.%i.log</fileNamePattern>
            <maxHistory>90</maxHistory>
            <timeBasedFileNamingAndTriggeringPolicy
                    class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
                <!-- 最大64MB 超过最大值,会重新建一个文件-->
                <maxFileSize>64 MB</maxFileSize>
            </timeBasedFileNamingAndTriggeringPolicy>
        </rollingPolicy>
        <encoder>
            <pattern>
                [ %-5level] [%date{yyyy-MM-dd HH:mm:ss}] %logger{96} [%line] - %msg%n
            </pattern>
            <charset>UTF-8</charset> <!-- 此处设置字符集,防止中文乱码 -->
        </encoder>
    </appender>
    <appender name="Async-Appender" class="ch.qos.logback.classic.AsyncAppender">
        <appender-ref ref="RollingFile-Appender" />
    </appender>


    <root level="WARN">
        <appender-ref ref="Async-Appender" />
    </root>


附Java线程状态排查方案,下面的一些命令可能有些特殊,并不适用于本次出现的问题(本次出现的问题排查命令比较简单):

使用jps命令查看进程ID号

root@ubuntu:~/apps$ jps -mlvV | grep -v 'mlvV'
1401 gateway-0.0.1-SNAPSHOT.jar
1374 eureka-server-0.0.1-SNAPSHOT.jar


使用top命令查看内存与CPU情况(可以直接top查看整体情况,这里面使用top -Hp ${pid}查看的是单个java进程中所有的线程)
root@ubuntu:~/apps$ top -Hp 1401
top - 19:17:23 up 2 days, 13:56,  1 user,  load average: 0.03, 0.03, 0.00
Threads:  71 total,   0 running,  71 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.0 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  4028724 total,  1166204 free,  2214336 used,   648184 buff/cache
KiB Swap:  4192252 total,  4192252 free,        0 used.  1555352 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
  1457 root      20   0 3584736 657100  17876 S  1.0 16.3  19:39.91 java
  1435 root      20   0 3584736 657100  17876 S  0.7 16.3  19:32.40 java
  1439 root      20   0 3584736 657100  17876 S  0.3 16.3   0:14.52 java
  1374 root      20   0 3584736 657100  17876 S  0.0 16.3   0:00.00 java
  1376 root      20   0 3584736 657100  17876 S  0.0 16.3   0:09.49 java


如发现有某个进程使用的内存或CPU超出预期,可单独查看该进程的虚拟机栈
因为jstack中,线程ID是16进制,而top命令看到的线程ID是10进制,可以使用如下命令,将10进制转化为16进制:

root@ubuntu:~/apps$ printf '%x\n' 1457
5b1


使用jstack,观察线程的虚拟机栈与运行情况。这里面采用grep -n '0x5b1',可以查看到指定线程号,在jstack输出中的行号(位置),然后再采用more +n${num}命令可以从指定的行开始查看内容

也可以直接使用jstack 1374,查看栈的整体情况

root@ubuntu:~/apps$ jstack 1374|grep -n '0x5b1'
321:"TaskAcceptor-localhost" #51 daemon prio=5 os_prio=0 tid=0x00007fae7d7f6800 nid=0x5b1 waiting on condition [0x00007fae2da86000]


从指定的行开始查看线程的虚拟机栈与运行状态(关键是more +n)
root@ubuntu:~/apps$ jstack 1374|more +n321
"TaskAcceptor-localhost" #51 daemon prio=5 os_prio=0 tid=0x00007fae7d7f6800 nid=0x5b1 waiting on condition [0x00007fae2da86000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000c537a0a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at com.netflix.eureka.util.batcher.AcceptorExecutor$AcceptorRunner.drainInputQueues(AcceptorExecutor.java:225)
	at com.netflix.eureka.util.batcher.AcceptorExecutor$AcceptorRunner.run(AcceptorExecutor.java:186)
	at java.lang.Thread.run(Thread.java:748)

猜你喜欢

转载自hanzhdy.iteye.com/blog/2414860