I've been digging garbage collection logs these days. We have a production server which suffers long pauses of garbage collection. According to garbage collection log file, there are three different cases that force the JVM to do a StopTheWorld collection:
1. Full GC after a promotion failure: This is the evidence of not enough contiguous space in the old (tenured) generation. The simplest solution to this is increasing heap size. (I am glad it worked for us.)
2. Full GC after a concurrent mode failure (in other words; full promotion guarantee failure) : CMS collector can not catch up with the object allocation speed of the application. If this is the case try setting CMSInitiatingOccupancyFraction to a lower value. CMS will start early but will finish its job on-time.
3. The third one was very difficult for me to figure out. Here is what it looks like: Everything seems to work fine but suddenly Full GC kicks in and 37 seconds of break for application threads.
216066.711: [GC 216066.713: [ParNew: 911554K->71428K(943744K), 0.0860545 secs] 2346321K->1508264K(4089472K), 0.0878449 secs] [Times: user=3.63 sys=0.32, real=0.09 secs] 216066.896: [Full GC 216066.897: [CMS: 1436836K->1003062K(3145728K), 37.8691646 secs] 1527124K->1003062K(4089472K), [CMS Perm : 86016K->31249K(86016K)], 37.8707796 secs] [Times: u ser=37.64 sys=0.35, real=37.87 secs] 216106.398: [GC 216106.399: [ParNew: 838912K->46337K(943744K), 0.0555625 secs] 1841974K->1049399K(4089472K), 0.0573404 secs] [Times: user=1.33 sys=0.45, real=0.06 secs]The above log says that in order to clean permanent generation a full collection occured and (86016 - 31249) kilobytes of space freed in permanent generation.
What I've learned today is : The garbage of the permanent generation is only collected by a Full GC. If you see unexpected full collections try expanding permanent generation.