A few weeks ago I started getting OutOfMemoryError exceptions on a Linux production server. The free and top tools showed that there was still 1GB of free system memory. Also, after turning on verbose garbage collection statistics in the Java virtual machine, I saw that of the 700MB of heap, only 300MB was really used.
It took me days to track this down because I had no idea where to look. Finally it turned out that each Java thread reserves memory for its execution stack. On this system, the user's stack limit defaulted to 8MB (RHEL 2.1 AS). What I didn't know was that even though a thread didn't use the whole 8MB, the kernel still reserved that much for each thread so that the memory was guaranteed to be there if needed. However, since the memory was only reserved and not used, none of the statistics showed that it was unavailable for anything else. Since Resin was configured with a maximum of 700 simultaneous threads, this could potentially result in the operating system reserving about 5.5GB of memory (on a machine with only 3GB). So as soon as about 256 threads had been created (roughly 2GB of reserved stack), shabaaang, any attempt to create a new thread would throw an OutOfMemoryError.
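The arithmetic behind those numbers can be sketched in a few lines of shell. This is only an illustration: reserved_mb is a made-up helper name, and it assumes every thread reserves exactly the stack limit, which is the worst case described above.

```shell
#!/bin/sh
# reserved_mb is a hypothetical helper, not part of any tool.
# It multiplies the thread count by the per-thread stack size (in KB)
# and prints the total worst-case reservation in MB.
reserved_mb() {
    threads=$1
    stack_kb=$2
    echo $(( threads * stack_kb / 1024 ))
}

reserved_mb 700 8192   # 700 threads at the 8MB default
reserved_mb 256 8192   # roughly where new threads started failing
reserved_mb 700 1024   # 700 threads with a 1MB stack
```

The first call gives 5600MB, which is the roughly 5.5GB mentioned above, and the second gives 2048MB.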
The solution was very simple: I simply added ulimit -s 1024 to the user's .bash_profile file. Now the 700 threads reserve only 700MB, and there's still plenty of room for other things.
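Before touching .bash_profile, the setting can be tried out safely. The sketch below assumes a POSIX-style shell and runs ulimit inside a command substitution, which is a subshell, so the current session keeps its old limit. The variable name new_limit is just for illustration.

```shell
#!/bin/sh
# The line from the post that goes into the user's ~/.bash_profile:
#   ulimit -s 1024
# Here the same command runs in a subshell (the command substitution),
# so the limit of the current session is left untouched.
new_limit=$(ulimit -s 1024 && ulimit -s)
echo "stack limit inside subshell: ${new_limit} KB"
```

The limit is inherited by child processes, so the JVM has to be started from a shell that has already applied it (for example a fresh login shell that has read .bash_profile).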
Such a tiny setting makes a fundamental difference!