Don't start profiling your Java application too early April 11, 2003 Q: In ” My Kingdom for a Good Timer! ,” what is the purpose of this part of the sample code: // JIT/hotspot warmup: for (int i = 0; i < 3000; ++ i) { timer.start (); timer.stop (); timer.getDuration (); timer.reset (); } A: The short answer is the above code snippet “warms up” the JVM with respect to the ITimer implementation. Let’s investigate what I mean by that in more detail. It’s easier to illustrate my point with a specially crafted code snippet: public class Main { public static void main (String [] args) { // Create an ITimer using the Factory class: final ITimer timer = TimerFactory.newTimer (); for (int repeat = 0; repeat < 25; ++ repeat) { timer.start (); sum (100); timer.stop (); System.out.println (repeat + ": " + timer.getDuration ()); timer.reset (); } } public static int sum (int n) { if (n <= 1) return 1; else return n + sum (n - 1); } } // End of class As you can see, I use the ITimer high-resolution API once again to measure the performance of a test method called sum(), which is a very naïve implementation of the sum of numbers 1+2+...+n (an arithmetic progression). It is implemented recursively by design. You should never use such an implementation inside a real application. However, it suits my discussion precisely because of its high method call overhead. The above code snippet invokes sum(100) 25 times and logs the cost of every call. The results from running in Java 2 Platform, Standard Edition (J2SE) 1.4.1 for Win32 are quite interesting: 0: 0.08888 1: 0.04755 2: 0.03448 3: 0.03462 4: 0.03432 5: 0.03437 6: 0.03440 7: 0.03460 8: 0.03441 9: 0.03445 10: 0.0344 11: 0.0343 12: 0.0343 13: 0.0345 14: 0.8865 15: 0.0444 16: 0.0107 17: 0.0103 18: 0.0102 19: 0.0102 20: 0.0102 21: 0.0103 22: 0.0101 23: 0.0103 24: 0.0103 Notice that something strange happens around the 15th repeat: the method slows down for a while and then speeds up by a factor of 3.5. “It’s just a random garbage collection (GC) delay,” I hear some seasoned Java programmers mutter. Except it is not: a sporadic GC run would not explain why sum() gets faster afterwards and stays that way (you can see that by making the upper limit for the repeat loop variable much larger than 25). Furthermore, you can repeat this experiment numerous times, and the phenomenon will always happen at about the same point in the program’s execution. What is going on? The riddle has a simple explanation. A call to sum(n) implies n total calls to the method; n-1 of them are recursive. After 15 calls to sum(100), the method body executes a grand total of 1,500 times. As it happens, my code executes in the client HotSpot JVM by default, and 1,500 is the default method invocation threshold for when a method is considered for compilation into native code. Execution slows down while HotSpot spends extra CPU cycles to compile sum() into native code and then speeds up because the JVM switches to the method’s natively compiled, and thus noticeably faster, version. For further proof you can add the -XX:+PrintCompilation HotSpot option (see Resources) to your JVM command line and watch the HotSpot engine tell you when it compiles various methods: ... other methods from core libraries get compiled ... 0: 0.08118 1: 0.03642 2: 0.03460 3: 0.03468 4: 0.03470 5: 0.03439 6: 0.03440 7: 0.03463 8: 0.03441 9: 0.03460 10: 0.0348 11: 0.0345 12: 0.0342 9 b java.lang.String::indexOf (74 bytes) 13: 0.0350 10 b Main::sum (16 bytes) 14: 1.4324 15: 0.0469 16: 0.0103 17: 0.0102 18: 0.0101 19: 0.0101 20: 0.0100 21: 0.0104 22: 0.0102 23: 0.0104 24: 0.0104 At this point, I can at last give a deeper reason behind the code snippet in the question that started this article: in a loop of length 3,000 (greater than the default method invocation threshold for a client HotSpot VM), I exercise all ITimer methods to have them compiled into native code and/or inlined. Although I cheat HotSpot a little, this ultimately gives me a small boost in the timing resolution and improves profiling accuracy when measuring short time intervals. If you remove the call to sum() from the above sample code and change the loop repeat count to, say, 2,000 so that timer.start() and timer.stop() execute back to back multiple times, you will observe the expected phenomenon around the 1,500th line in the output: ... 1492: 0.00283 1493: 0.00271 1494: 0.00271 1495: 0.00267 1496: 0.00271 1497: 0.00272 1498: 0.00284 1499: 2.84138 1500: 0.00724 1501: 0.00207 1502: 0.00219 1503: 0.00196 1504: 0.00203 1505: 0.00203 1506: 0.00203 1507: 0.00203 ... The improvement in timer resolution is about 30 percent, not mind-bogglingly large, but easy to achieve. Therefore, I make it a rule to always condition my ITimer implementation this way when I use it for very fine measurements. Lessons learned Together we have seen that a given piece of Java code will execute at very different speeds during an application’s lifetime. Initially, it might execute in interpreted mode; if it executes enough times, it becomes a candidate for compilation to native code. For a Java program with high uptime, such as an application server, most methods will translate into native code sooner or later. At this state of the JVM, profiling gives a realistic picture of the application’s performance. Profiling data taken early in the application’s lifetime should either be discarded or not collected at all (unless you are specifically profiling your application’s startup behavior). Unfortunately, I suspect that most programmers do not take this into proper consideration. Sun Microsystems’ server HotSpot JVM has a larger default method invocation threshold than the client JVM (10,000 instead of 1,500 (see Resources)), and it might take minutes, if not hours, to properly warm up the JVM. As an experiment, you might consider shortening the warm-up period by manipulating the -XX:CompileThreshold JVM option, although you will soon discover that making this threshold too small will delay your application’s startup, as HotSpot begins compiling just about every method it discovers into native code, including methods in the core Java libraries. Vladimir Roubtsov has programmed in a variety of languages for more than 13 years, including Java since 1995. Currently, he develops enterprise software as a senior developer for Trilogy in Austin, Texas. Build AutomationApp TestingJavaSecurity